RPCS3 / rpcs3

PS3 emulator/debugger
https://rpcs3.net/
GNU General Public License v2.0

Vulkan: multiple issues on newer AMD (GCN3+) cards #2201

Closed Nezarn closed 6 years ago

Nezarn commented 7 years ago

I'm just making this issue so it will be easier to track when things get fixed, driver- or rpcs3-wise. (It's easier to have these issues in one place, since if I post it in some game's issue where it's happening, it gets buried if there are a lot of comments :P)

Current issues:
- BSOD\Driver crash in certain games (100% reproducible in Project Diva F, reported to AMD)
- Always 100% GPU usage (http://i.imgur.com/Kg0hxul.png vs. http://i.imgur.com/wxEM8bU.png)
- Unique graphical issue(s) (http://i.imgur.com/a8LTaz0.png, look at the bottom of the pic)

If this issue isn't needed then feel free to close\delete.

kd-11 commented 7 years ago

I have some idea why there are issues and will update this ticket with testing information when time allows.

kd-11 commented 7 years ago

https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.342

Our pipeline barriers are crap, but there's a lot of confusion about how the pairings are supposed to work. However, using TOP_OF_PIPE/BOTTOM_OF_PIPE to flush writes just makes no sense. I'll review this part of the spec when I have time and come up with a better solution rather than something off the top of my head.

This fixes a lot of flickering visuals on the AMD R9 200 series as well, so it's a step in the right direction, I hope.

Nezarn commented 7 years ago

@kd-11 Just tried this build, didn't fix anything yet for me sadly. Also if you don't know something, why don't you just ask for advice on AMD forums? Devs could help.

kd-11 commented 7 years ago

The confusion extends to people from the GPU vendors themselves as well as Khronos. See https://github.com/KhronosGroup/Vulkan-Docs/issues/128

After going through the spec for a few minutes, I believe I understand what the spec implies, but even from the thread above, you can see that examples given are often wrong.

BTW, if you are still crashing, something else is probably wrong. The remaining visual corruptions should (in theory) be fixed by that build though. On my 200 series, I was getting flickering textures and a green tint in some games, both of which are gone now. The crash may have nothing to do with synchronization in that case. I've been experiencing crashes on my GPU when replaying Vulkan captures in RenderDoc, and that raises a red flag since inspecting the code shows nothing suspicious, except that we always crash during a vkCmdPipelineBarrier call.

kd-11 commented 7 years ago

If the graphical issue is unchanged, it might be because the stages in the barrier do not account for changes to and from LAYOUT_GENERAL, which we use during buffer clears. I'll update and we can try again when I find the time.

kd-11 commented 7 years ago

I also realized that I failed to implement part of the spec dealing with presentable images, so it's likely my fault here.

Nezarn commented 7 years ago

@kd-11 it's a little bit funny when the devs themselves don't know how to use their stuff XD

Also yes, the driver crash is still there (well, I tried it once and it crashed at the same place, but at least this time it wasn't a BSOD; the BSOD is a bit random, like a 40% chance of a BSOD and 60% of a driver crash) and the GPU usage and graphical issue are the same. (Also, the games that are affected by the graphical issue have a "flash" when the Vulkan window opens.) http://i.imgur.com/5AlNU76.png https://www.youtube.com/watch?v=lBUKw9_uKlY

kd-11 commented 7 years ago

Looks like a collision on presentable images in your case. Try https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.354

If that does not work I'll have to write a barrier manually for the transition just before present. This approach fixed corrupted overlays before. By the way, try enabling the debug overlay and see if it helps. It is known that AMD GPUs rasterize from top left to bottom right (you can find the research online), so it's no surprise that corruption happens just when the frame is about to finish copying the image and artefacts become visible at the bottom.

Nezarn commented 7 years ago

@kd-11 yep it still happens, here is the log with debug output (there are a lot of warnings) https://www.dropbox.com/s/epxqxvm1km6540b/vulkanlog.zip?dl=0

kd-11 commented 7 years ago

What if you enable the debug overlay? It adds another stall before presenting.

Nezarn commented 7 years ago

@kd-11 its the same http://i.imgur.com/5Jl7VGi.png

Nezarn commented 7 years ago

@kd-11 AMD replied on the forum https://community.amd.com/thread/206464

So basically they can't do anything, since he would need to own the game. (and he says he can only look into it if no third party download is needed....)

kd-11 commented 7 years ago

That sucks, but it's kinda expected. Until rpcs3 can replay renderer state, this will be a difficult one to debug. However, the artifacting issue is visible even in RenderDoc and they should be able to help with that one. GPU PerfStudio also supports API tracing, just with no visual output, so they can help with that at the very least.

Nezarn commented 7 years ago

@kd-11 yep, oh well, i hope they can at least help with the graphics issue and with the 100% GPU usage... Just posted renderdoc + video on their forum

Nezarn commented 7 years ago

@kd-11 so AMD guy responded:

It appears that the glitches are introduced in colour passes 7-11, where the app appears to be doing a blur. The biggest problem that's jarring from the trace you provided is that your application is executing a lot of renderpasses without defining external dependencies. This can lead to corruptions as the ones we're seeing because of the fact the GPU is free to run the commands in an overlapping manner which may lead to RAW hazards. For performance reasons, also please consider coalescing the huge number of renderpasses your application is using right now, so that the draw calls are embedded in subpasses.

kd-11 commented 7 years ago

The subpasses cannot be done since we have no way of knowing beforehand how the calls will be submitted to the rsx, and the RAW hazard is a known issue, which is why I've been working on the memory barriers. Subpass dependencies are memory-barrier-type ops, so at least we were on the right track. There is an external dependency that we added, flushing previous color output before the current color output stage, but I guess we can add one to block memory reads in the fragment shader as well.

kd-11 commented 7 years ago

https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.357 Adds a dependency on the fragment shader stage. If it doesn't work, I'll have to reach out to AMD for assistance there as I may have completely misunderstood that part of the spec.

Nezarn commented 7 years ago

@kd-11 still same :(

kd-11 commented 7 years ago

Following the AMD guy's advice, I've removed the dependency_by_region bit. https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.359

Nezarn commented 7 years ago

@kd-11 still same.

Also, the AMD guy posted about the 100% GPU usage:

Oh, and reg. 100% GPU utilization: assuming you do not use any kind of a CPU-side-based frame limiting solution, that's absolutely fine. After all, wasn't the idea behind Vulkan to squeeze as much juice from the GPU as it's only possible?

So that's not an issue? Then why does it happen only on GCN3 and 4 cards? (Even nvidia cards don't have 100% GPU usage.)
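
For readers unfamiliar with the "CPU-side frame limiting" the AMD reply refers to: the idea is that the CPU sleeps until the next frame is due instead of submitting work as fast as it can. A minimal sketch using only the standard library follows. `FrameLimiter` is a hypothetical class for illustration and says nothing about how RPCS3 paces its frames.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical CPU-side frame limiter (not RPCS3 code).
class FrameLimiter
{
public:
    using clock = std::chrono::steady_clock;

    explicit FrameLimiter(double target_fps)
        : m_frame_time(to_frame_time(target_fps))
        , m_next_deadline(clock::now() + m_frame_time)
    {
    }

    // Call once per presented frame; blocks until that frame's deadline.
    void wait()
    {
        std::this_thread::sleep_until(m_next_deadline);
        const auto now = clock::now();
        m_next_deadline += m_frame_time;
        // If we fell behind, restart the schedule instead of bursting to catch up.
        if (m_next_deadline < now)
            m_next_deadline = now + m_frame_time;
    }

private:
    static clock::duration to_frame_time(double fps)
    {
        assert(fps > 0.0);
        return std::chrono::duration_cast<clock::duration>(
            std::chrono::duration<double>(1.0 / fps));
    }

    clock::duration m_frame_time;
    clock::time_point m_next_deadline;
};
```

In a render loop this would be `FrameLimiter limiter(60.0);` once, then `limiter.wait();` after each present. Without something like this, or vsync, a renderer is free to keep the GPU busy all the time.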

kd-11 commented 7 years ago

Check if running vulkan demos causes this as well. I suspect something is up with their driver.

Nezarn commented 7 years ago

@kd-11 yep, looks like it happens with anything that uses Vulkan, tried the samples from the SDK, 100% usage, tried running DOOM, 100% usage (even in menu)

kd-11 commented 7 years ago

Their driver is obviously having issues. I'll clean up the vulkan-wip branch until we have no validation issues, then we can continue with AMD support since they insist we must do validation first.

mirh commented 7 years ago

I honestly don't see when utilizing all resources started to become an issue. Aren't both DOOM and the demos supposed to push as many frames as possible?

Assuming you don't have a slow CPU, that seems totally fine.

RaulDJ commented 7 years ago

@mirh You realize that the emulator runs with almost any card on "idle", right? I don't even get past ~7% of the TDP of my 1060 with absolutely any game, so something similar should happen with the RXs. The emulator uses almost no GPU at all, so this situation right here is obviously not OK.

Nezarn commented 7 years ago

@mirh then how come that on GCN2 cards, and nvidia cards GPU usage is never at 100% on simple stuff? (like the hello world sample on rpcs3)

Another example for normal GPU usage http://i.imgur.com/hAU98jm.jpg

Nezarn commented 7 years ago

@kd-11 looks like the DX12 renderer is affected by the driver crash too (no BSOD so far), and it crashes at exactly the same place as Vulkan. (So maybe the offending stuff is in the common code that both renderers use?)

Nezarn commented 7 years ago

@kd-11 looks like the graphical issue is a Driver Issue too, what are the chances that it affects 3 emulators (it affects rpcs3 in Vulkan, and affects Cemu and PCSX2 in opengl, tho not as badly)

mirh commented 7 years ago

OGL in pcsx2 is fine (graphically at least) afaik.

Nezarn commented 7 years ago

@mirh In pcsx2, using any kind of Blending Unit Accuracy (aside from none) brings out the issue (http://i.imgur.com/x0rl8C3.png), and a similar issue occurs in Cemu too, so something is very broken driver-wise. https://www.youtube.com/watch?v=3iHrUSbE8J8

mirh commented 7 years ago

Uh.. Is this reproduced here?

Nezarn commented 7 years ago

@mirh i don't see anything wrong on that dump, the problem i have is only visible with 2x native or higher. (maybe on native its so small that it can't be seen, if i set at least 2x on your dump its visible)

edit: also i think we should move our pcsx2 discussion to somewhere else :P (you can contact me on the forum too)

mirh commented 7 years ago

I hope this is fine then.

Nezarn commented 7 years ago

@mirh sure, i hope they fix their drivers, since more and more emulators get affected xD (for example Cemu 1.6.2 is unusable for me, crashes driver :( )

mirh commented 7 years ago

That's "hopefully" pcsx2/pcsx2#1552

Nezarn commented 7 years ago

@mirh yep i hope so (tried that dump and it does crash driver pretty hard (no bsod tho), also would be nice to find something that would reproduce Vulkan\DX12 crash in rpcs3, so they would work on that too...)

mirh commented 7 years ago

Hopefully again whatever OGL fucks with is the same thing Vulkan triggers.

RainKikyou commented 7 years ago

This game also has the same problem, RX 470 3a5c78ec54e736d197cd853c92504fc2d46269fe

ghost commented 7 years ago

Any driver issue, just report it:

https://www.reddit.com/r/Amd/comments/3vse1b/found_an_issue_with_an_amd_driver_report_it_here/

Nezarn commented 7 years ago

@rdeleonp it was already reported multiple times, even the similar opengl issue that occurs in other emulators, but this Vulkan issue won't be fixed until there's a method to reproduce it without 3rd party stuff (in this case LLE modules and the game itself).

And even if AMD starts to work on it, it will take at least half a year. (Just search the AMD forum for how long it took to fix an OpenGL issue.)

mirh commented 7 years ago

Well, seems like it took them only two months now.

Nezarn commented 7 years ago

@mirh lol at "next driver release", which next? xD (I remember once they fixed something internally, then it took half a year until the fix arrived lol.) Hopefully this will fix Cemu too :D (Cemu is unusable from 1.6.1), and it would be nice if the Vulkan issue turned out to be the same one or idk

mirh commented 7 years ago

With the blending issue they had actually claimed the fix was to ship in _a_ future release, not the next one.

AniLeo commented 7 years ago

Can someone retest the specified issues with latest drivers and latest RPCS3 version? I have a GCN2 so can't verify.

Nezarn commented 7 years ago

@AniLeo from a quick test, looks like only BSOD\driver crash remains (100% happens in Project Diva F, in Black★Rock Shooter song)

edit: looking at a youtube video https://www.youtube.com/watch?v=a1XF0kswre0 it happens at 0:44, when the camera would look at the lamp (it crashes\bsods right before that)

mirh commented 6 years ago

Today AMD open-sourced their Linux Vulkan driver. Since it's basically the Windows one with some glue, I think bugs could be fixed at the source if someone wanted to.

https://github.com/GPUOpen-Drivers/xgl