Closed Nezarn closed 6 years ago
I have some idea why there are issues and will update this ticket with testing information when time allows.
https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.342
Our pipeline barriers are crap, but there's a lot of confusion about how the pairings are supposed to work. However, using TOP_OF_PIPE/BOTTOM_OF_PIPE to flush writes just makes no sense. I'll review this part of the spec when I have time and come up with a better solution rather than something off the top of my head.
This fixes a lot of flickering visuals on the AMD R9 200 series as well, so it's a step in the right direction, I hope.
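On the TOP_OF_PIPE/BOTTOM_OF_PIPE point: per the spec wording, those two pseudo-stages perform no memory access, so a barrier that names only them has nothing to make available or visible. The sketch below models that check in a simplified way. The flag values and the `BarrierDesc` type are stand-ins invented for illustration, not Vulkan types or RPCS3 code; a real barrier would use the `VK_PIPELINE_STAGE_*` and `VK_ACCESS_*` constants from `<vulkan/vulkan.h>`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in flag values for illustration only. Real code would use the
// VK_PIPELINE_STAGE_* and VK_ACCESS_* constants from <vulkan/vulkan.h>.
constexpr std::uint32_t STAGE_TOP_OF_PIPE             = 1u << 0;
constexpr std::uint32_t STAGE_FRAGMENT_SHADER         = 1u << 1;
constexpr std::uint32_t STAGE_COLOR_ATTACHMENT_OUTPUT = 1u << 2;
constexpr std::uint32_t STAGE_BOTTOM_OF_PIPE          = 1u << 3;

constexpr std::uint32_t ACCESS_SHADER_READ            = 1u << 0;
constexpr std::uint32_t ACCESS_COLOR_ATTACHMENT_WRITE = 1u << 1;

// Hypothetical description of one barrier (not a Vulkan type).
struct BarrierDesc {
    std::uint32_t src_stages, src_access;
    std::uint32_t dst_stages, dst_access;
};

// The two pseudo-stages do no work and perform no memory access.
constexpr std::uint32_t PSEUDO_STAGES = STAGE_TOP_OF_PIPE | STAGE_BOTTOM_OF_PIPE;

// True if the barrier names a working stage plus an access type on both
// sides, i.e. it can order earlier writes against later reads.
constexpr bool forms_memory_dependency(const BarrierDesc& b) {
    return (b.src_stages & ~PSEUDO_STAGES) != 0 && b.src_access != 0
        && (b.dst_stages & ~PSEUDO_STAGES) != 0 && b.dst_access != 0;
}

// Usage: the TOP_OF_PIPE -> BOTTOM_OF_PIPE pairing fails the check, while
// color output -> fragment shader read passes it.
inline void example() {
    const BarrierDesc pseudo_only{STAGE_TOP_OF_PIPE, 0, STAGE_BOTTOM_OF_PIPE, 0};
    const BarrierDesc color_then_read{STAGE_COLOR_ATTACHMENT_OUTPUT,
                                      ACCESS_COLOR_ATTACHMENT_WRITE,
                                      STAGE_FRAGMENT_SHADER,
                                      ACCESS_SHADER_READ};
    assert(!forms_memory_dependency(pseudo_only));
    assert(forms_memory_dependency(color_then_read));
}
```

This is deliberately stricter than the spec in one respect: it ignores cases such as a transition before present, where only the source side needs a real stage.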
@kd-11 Just tried this build, didn't fix anything yet for me sadly. Also if you don't know something, why don't you just ask for advice on AMD forums? Devs could help.
The confusion extends to members of the GPU vendors themselves as well as Khronos. See https://github.com/KhronosGroup/Vulkan-Docs/issues/128
After going through the spec for a few minutes, I believe I understand what the spec implies, but even from the thread above, you can see that examples given are often wrong.
BTW, if you are still crashing, something else is probably wrong. The remaining visual corruption should (in theory) be fixed by that build, though. On my 200 series, I was getting flickering textures and a green tint in some games, and both are gone. The crash may have nothing to do with synchronization in that case. I've been experiencing crashes on my GPU when replaying Vulkan captures in RenderDoc, and that raises a red flag since inspecting the code shows nothing suspicious, except that we always crash during a vkCmdPipelineBarrier call.
If the graphical issue is unchanged, it might be because the stages in the barrier do not account for transitions to and from LAYOUT_GENERAL, which we use during buffer clears. I'll update and we can try again when I find the time.
I also realized that I failed to implement the part of the spec dealing with presentable images, so it's likely my fault here.
@kd-11 it's a little bit funny when the devs themselves don't know how to use their stuff XD
Also yes, the driver crash is still there (well, I tried it once and it crashed at the same place, but at least this time it wasn't a BSOD; the BSOD is a bit random, like a 40% chance of a BSOD and 60% of a driver crash), and the GPU usage and graphical issue are the same. (Also, the games that are affected by the graphical issue have a "flash" when the Vulkan window opens.) http://i.imgur.com/5AlNU76.png https://www.youtube.com/watch?v=lBUKw9_uKlY
Looks like a collision on presentable images in your case. Try https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.354
If that does not work, I'll have to write a barrier manually for the transition just before present. This approach fixed corrupted overlays before. By the way, try enabling the debug overlay and see if it helps. It is known that AMD GPUs rasterize from top left to bottom right (you can find the research online), so it's no surprise that the corruption happens just when the frame is about to finish copying the image and the artefacts become visible at the bottom.
@kd-11 yep it still happens, here is the log with debug output (there are a lot of warnings) https://www.dropbox.com/s/epxqxvm1km6540b/vulkanlog.zip?dl=0
What if you enable the debug overlay? It adds another stall before presenting.
@kd-11 it's the same http://i.imgur.com/5Jl7VGi.png
@kd-11 AMD replied on the forum https://community.amd.com/thread/206464
So basically they can't do anything, since he would need to own the game. (and he says he can only look into it if no third party download is needed....)
That sucks, but it's kinda expected. Until rpcs3 can replay renderer state, this will be a difficult one to debug. However, the artifacting issue is visible even in RenderDoc, and they should be able to help with that one. GPU PerfStudio also supports API tracing, just without visual output, so they can help with that at the very least.
@kd-11 yep, oh well, i hope they can at least help with the graphics issue and with the 100% GPU usage... Just posted renderdoc + video on their forum
@kd-11 so AMD guy responded:
It appears that the glitches are introduced in colour passes 7-11, where the app appears to be doing a blur. The biggest problem that's jarring from the trace you provided is that your application is executing a lot of renderpasses without defining external dependencies. This can lead to corruptions as the ones we're seeing because of the fact the GPU is free to run the commands in an overlapping manner which may lead to RAW hazards. For performance reasons, also please consider coalescing the huge number of renderpasses your application is using right now, so that the draw calls are embedded in subpasses.
Subpasses cannot be used since we have no way of knowing beforehand how the calls will be submitted to the RSX, and the RAW hazard is a known issue, which is why I've been working on the memory barriers. Subpass dependencies are memory-barrier-type ops, so at least we were on the right track. There is an external dependency that we added, flushing previous color output before the current color output stage, but I guess we can add one to block memory reads in the fragment shader as well.
https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.357 Adds a dependency on the fragment shader stage. If it doesn't work, I'll have to reach out to AMD for assistance there, as I may have completely misunderstood that part of the spec.
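For anyone trying to picture the external dependency being discussed, here is a rough sketch of how its fields might be filled in. It is not the RPCS3 code. The `SubpassDependency` struct and the flag constants are stand-ins that only mirror the shape of `VkSubpassDependency`, with arbitrary bit values; real code would use the types and `VK_*` constants from `<vulkan/vulkan.h>`. The helper name `make_external_color_dependency` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for illustration only. Real code would fill a VkSubpassDependency
// and use VK_SUBPASS_EXTERNAL plus the VK_* flag constants.
constexpr std::uint32_t SUBPASS_EXTERNAL = ~0u;  // plays the role of VK_SUBPASS_EXTERNAL

constexpr std::uint32_t STAGE_FRAGMENT_SHADER         = 1u << 0;
constexpr std::uint32_t STAGE_COLOR_ATTACHMENT_OUTPUT = 1u << 1;

constexpr std::uint32_t ACCESS_SHADER_READ            = 1u << 0;
constexpr std::uint32_t ACCESS_COLOR_ATTACHMENT_WRITE = 1u << 1;

// Mirrors the field layout of VkSubpassDependency (not the real type).
struct SubpassDependency {
    std::uint32_t src_subpass;
    std::uint32_t dst_subpass;
    std::uint32_t src_stage_mask;
    std::uint32_t dst_stage_mask;
    std::uint32_t src_access_mask;
    std::uint32_t dst_access_mask;
    std::uint32_t dependency_flags;
};

// Builds a dependency from "everything before this render pass" to the given
// subpass: earlier color writes must finish before this pass samples them in
// the fragment shader or writes color output again.
inline SubpassDependency make_external_color_dependency(std::uint32_t dst_subpass) {
    assert(dst_subpass != SUBPASS_EXTERNAL);
    SubpassDependency dep{};
    dep.src_subpass      = SUBPASS_EXTERNAL;
    dep.dst_subpass      = dst_subpass;
    dep.src_stage_mask   = STAGE_COLOR_ATTACHMENT_OUTPUT;
    dep.src_access_mask  = ACCESS_COLOR_ATTACHMENT_WRITE;
    dep.dst_stage_mask   = STAGE_FRAGMENT_SHADER | STAGE_COLOR_ATTACHMENT_OUTPUT;
    dep.dst_access_mask  = ACCESS_SHADER_READ | ACCESS_COLOR_ATTACHMENT_WRITE;
    dep.dependency_flags = 0;  // no by-region flag in this sketch
    return dep;
}
```

Whether these exact masks cover the blur passes in the trace is an open question; the sketch only shows the read-after-write pairing described above.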
@kd-11 still same :(
Following the AMD guy's advice, I've removed the dependency_by_region bit. https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.359
@kd-11 still same.
Also, the AMD guy posted about the 100% GPU usage:
Oh, and reg. 100% GPU utilization: assuming you do not use any kind of a CPU-side-based frame limiting solution, that's absolutely fine. After all, wasn't the idea behind Vulkan to squeeze as much juice from the GPU as it's only possible?
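For context, "CPU-side frame limiting" in the reply above usually means sleeping on the CPU between frames so the GPU is not fed work as fast as it can consume it. A minimal sketch using only the C++ standard library follows. The `FrameLimiter` class is hypothetical and is not taken from RPCS3 or any driver; it only illustrates the idea.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: paces a render loop to a target frame rate on the CPU.
class FrameLimiter {
public:
    using clock = std::chrono::steady_clock;

    explicit FrameLimiter(double target_fps)
        : frame_time_(to_frame_time(target_fps)),
          next_deadline_(clock::now() + frame_time_) {}

    // Call once per frame after submitting work; sleeps until the deadline.
    void wait() {
        std::this_thread::sleep_until(next_deadline_);
        const clock::time_point now = clock::now();
        next_deadline_ += frame_time_;
        // If we fell behind, restart from now instead of bursting to catch up.
        if (next_deadline_ < now) {
            next_deadline_ = now + frame_time_;
        }
    }

private:
    static clock::duration to_frame_time(double target_fps) {
        assert(target_fps > 0.0);
        return std::chrono::duration_cast<clock::duration>(
            std::chrono::duration<double>(1.0 / target_fps));
    }

    clock::duration frame_time_;
    clock::time_point next_deadline_;
};
```

A render loop would construct one `FrameLimiter` (for example at 60 fps) and call `wait()` at the end of each frame. Without something like this, or vsync, an uncapped loop can legitimately keep the GPU busy all the time.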
So that's not an issue? Then why does it happen only on GCN3 and GCN4 cards? (Even NVIDIA cards don't have 100% GPU usage.)
Check if running the Vulkan demos causes this as well. I suspect something is up with their driver.
@kd-11 yep, looks like it happens with anything that uses Vulkan: tried the samples from the SDK, 100% usage; tried running DOOM, 100% usage (even in the menu)
Their driver is obviously having issues. I'll clean up the vulkan-wip branch until we have no validation issues, then we can continue with AMD support since they insist we must do validation first.
I honestly don't see when utilizing all resources started to become an issue.
Aren't both DOOM and the demos supposed to push as many frames as possible?
Assuming you don't have a slow CPU, that seems totally fine.
@mirh You realize that the emulator runs with almost any card on "idle", right? I don't even get past ~7% of the TDP of my 1060 with any game, so something similar should happen with the RX cards. The emulator uses almost no GPU at all, so this situation right here is obviously not OK.
@mirh then how come that on GCN2 cards and NVIDIA cards, GPU usage is never at 100% on simple stuff? (like the hello world sample on rpcs3)
Another example for normal GPU usage http://i.imgur.com/hAU98jm.jpg
@kd-11 looks like the DX12 renderer is affected by the driver crash too (no BSOD so far); it crashes at exactly the same place as Vulkan. (So maybe the offending stuff is in the common code that both renderers use?)
@kd-11 looks like the graphical issue is a driver issue too; what are the chances that it affects 3 emulators? (It affects rpcs3 in Vulkan, and Cemu and PCSX2 in OpenGL, though not as badly.)
OGL in pcsx2 is fine (graphically at least) afaik.
@mirh In pcsx2, using any Blending Unit Accuracy setting (aside from none) brings out the issue (http://i.imgur.com/x0rl8C3.png), and a similar issue occurs in Cemu too, so something is very broken driver-wise: https://www.youtube.com/watch?v=3iHrUSbE8J8
@mirh i don't see anything wrong on that dump, the problem i have is only visible with 2x native or higher. (maybe on native its so small that it can't be seen, if i set at least 2x on your dump its visible)
edit: also i think we should move our pcsx2 discussion to somewhere else :P (you can contact me on the forum too)
@mirh sure, i hope they fix their drivers, since more and more emulators get affected xD (for example Cemu 1.6.2 is unusable for me, crashes driver :( )
That's "hopefully" pcsx2/pcsx2#1552
@mirh yep i hope so (tried that dump and it does crash driver pretty hard (no bsod tho), also would be nice to find something that would reproduce Vulkan\DX12 crash in rpcs3, so they would work on that too...)
Hopefully again whatever OGL fucks with is the same thing Vulkan triggers.
This game also has the same problem, on an RX 470
Any driver issue, just report it:
https://www.reddit.com/r/Amd/comments/3vse1b/found_an_issue_with_an_amd_driver_report_it_here/
@rdeleonp it was already reported multiple times, even the similar OpenGL issue that occurs in other emulators, but this Vulkan issue won't be fixed until there's a method to reproduce it without needing 3rd party stuff (in this case LLE modules and the game itself).
And even if AMD starts to work on it, it will take at least half a year. (Just search the AMD forum for how long it took to fix an OpenGL issue.)
@mirh lol at next driver release, which next? xD (I remember once they fixed something internally, then it took half a year until the fix arrived lol) hopefully this will fix Cemu too :D (Cemu is unusable from 1.6.1) and it would be nice if Vulkan had the same issue or idk
With the blending issue they had actually claimed the fix was to ship in _a_ future release, not the next one.
Can someone retest the specified issues with latest drivers and latest RPCS3 version? I have a GCN2 so can't verify.
@AniLeo from a quick test, looks like only BSOD\driver crash remains (100% happens in Project Diva F, in Black★Rock Shooter song)
edit: looking at a youtube video https://www.youtube.com/watch?v=a1XF0kswre0 it happens at 0:44 (when the camera would look at the lamp; it crashes\BSODs right before that)
Today AMD open-sourced their linux vulkan driver. This being windows one with some glue, if one wanted I think bugs could be fixed at the source.
I made this issue so it will be easier to track when things get fixed driver or rpcs3 wise. (It's easier to have these issues in one place, since if I post it in some game's issue where it's happening, it gets buried if there are a lot of comments :P)
Current issues:
- BSOD\Driver crash in certain games (100% reproducible in Project Diva F, reported to AMD)
- Always 100% GPU usage (http://i.imgur.com/Kg0hxul.png vs. http://i.imgur.com/wxEM8bU.png)
- Unique graphical issue(s) (http://i.imgur.com/a8LTaz0.png (look at the bottom of the pic))
If this issue isn't needed then feel free to close\delete.