cemu-project / Cemu

Cemu - Wii U emulator
https://cemu.info

AMD 780M Vulkan renderer leads to corrupted textures #1176

Open Not4ce opened 7 months ago

Not4ce commented 7 months ago

Current Behavior

Enabling Vulkan on RDNA 3 based cards, such as 780M iGPUs, breaks pebbles and textures in-game. Shadows are also broken and eventually garble into corrupted textures. Observed in MK8 and BOTW. No graphical or texture enhancements applied; can be replicated on two separate 780M devices

VRAM allocation does not change anything. It is only fixed by changing the renderer to OpenGL, and it seems to be RDNA 3/780M specific. I was unable to test on a discrete RDNA 3 card; there is no issue on RDNA 2 and Nvidia counterparts (6800 XT and 3070)

Expected Behavior

No graphical issues or garbled textures

Steps to Reproduce

Enable the Vulkan renderer, then play any game (Breath of the Wild, etc.)

System Info (Optional)

OS: Windows 11 22631.3374 KB5035942
GPU: 780M on Ryzen Z1 Extreme and 7840U

Emulation Settings (Optional)

Default with no mods or changes

Also tested with: BCML enabled, Second Wind enabled. No graphical mods or changes, on Cemu 1.2.6 and 2.0

Logs (Optional)

log.txt

Tested with OpenGL, which had no issues, followed by switching to Vulkan, which had issues (UMA frame buffer set to 6 GB on the system)

Exzap commented 7 months ago

AMD has a track record of needing 1-2 years to get their drivers up to standard (we have reports like this for every new generation). This combined with the fact that it works on everything else points towards an issue with their driver. However, we don't know for certain unless someone does some digging. I will keep this open for visibility and in case someone with the affected hardware and the necessary know-how wants to look into this.

Exzap commented 7 months ago

Something you can do to help in the meantime is upload a Vulkan validation log. The steps to do this are:

1. Download the Vulkan SDK and install it from here (you can uninstall it afterwards)
2. Restart your PC
3. Open Cemu (use a recent version like v2.0-78), in the menu tick `Debug -> Logging -> Vulkan validation layer`
4. Trigger the glitch (avoid doing too much other stuff, it makes the log harder to read)
5. Upload log.txt
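
(Side note on what that checkbox corresponds to under the hood: the Vulkan SDK ships the `VK_LAYER_KHRONOS_validation` layer, and an application opts into it when creating its Vulkan instance. The sketch below only illustrates that mechanism; it is not Cemu's actual code, and the function name and flow are assumptions.)

```cpp
// Illustrative sketch only: enabling the Khronos validation layer at instance
// creation. Cemu's real implementation differs; names here are hypothetical.
#include <vulkan/vulkan.h>

VkInstance CreateInstanceWithValidation()
{
    // Layer is only present if the Vulkan SDK (or the standalone layer) is installed.
    const char* layers[] = { "VK_LAYER_KHRONOS_validation" };

    VkApplicationInfo appInfo{};
    appInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pApplicationName = "Cemu";
    appInfo.apiVersion = VK_API_VERSION_1_1;

    VkInstanceCreateInfo createInfo{};
    createInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo = &appInfo;
    createInfo.enabledLayerCount = 1;
    createInfo.ppEnabledLayerNames = layers; // validation output is then captured via a debug callback (not shown)

    VkInstance instance = VK_NULL_HANDLE;
    vkCreateInstance(&createInfo, nullptr, &instance);
    return instance;
}
```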

Not4ce commented 7 months ago

AMD has a track record of needing 1-2 years to get their drivers up to standard (we have reports like this for every new generation). This combined with the fact that it works on everything else points towards an issue with their driver. However, we don't know for certain unless someone does some digging. I will keep this open for visibility and in case someone with the affected hardware and the necessary know-how wants to look into this.

Thank you for getting back to me. I had a similar line of thought and found this very strange, as I am unable to replicate this on RDNA 2.

As per AMD's own documentation, their Vulkan implementation is exactly the same on both RDNA 2 and 3. I wish I had a discrete RDNA 3 card to test this, but if a future driver update from AMD remedies the issue, I will update or close this ticket.

As it stands, I am disappointed in AMD, as all handheld devices use AMD APUs, and I am unable to distinguish whether this is an APU driver issue or an RDNA 3 issue.

PS: I tried separate driver versions and DDU with no success. Thank you for keeping an eye on this.

Not4ce commented 7 months ago

Something you can do to help in the meantime is upload a Vulkan validation log. The steps to do this are:

1. Download the Vulkan SDK and install it from [here](https://sdk.lunarg.com/sdk/download/latest/windows/vulkan-sdk.exe) (you can uninstall it afterwards)

2. Restart your PC

3. Open Cemu (use a recent version like v2.0-78), in the menu tick `Debug -> Logging -> Vulkan validation layer`

4. Trigger the glitch (avoid doing too much other stuff, it makes the log harder to read)

5. Upload log.txt

The last version I tried, which is this one, was broken.

However, 2.0-78 has seemingly fixed the issue! Here's the log attached below with the Vulkan debugger enabled.

What I can conclude is that the last version I had downloaded (2.0-46) just did not play nice. Strangely, my 6800 XT never had issues with any version (including 1.2.6), so it's an RDNA 3 driver-related bug on older builds. log.txt

Image of 1.2.6 corruption (screenshot attached)

Thank you for walking me through this

Not4ce commented 7 months ago

Welp, @Exzap, turns out Cemu 2.0-78 only fixes the pebbles, or greatly reduces the amount of initial corruption. After further playing the game and Wind Waker, the issue is still present. I have recorded a video for reference and the prior Vulkan debug log is still valid.

As usual, I'm unable to replicate this on my desktops with RDNA 2 and Ampere cards, and can reliably repeat this on any 780M-based device like the Ally/Win. The only fix is using OpenGL, and rolling back drivers doesn't change anything.

Kindly let me know if you require further logs or information. Kind regards.

https://github.com/cemu-project/Cemu/assets/167106518/90821224-c345-43a7-bb59-38f82fcb9135

Squall-Leonhart commented 5 months ago

As per AMD's own documentation, their Vulkan implementation is exactly the same on both RDNA 2 and 3. I wish I had a discrete RDNA 3 card to test this, but if a future driver update from AMD remedies the issue, I will update or close this ticket.

Their hardware ISA, however, is not. RDNA3 has an ISA bug with signedness reinterpretation, for example.

More recent findings have come to light regarding the Gfx11 delta colour (DCC) changes: their assumption of a general compressed layout for resources, plus arbitrary reinterpretation for DCC, introduced somewhat of a perfect storm for graphical misbehavior if assumptions have been made, or if a particular sampler needs a particular layout when attempting to reuse it in a feedback loop.

RDNA2 is affected under very specific conditions; the Van Gogh in the Steam Deck, for instance, has artifacts in WWHD, and the same set of extensions that should resolve RDNA3's texture layout misbehavior resolved similar troubles DXVK had with RDNA2 in games such as GTA4. With RDNA3, no known operations will result in DCC being disabled, which was not highlighted in the ISA manual but had to be dug out of their PAL or Mesa commits. GPUOpen has not been updated for these ISA changes either.

Not4ce commented 5 months ago

As per AMD's own documentation, their Vulkan implementation is exactly the same on both RDNA 2 and 3. I wish I had a discrete RDNA 3 card to test this, but if a future driver update from AMD remedies the issue, I will update or close this ticket.

Their hardware ISA, however, is not. RDNA3 has an ISA bug with signedness reinterpretation, for example.

More recent findings have come to light regarding the Gfx11 delta colour (DCC) changes: their assumption of a general compressed layout for resources, plus arbitrary reinterpretation for DCC, introduced somewhat of a perfect storm for graphical misbehavior if assumptions have been made, or if a particular sampler needs a particular layout when attempting to reuse it in a feedback loop.

RDNA2 is affected under very specific conditions; the Van Gogh in the Steam Deck, for instance, has artifacts in WWHD, and the same set of extensions that should resolve RDNA3's texture layout misbehavior resolved similar troubles DXVK had with RDNA2 in games such as GTA4. With RDNA3, no known operations will result in DCC being disabled, which was not highlighted in the ISA manual but had to be dug out of their PAL or Mesa commits. GPUOpen has not been updated for these ISA changes either.

Hey, thank you for clarifying that and filling in the gaps. I assumed RDNA 2 and 3 used the same ISA, but that certainly explains the discrepancy here.

As far as I can tell, all RDNA3 cards are affected, including one belonging to someone I know with a 7900 XTX. Seems like it's a waiting game for RDNA 3 users, which is a shame, as a lot of portables are/will be based on RDNA 3/3.5.

dstrnad commented 2 months ago

Just FYI, a similar issue was recently fixed in Ryujinx and Sudachi. Tested on BOTW. Don't know if it is exactly the same issue, but the artifacts/glitches in the Switch version looked exactly the same as they do in Cemu. So there might be a way to fix it despite the fact that it is probably RDNA driver related?

Exzap commented 2 months ago

They implemented the VK_EXT_attachment_feedback_loop_layout extension which fixed it. This extension has been on our radar for a while since it would probably also help us improve performance. Unfortunately it's not trivial to implement the extension and since I am bogged down by other work right now I can't work on it anytime soon. But if anyone else wants to take a shot that would be appreciated

For context ryujinx' PR: https://github.com/Ryujinx/Ryujinx/pull/7226
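
(For readers unfamiliar with it, VK_EXT_attachment_feedback_loop_layout adds a dedicated image usage flag and image layout for textures that are sampled while simultaneously bound as a render target, the "feedback loop" case above. A minimal sketch of what opting into the extension looks like on the Vulkan side follows; this is not Cemu's or Ryujinx's actual code, and the function name and surrounding flow are illustrative only.)

```cpp
// Minimal sketch of opting into VK_EXT_attachment_feedback_loop_layout.
// NOT Cemu's (or Ryujinx's) implementation; structure and enum names are from
// the Vulkan specification, the surrounding flow is only an example.
#include <vulkan/vulkan.h>

VkDevice CreateDeviceWithFeedbackLoopLayout(VkPhysicalDevice physicalDevice, uint32_t queueFamilyIndex)
{
    // Request the extension's single feature bit through the pNext chain.
    VkPhysicalDeviceAttachmentFeedbackLoopLayoutFeaturesEXT feedbackLoopFeatures{};
    feedbackLoopFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_ATTACHMENT_FEEDBACK_LOOP_LAYOUT_FEATURES_EXT;
    feedbackLoopFeatures.attachmentFeedbackLoopLayout = VK_TRUE;

    const char* deviceExtensions[] = { VK_EXT_ATTACHMENT_FEEDBACK_LOOP_LAYOUT_EXTENSION_NAME };

    float queuePriority = 1.0f;
    VkDeviceQueueCreateInfo queueInfo{};
    queueInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    queueInfo.queueFamilyIndex = queueFamilyIndex;
    queueInfo.queueCount = 1;
    queueInfo.pQueuePriorities = &queuePriority;

    VkDeviceCreateInfo deviceInfo{};
    deviceInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
    deviceInfo.pNext = &feedbackLoopFeatures;
    deviceInfo.queueCreateInfoCount = 1;
    deviceInfo.pQueueCreateInfos = &queueInfo;
    deviceInfo.enabledExtensionCount = 1;
    deviceInfo.ppEnabledExtensionNames = deviceExtensions;

    VkDevice device = VK_NULL_HANDLE;
    vkCreateDevice(physicalDevice, &deviceInfo, nullptr, &device);
    return device;
}

// A texture that is sampled while also bound as a render target is then created
// with
//     usage |= VK_IMAGE_USAGE_ATTACHMENT_FEEDBACK_LOOP_BIT_EXT;
// and transitioned to
//     VK_IMAGE_LAYOUT_ATTACHMENT_FEEDBACK_LOOP_OPTIMAL_EXT
// instead of relying on VK_IMAGE_LAYOUT_GENERAL, so the driver can pick a layout
// that is valid for simultaneous attachment writes and shader reads.
```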

Not4ce commented 2 months ago

They implemented the VK_EXT_attachment_feedback_loop_layout extension which fixed it. This extension has been on our radar for a while since it would probably also help us improve performance. Unfortunately it's not trivial to implement the extension and since I am bogged down by other work right now I can't work on it anytime soon. But if anyone else wants to take a shot that would be appreciated

For context ryujinx' PR: Ryujinx/Ryujinx#7226

Even though you guys can't work on it anytime soon, I appreciate the heads-up that this is on your radar. As more gaming handhelds ship with AMD APUs, this being broken on 24.8.1 still means the most popular gaming handhelds are locked out (ROG Ally, Legion Go, Acer, Zotac devices).

But I appreciate the ongoing improvements that go into Cemu and the work that contributors put in. I am hopeful this will be resolved one day.

goeiecool9999 commented 2 months ago

I've experimented with VK_EXT_attachment_feedback_loop_layout in order to try and fix a different issue on Vega GPUs. Using the extension didn't show any difference in behaviour on that card, but I'm curious if it could fix these issues. If you want to help test it, there's a build available here: https://github.com/goeiecool9999/Cemu/actions/runs/10904371233
The implementation is pretty messy at the moment, so if it improves rendering there will be more work required to get it into a merge-able state. But at least it could give us an idea about whether or not it would work.

Not4ce commented 2 months ago

I've experimented with VK_EXT_attachment_feedback_loop_layout in order to try and fix a different issue on Vega GPUs. Using the extension didn't show any difference in behaviour on that card, but I'm curious if it could fix these issues. If you want to help test it, there's a build available here: https://github.com/goeiecool9999/Cemu/actions/runs/10904371233
The implementation is pretty messy at the moment, so if it improves rendering there will be more work required to get it into a merge-able state. But at least it could give us an idea about whether or not it would work.

Hi, just tested, and I want to report back that the fork completely solves the issue!

First boot was weird, with the app losing 60% of its frametime, but subsequent boots have been flawless. Here are some screenshots:

Botw sample 1 Botw Sample 2

By default, it has "Accurate barriers (Vulkan)" enabled under Debug. Keeping that off seems to cause no issues, with it enabled costing 2-3 milliseconds in frametime vs off (in action)

BOTW accurate barriers on, BOTW accurate barriers off

Overall, I plan on using this build for a longer session after this. Texture corruption from pebbles and all the aliasing seem to be resolved for now. Thank you for the heads-up; I will be passing this on to some personal acquaintances that have been waiting for a Cemu resolution, and so far so good.

I tried finding a log file, but there was none. If you require any further info/logs, please let me know 😊. @dstrnad might also find this useful, if they have a similar use case.

dstrnad commented 2 months ago

I can also confirm that this fork seems to completely fix the issue. No black/colourful glitches anymore. I tested on a ROG Ally. The accurate barriers setting on/off makes no difference now; the issue is gone. Good job @goeiecool9999 and thanks for the heads-up @Not4ce. I will finally try to play the game now; if I can test anything else just let me know. It would be awesome if this could get merged at some point.

dstrnad commented 2 months ago

Bad news: the fork sometimes crashes when opening or exiting the inventory and map menus. The weird thing is that I can open or close the map/inventory a couple of times (even 10, 20, 30 times) and it is OK, but then it randomly freezes the whole game and Cemu crashes. I deleted all shader caches to be sure, but it still happens.

goeiecool9999 commented 2 months ago

@dstrnad What version are your drivers? There have been lots of reports of instability with 24.8.1 (#1323), although mostly with different symptoms than you're describing. If you are on 24.8.1 and downgrading to 24.7.1 fixes the instability we know it's probably not anything I changed.

dstrnad commented 2 months ago

@goeiecool9999 I'm on 24.3.1. I plan to test it under Linux, so we can rule out any Windows driver-related issues.

TheOneEyedGrimReaper commented 2 months ago

I've experimented with VK_EXT_attachment_feedback_loop_layout in order to try and fix a different issue on Vega GPUs. Using the extension didn't show any difference in behaviour on that card, but I'm curious if it could fix these issues. If you want to help test it, there's a build available here: https://github.com/goeiecool9999/Cemu/actions/runs/10904371233
The implementation is pretty messy at the moment, so if it improves rendering there will be more work required to get it into a merge-able state. But at least it could give us an idea about whether or not it would work.

Where can I download this build? Or do I need to build it myself?

goeiecool9999 commented 2 months ago

Where can I download this build? Or do I need to build it myself?

@TheOneEyedGrimReaper The interface isn't the most intuitive. At the bottom of the page I linked there's a section called "artifacts". Pick your platform from the list and click the small download icon on the right (edit: clicking the name works too).

TheOneEyedGrimReaper commented 2 months ago

Where can I download this build? Or do I need to build it myself?

@TheOneEyedGrimReaper The interface isn't the most intuitive. At the bottom of the page I linked there's a section called "artifacts". Pick your platform from the list and click the small download icon on the right (edit: clicking the name works too).

lel. So I was the blind one. Thanks man.

Not4ce commented 2 months ago

@goeiecool9999 I'm on 24.3.1. I plan to test it under Linux, so we can rule out any Windows driver-related issues.

This would be very much appreciated. Sorry for the late reply; @dstrnad is correct, and testing on Linux would help, which I am unable to do.

Cemu crashes after 30 minutes of use with exception x0409, which to my understanding is a stack buffer overrun, so something gets corrupted around the 20-30 minute mark. As far as I can tell, the driver may not be the issue. I have tried 6 driver versions so far, ranging from 24.3.1 to 24.8.1 (including 3 separate devices, and the technical preview drivers from AMD).

Screenshot 2024-09-25 134629 Screenshot 2024-09-25 125310

I will also attach the actual crash log here, which I suspect may not help too much? But why not. If you need anything else, please let me know @goeiecool9999. So far, the crashes can happen at any time in a shrine/the overworld/any zone; it looks to be related to how long Cemu has been running.

Screenshot 2024-09-25 124925

Cemu crash.zip

goeiecool9999 commented 2 months ago

@Not4ce I'm not very familiar with the Windows Event Viewer and I don't know if it contains useful information. If you could send Cemu's log.txt after a crash instead, that's more likely to contain a helpful clue. I'm mainly interested in seeing the stack trace at the end of the file.

TheOneEyedGrimReaper commented 2 months ago

@goeiecool9999 Hi again! Can you check your cemu fork?

It tries to eat more than my actually allocated VRAM and then freezes up my ROG Ally completely.

goeiecool9999 commented 1 month ago

It tries to eat more than my actually allocated VRAM and then freezes up my ROG Ally completely.

I didn't change anything related to GPU memory allocation. Are you sure you're running out of VRAM and not regular memory?

Valkyr2 commented 1 month ago

Hi, tested your build, works like a charm for me. I'll test it further this week. Thanks for sharing this.

TheOneEyedGrimReaper commented 1 month ago

It tries to eat more than my actually allocated VRAM and then freezes up my ROG Ally completely.

I didn't change anything related to GPU memory allocation. Are you sure you're running out of VRAM and not regular memory?

Yep, I'm sure of it. Somehow it sees my 6 GB of allocated VRAM as 10 GB, and when it goes over 6 GB it freezes my ROG Ally. The official Cemu build that has the ground texture bug doesn't even try to eat more than 3 GB of VRAM, and I can play for hours without any problem other than the ground texture bug.

goeiecool9999 commented 1 month ago

I found a memory leak and fixing it appears to make GPU memory usage stable. Let me know if this improves stability. New build here: https://github.com/goeiecool9999/Cemu/actions/runs/11482812478

dstrnad commented 1 month ago

I think you fixed it. I've been testing for an hour on Linux (Bazzite, ROG Ally) and memory seems stable; no crashes so far and no glitches. I will test more, but good job 👍

aASDa213ASD commented 1 month ago

@goeiecool9999 Hey, wanted to pay my little thanks here; quite impressed with the work you've done! Apart from a massive performance hit, it completely solves the issue with artifacts on RDNA3 GPUs (RX 7800 XT in my case). I'd like to ask you, and not just you but perhaps everybody, a question regarding the Vulkan vs OpenGL war. I can clearly see that by default I get 20 frames per second less on your fork; I don't really care about barriers as long as the artifacts are not there. However, I seem to be running OpenGL just fine as well at 160 frames (even higher than on Vulkan for some reason), so may anybody explain to me why we prefer Vulkan over OpenGL at this point? Or did I just happen to be the lucky guy getting more frames on OpenGL than on Vulkan with my GPU? What's... the catch here?

All the tests were made at 2560x1440 with FPS++ locked to 165.0. Worth noting that the OS is Arch Linux; tested both amdvlk / radeon-vulkan drivers (no difference at all).

Here are my frametimes:

• Vulkan with accurate barriers disabled: image
• The same but with accurate barriers enabled this time (it may feel like the issue is entirely gone as well, but trust me it's still there, just less noticeable): image
• Your Cemu fork with feedback_loop_layout, accurate barriers disabled: image
• The same but with accurate barriers enabled: image
• OpenGL (please explain to me why it happens to be more performant): image

Squall-Leonhart commented 1 month ago

Because prior to RDNA2 the legacy ATI OpenGL ICD still in use was not optimized for multicore and was 50-75% slower than it is now. It is still not as capable as Vulkan for emulating all aspects of the Latte GPU with just AMD's extension set, as AMD implements the extensions as specced while Nvidia lets developers get away with using them in less-defined ways.

With AMD's design mistake in RDNA3, the only resolution means the renderer falls back to CPU-limited synchronisation operations more often.

The hit is likely to be worse on Nvidia, since layout transitions there are known to cause 15% hits.

goeiecool9999 commented 1 month ago

However, I seem to be running OpenGL just fine as well at 160 frames (even higher than on Vulkan for some reason), so may anybody explain to me why we prefer Vulkan over OpenGL at this point?

Worth noting that the OS is Arch Linux

Historically, AMD's official OpenGL drivers on Windows performed very poorly. On Linux, however, Mesa is actually one of the best drivers out there and was one of the recommended solutions for AMD's bad performance. Installing a different operating system is not convenient for most end users though, as you might imagine, so Cemu needed a different solution: Vulkan.

The tradeoff between OpenGL and Vulkan is essentially this: OpenGL is easier to develop applications with because the API is a lot less complex. This simplicity also means the driver does a lot of things behind the scenes without telling the developer. You just have to cross your fingers that the implementation is efficient. This leads to unpredictable performance and makes performance problems harder to debug. Vulkan is intended to take away a lot of that guesswork and make everything explicit. That's why Vulkan is notoriously verbose to get anything done, even displaying a single triangle. The developer has to decide a lot more things: how to synchronize, when to send work to the GPU, etc.

So why does OpenGL perform better in some cases? Because it receives the minimum amount of information required to define what needs to be rendered on the GPU and is free to optimize it however it likes. If a driver does this well, performance is good (Mesa). If it doesn't, performance is bad (AMD's Windows driver). With Vulkan you get a bunch of predictable operations that you have to make efficient use of. The only optimization the driver can do is make those individual operations as efficient as possible.

When Cemu's Vulkan backend was written it provided significantly better performance than AMD's OpenGL driver (and probably still does, idk, it's been a while since I compared), meaning AMD users don't need to switch to a different OS anymore. Cemu's Vulkan backend also supports asynchronous shader compilation, meaning it can prevent the stutter that normally happens when compiling uncached shaders by temporarily skipping the associated draw calls until they're done compiling. This is apparently not impossible to implement on OpenGL, but I've heard it's a lot easier with Vulkan. You can use whichever backend you prefer, although in recent years the Vulkan backend has received more attention than OpenGL.

Another benefit of Vulkan which doesn't matter at all for cemu is the ability to prepare work for the GPU on multiple threads. That means that in theory a well designed game engine for example can use high core count systems more effectively. OpenGL is single-threaded by design.
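
(To make the explicitness point above concrete, here is a small illustrative sketch, not Cemu code: the kind of layout transition and memory barrier a Vulkan application must record itself when a rendered image is about to be sampled, work an OpenGL driver performs implicitly when you simply rebind the texture and draw. The function name and the specific layouts chosen are assumptions for the example.)

```cpp
// Illustrative sketch (not Cemu code): hazard handling that OpenGL drivers do
// implicitly, written out explicitly as a Vulkan pipeline barrier. An image
// that was just rendered to is transitioned so a later draw can sample it;
// every stage/access/layout decision is the application's responsibility.
#include <vulkan/vulkan.h>

void TransitionColorAttachmentToSampled(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; // writes from the previous render pass
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;            // reads in the upcoming draw
    barrier.oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = image;
    barrier.subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, // wait for rendering to finish
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,         // before fragment shaders read
                         0,
                         0, nullptr, 0, nullptr,
                         1, &barrier);
}

// In OpenGL the equivalent is simply binding the texture and issuing the next
// draw call; the driver tracks the hazard and inserts whatever synchronization
// the hardware needs, which is the "behind the scenes" work described above.
```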

aASDa213ASD commented 1 month ago

That answer pretty much explains all the ins and outs regarding the question. I knew that Vulkan is the successor to OpenGL and that it SHOULD (in theory) perform much better than OpenGL, and was quite surprised to see the opposite picture; much obliged for the explanation of why that even happens to me.

By walking left and right on the map a little and hitting a bunch of monsters, I can tell that OpenGL struggles to render a lot of stuff at a time, since my framerate drops from 165 to 90 on any explosion (even with a pre-downloaded shader cache), which lowkey corresponds to its single-threaded nature.

In any case you did a great job and I couldn't be happier, awaiting your work to be implemented in the main branch.

Not4ce commented 1 month ago

I found a memory leak and fixing it appears to make GPU memory usage stable. Let me know if this improves stability. New build here: https://github.com/goeiecool9999/Cemu/actions/runs/11482812478

Can confirm it works. It fixed the game crashing circa 30 minutes in, which indeed seemed to be a memory leak, as @TheOneEyedGrimReaper was thinking. In hindsight, a clear sign was the degraded game frame rate over longer durations of play in the same zone.

Thank you for the hard work! Much appreciated. The game was fine over a 1-hour session of Second Wind on my Ally.

Performance is unchanged, just more stable now, with no slowdown half an hour in. I tested across RDNA 2 and 3 cards, 4 devices, and 3 driver versions. I really hope these fixes are merged into the primary stable build to benefit more users that are unaware of this.

aASDa213ASD commented 1 month ago

Had the game running for about 10 hours in total; I can confirm that there are no crashes no matter what I'm busy doing. Except for one single exception (not sure if it appears in normal Cemu as well): if you remove the framerate limit and bring up some menu (like bomb selection), your FPS will jump to insane numbers, effectively overloading something to the point that the renderer stops responding and the game stays frozen until restart. This doesn't seem to happen if any framerate limit is supplied by either FPS++ or any external tool like MangoHud/RTSS/etc. Again, thanks @goeiecool9999 for the dedication and for doing my job for me; as stated, waiting for @Exzap to take a look here and merge it. The 20-frame drop was totally worth it, I hate OpenGL.

Regarding OpenGL it even produces some interesting visual glitches that Vulkan doesn't: image

Same location on Vulkan (no settings changed): image

goeiecool9999 commented 4 weeks ago

I've made some more changes. This build might improve performance (but temper your expectations). I'm interested to hear if it does. https://github.com/goeiecool9999/Cemu/actions/runs/11602523437

Not4ce commented 3 weeks ago

I've made some more changes. This build might improve performance (but temper your expectations). I'm interested to hear if it does. https://github.com/goeiecool9999/Cemu/actions/runs/11602523437

Well, the results are very interesting to say the least

Device: Z1E and desktop with 12600K + 6800 XT
Drivers: 24.9.1 and 24.10.1
Windows: 23H2
Data: HML logs with Afterburner + general gameplay

As far as testing methodology goes, I used a set path that includes BOTW's internal zone change + heavy load, moving from Gerudo to central Hyrule over the course of 5 minutes. The same settings, TDP, GPU clocks, resolution, and time of day were used to minimise variables, with FPS++ to uncap to max refresh rate. For the numbers below, I waited for shaders to compile on the set path and did 3 runs afterwards, taking their average.

Cemu 2.2 stable + unchecked updates:

2 2 Cemu 2 2

Here's the stutter:

2 2 stutter

The stutter is from my 2nd run. The initial run had 3 or so 450 ms pauses. This is expected since the CPU budget is exceeded, but the new experimental build seems to avoid or minimise this somehow.

Cemu 5640d58 (new build)

Cemu 5640d58

Frametimes:

Cemu 5640d58 frametimes

Screenshots and data were from the Ally Z1E at 25 W of TDP. 5640d58 is 3-5% slower when CPU bound but offers better frametimes compared to 2.2 stable. The frametime delta itself is very good on 5640d58, only deviating by 15% in the worst case. With VRR, it eliminates any perceptible judder or compilation hitches as long as the baseline frametime is below 20 ms. I was hard pressed to find any +5 ms variation at all, which visually presents as "slowdown/hitching".

Curious what's different in this build, and if this benefit extends to Nvidia GPUs. As is, slightly higher frametimes are acceptable, but stuttering is a worse experience; I would always go for a build that is smooth in motion, such as 5640d58. Both builds are pretty good, but 2.2 reminds me of a fixed 1.2.6 build. I will be sticking with 5640d58 on my desktop and Ally.

goeiecool9999 commented 3 weeks ago

I don't understand your results.

  • Looks like the fix was ported to the main branch

I'm not aware of any change between 2.0-78 and 2.2 that can explain it being fixed, nor can I explain why my branch has fewer stutters. There's no reason for performance to be improved between my builds and 2.2. If anything it should be worse. I was mainly interested in a performance comparison between the latest build I linked and the one before, as I made some changes that I thought might reduce the performance impact.

Squall-Leonhart commented 3 weeks ago

Looks like the fix was ported to the main branch

It's visible in your screenshot that it is not.

dstrnad commented 3 weeks ago

I don’t see any release or pull request for the main branch either, but honestly, I can’t spot the issue in his particular screenshot. However, the issue is so noticeable in the game that I doubt @Not4ce wouldn’t have noticed it

Not4ce commented 3 weeks ago

Looks like the fix was ported to the main branch

It's visible in your screenshot that it is not.

Correct, it says 2.2, but I don't know how the newer naming nomenclature works post-1.2.6. If the build wasn't updated, then it's AMD's driver update 24.10.1 from a week ago that fixed the issue. I have exclusively used the "modified" build from goeiecool till now.

I don’t see any release or pull request for the main branch either, but honestly, I can’t spot the issue in his particular screenshot. However, the issue is so noticeable in the game that I doubt @Not4ce wouldn’t have noticed it

Yup, looks like 24.10.1 fixes it on Windows. I recall you using Linux; I wonder if there are any changes there on the Mesa drivers front.

I don't understand your results.

  • Looks like the fix was ported to the main branch

I'm not aware of any change between 2.0-78 and 2.2 that can explain it being fixed, nor can I explain why my branch has fewer stutters. There's no reason for performance to be improved between my builds and 2.2. If anything it should be worse. I was mainly interested in a performance comparison between the latest build I linked and the one before, as I made some changes that I thought might reduce the performance impact.

The results are repeatable and I have attached the logs as 7zip files below. The 2.2-related bug seems to be resolved due to the driver update this month from AMD. If others can test and chime in on Windows, that would help, especially regarding stutter on 2.2.

Cemu 5640d58 seems to be ~20% faster in some scenes compared to your old build 1211a31 with the memory fix. Running up to Kakariko Village, 5640d58 can maintain 51 fps vs 39 fps on 1211a31. Generally, over an 8-minute like-for-like run with my earlier described methodology, the overall uplift is closer to 15% on average.

After playing hours of Cemu to compare builds across 3 devices, it seems clear on my systems that 5640d58 simply stutters less. I will attach the Afterburner .hml logs below.

Stable 2.2: 10 stutters over 8 minutes with a 600 ms pause (207 ms on average) image

Cemu 5640d58: 6 stutters over 8 minutes averaging 160 ms image

Stable stutters almost twice as often, with most hitches being north of 200 ms and some terrible 600 ms hitches. As far as average fps goes:

2.2 - Avg: 59 fps, 0.1%: 27 fps
5640d58 - Avg: 51 fps, 0.1%: 39 fps
1211a31 - Avg: 40 fps, 0.1%: 27 fps

At this point, I'm very curious as to why stable 2.2 reliably stutters more. I do not plan on updating to 24H2 until Microsoft addresses the Alder Lake performance degradation, so I can't say if this is 23H2-specific.

I am curious if other Windows users notice this discrepancy between the builds, but whatever the case, the experimental build seems to be very smooth on my end/AMD hardware compared to stable. The logs are below; they were captured over a set path, fixed time of day, and so on.

Cemu Test 2.zip

Valkyr2 commented 3 weeks ago

Hello, thanks for the build. I did some testing; here is what I got.

And I can confirm that somehow the black-square texture corruption on RDNA3 no longer occurs at all on the 2.2 version. I recently updated the graphics driver to 24.10.1; perhaps it has something to do with it.

Squall-Leonhart commented 3 weeks ago

Nothing is fixed in 24.10; if anything it's just as broken as 24.9.

Not4ce commented 3 weeks ago

Nothing is fixed in 24.10; if anything it's just as broken as 24.9.

Can you provide an example? Due to the 10 MB upload limit here, here's a catbox link of 2.2 running fine on 24.10.1 on my end. There is no texture corruption in other zones either; it seems to be resolved on my end with the latest drivers.

https://files.catbox.moe/6apabw.mp4

aASDa213ASD commented 3 weeks ago

@Not4ce

Cemu

Latest Cemu 2.2 downloaded from here; texture artifacts are there no matter what.

PC Specs

OS: Arch Linux x86_64
Kernel: Linux 6.11.5-zen1-1-zen
Motherboard: MAG B550 TOMAHAWK (MS-7C91) (2.0)
CPU: AMD Ryzen 7 5800X3D (16) @ 4.55 GHz
GPU: AMD Radeon RX 7800 XT (amdgpu)
Memory: 9.88 GiB / 31.27 GiB (32%)

No accurate barriers

image

Accurate barriers

image

Fun fact

I tried to screenshot my artifacts and even got a FRAME where I can see NO ARTIFACTS, which means they are not there every single frame, but rather appear and vanish constantly. Current FPS is locked to 120.

goeiecool9999 commented 3 weeks ago

At this point, I'm very curious as to why stable 2.2 reliably stutters more

I haven't been entirely scientific. I merged some changes from main that happened after the release of 2.2. So it could be one of those as well. Anyways, if the new driver really does make the problem go away that's preferable because like I said before, I'm not very happy with the implementation of my fix using the extension. I would probably want to rewrite it to get it in a merge-able state.

aASDa213ASD commented 3 weeks ago

@goeiecool9999 Regarding the difference between your previous build and the new one:

dcdfa82

image

5640d58

image

Summary

On average I'm getting around 5 frames more than I did on the previous build. Sometimes dcdfa82 jumps to 160 as well, but it doesn't keep that number and drops back to 145-150-155, while 5640d58 is able to jump even higher, to 163, and keep itself stable around 155 without random drops to 145 for no reason. Good job I guess?

goeiecool9999 commented 3 weeks ago

The performance penalty for the image memory barriers seems to be far less with Mesa compared to Windows. I would say that tiny a difference is probably not statistically significant. The difference that Valkyr2 reported is more in line with my expectations.

aASDa213ASD commented 3 weeks ago

Anyway, I assume that I'm CPU-bottlenecked, so it's not like my renderer is overloaded with anything; the GPU is under 50% load while a couple of my CPU cores go up to 100%. Maybe that's why I can't see much of a difference. If it would really help, I could test Windows as well at some point in the future.

goeiecool9999 commented 3 weeks ago

The term bottleneck really only applies when you're dealing with parallel processes where one process depends on the output of another (or a chain of processes). The CPU and GPU are effectively parallel processes: while the GPU is processing one frame, the CPU can work on the next frame simultaneously. This means that a frame can take up to 1/fps seconds to process on the CPU AND 1/fps seconds to process on the GPU before you see stutters. Like with all pipelines there's a latency increase, but the performance improvement of having both processors do work simultaneously is worth it in the vast majority of cases.

If the GPU is the bottleneck (assuming no FPS cap) it would be active 100% of the time and the CPU would be idle some percent of the time, waiting for the work queue to have space for the frame it just processed. If the CPU is the bottleneck, the CPU is active 100% of the time and the GPU idles some of the time, waiting for the work queue to be non-empty. You're gonna think: "Well then obviously I'm CPU bottlenecked in Cemu". But it's not that straightforward, because Cemu's main CPU thread busy-waits (I believe to avoid unpredictable scheduler latency or unnecessary core migrations), so you won't see the true idle time in monitoring tools.

The main takeaway is this: for ideal performance, and for the term "bottleneck" to be meaningful, the CPU and GPU should only wait for each other when the GPU work queue is either empty or full.

I'm going off of what others tell me but Breath of the Wild's engine on Wii U is designed in such a way where the CPU waits for the GPU to be completely idle somewhere during the frame. That means that the GPU will always be idle for some time during the frame until the CPU submits more work, no matter how fast or slow the CPU or GPU are. So even if the GPU was a potato you would not see 100% utilisation. You can improve the time between the CPU resuming when GPU is idle and the CPU submitting more work by getting a faster CPU, but you can also decrease the amount of time the CPU has to wait for the GPU to become idle by getting a faster GPU. Without measuring which of these delays is longer it's difficult to tell which component would give the biggest performance improvement (and even when you do it might not be obvious).

To tie it back to my point about the term "bottleneck": Normally if one component is a bottleneck upgrading the other wouldn't make a difference. In this scenario upgrading either can have a positive impact. So neither can be said to be a true bottleneck. I don't know why I decided to type this out since you didn't ask, but maybe you find it interesting (or someone else).

(PS: I think GPU utilisation actually becomes a proxy for those delays I mentioned earlier. As the GPU approaches infinite speed, GPU utilisation approaches zero. As the CPU approaches infinite speed, GPU utilisation approaches 100%. So if GPU usage is 50%, the CPU and GPU are evenly matched; when it's at 75%, that means the GPU is the slower one and may be worth upgrading; if GPU usage is 25%, you might consider upgrading the CPU. I'm not sure. That part might be blatant misinformation :sweat_smile:)
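
(A toy calculation of that proxy, as a hedged sketch: it assumes the serialized BOTW-style frame described above, where the CPU waits for the GPU to go idle, so frame time is roughly CPU time plus GPU time. The numbers and the helper itself are purely hypothetical.)

```cpp
// Hypothetical sketch of the "GPU utilisation as a proxy" idea above, assuming
// a serialized frame (CPU waits for GPU idle), so frame time ≈ cpuMs + gpuMs.
#include <cstdio>

int main()
{
    struct Case { const char* name; double cpuMs; double gpuMs; };
    const Case cases[] = {
        { "evenly matched", 8.0, 8.0 },  // ~50% GPU utilisation
        { "slow GPU",       4.0, 12.0 }, // ~75%: GPU likely worth upgrading
        { "slow CPU",       12.0, 4.0 }, // ~25%: CPU likely worth upgrading
    };

    for (const Case& c : cases)
    {
        double frameMs = c.cpuMs + c.gpuMs;         // serialized, not pipelined
        double gpuUtil = 100.0 * c.gpuMs / frameMs; // fraction of the frame the GPU is busy
        std::printf("%-14s frame %.1f ms, GPU utilisation ~%.0f%%\n",
                    c.name, frameMs, gpuUtil);
    }
    return 0;
}
```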

aASDa213ASD commented 3 weeks ago

@goeiecool9999 @Not4ce @Squall-Leonhart @Valkyr2 I'm back with some... let's just say I'm not sure whether what I'm about to report is a good or a bad thing. First of all, thanks @goeiecool9999 for writing all this down for me; I would never have known Cemu works this way.

I'm now writing from my Windows 11 Pro 24H2 (OS Build 26100.2033) instance on the same exact PC, the specs of which you can see above. My graphics driver is shown in AMD Adrenalin Software as follows (24.10.1, 10/11/2024): image

This exact driver has nothing to offer regarding Vulkan emulation issues in its release notes, which you can find here.

I also downloaded Cemu 2.2 again from the latest releases here, which likewise doesn't have any fixes regarding Vulkan emulation.

You already feel where I'm leading, right?

Not a single artefact problem on Cemu 2.2 on Windows 11 24H2 with 24.10.1 AMD GPU driver: Untitled

On Linux, on the other hand, the Mesa and vulkan-radeon packages got updated recently, so I checked that out as well - no changes, I'm still artefacting. image

Both Cemu installs are set up identically on both machines with the same resolution, graphics packs, etc. I have the Above 4G Decoding and Resizable BAR features enabled in my BIOS, if that matters. Worth noting that I played for almost an hour running around searching for anything related or similar to what I used to see on Cemu 2.2 - nothing. Not a single artefact was found. Not a single visual glitch. The only difference that really concerns me is that my Cemu title bar says [Vulkan] [Generic] on Linux while it shows [Vulkan] [AMD GPU] on Windows.

Here are my latest installed packages on Linux:

[aasda@arch ~]$ sudo pacman -Q | grep mesa
lib32-mesa 1:24.2.6-1
libva-mesa-driver 1:24.2.6-1
mesa 1:24.2.6-1
mesa-utils 9.0.0-5

[aasda@arch ~]$ sudo pacman -Q | grep vulkan
lib32-vulkan-icd-loader 1.3.295-1
lib32-vulkan-radeon 1:24.2.6-1
vulkan-headers 1:1.3.295-1
vulkan-icd-loader 1.3.295-1
vulkan-radeon 1:24.2.6-1

Suffice to say, I'm confused. I have tons of questions; perhaps everyone here does as well. If anyone has a single assumption or even a clue about why this works/doesn't work and what exactly affects it - please let me know, because it drives me nuts when I don't understand something. If there's anything more I can provide/test/experiment with - let me know as well, I definitely will. I want this problem gone for good, and we are getting close.

Squall-Leonhart commented 3 weeks ago

It's worth noting (and something I've seen in Switch emulation too) that once a pipeline is cached in a good state, artifacts that occurred on a bad build may not occur with that bad build after a good build has updated the cache.