Themaister / parallel-rdp

A low-level Vulkan compute emulation of the N64 RDP
MIT License

Mario Tennis stuttering #6

Closed OtavioRaposo closed 4 years ago

OtavioRaposo commented 4 years ago

It happens to me every time I hit the tennis ball. In angrylion the stuttering doesn't occur.

Themaister commented 4 years ago

Every ball hit spawns a ridiculous number of render passes; this is expected. Mario Tennis is known for this, and needs high sync in Angrylion to work, which also murders performance in general. It depends on your hardware, though: I don't get stutter, just a dip in performance.

OtavioRaposo commented 4 years ago

Yes, but the stutter doesn't happen in angrylion (even in high sync). Shouldn't parallel-rdp be faster for the same hardware?

Themaister commented 4 years ago

Depends on your hardware. GPU acceleration depends a lot on parallelism, and a GPU suffers more than a CPU does under weird scenarios like certain frames of Mario Tennis. You can always construct pathological cases where a CPU implementation beats the GPU one. A CPU runs on perhaps ~8 threads, while a GPU runs on 10000+ threads.
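To make the "pathological case" point concrete, here is a toy cost model, not a measurement of parallel-RDP or Angrylion. Every number in it (per-item cost, fixed overhead, the names `CPU`, `GPU`, `elapsed_us`, `faster_device`) is an assumption chosen for illustration; only the rough thread counts come from the comment above. Each device pays a fixed overhead per job and then processes work in waves as wide as its thread count.

```python
import math

def elapsed_us(work_items, threads, per_item_us, fixed_overhead_us):
    """Toy model: fixed overhead plus one 'wave' per batch of `threads` items."""
    waves = math.ceil(work_items / threads)
    return fixed_overhead_us + waves * per_item_us

# Hypothetical device parameters (assumptions, not benchmarks).
# The GPU has far more threads, but each is slower and launching work costs more.
CPU = dict(threads=8, per_item_us=1.0, fixed_overhead_us=1.0)
GPU = dict(threads=10_000, per_item_us=4.0, fixed_overhead_us=50.0)

def faster_device(work_items):
    """Return which device finishes first under the toy model."""
    cpu = elapsed_us(work_items, **CPU)
    gpu = elapsed_us(work_items, **GPU)
    return "cpu" if cpu < gpu else "gpu"

if __name__ == "__main__":
    for n in (64, 1_000_000):
        print(n, elapsed_us(n, **CPU), elapsed_us(n, **GPU), faster_device(n))
```

Under these assumed numbers a tiny job never fills the GPU, so its fixed overhead dominates and the CPU wins, while a large job amortizes that overhead and the GPU wins by a wide margin.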

OtavioRaposo commented 4 years ago

I'm using a Core i3-9100F and an RX 570 GPU. It's pretty balanced gaming hardware. I still don't understand why angrylion is getting better performance than parallel-rdp.

Themaister commented 4 years ago

I don't know how to explain this in simpler terms.

OtavioRaposo commented 4 years ago

It's ok, but unfortunate, because I was looking forward to parallel replacing angrylion as my default emulator. These problems make parallel unreliable for me.

Themaister commented 4 years ago

There are probably some opportunities to specifically optimize for these cases, but it's non-trivial.

Themaister commented 4 years ago

I wrote a simple profiling setup for Granite which lets me debug performance issues like these.

[Screenshot from 2020-05-24 00-07-47]

Mario Tennis is doing a ton of microscopic render passes, each taking just a few tens of microseconds, but the delay between each GPU submit is killing GPU throughput when you get a string of ~50 render passes in a row like this. I have some ideas for how to optimize for this case now. parallel-RDP is currently optimized for "normal" passes.
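A rough back-of-the-envelope sketch of why a string of tiny passes hurts: if every pass is submitted on its own, the gap between submits can dwarf the actual GPU work. The function name `frame_gpu_time_us`, the 30 µs pass time, and the 300 µs submit gap below are assumptions for illustration only; the profiler's real numbers are not given in this thread. Batching several passes per submit is shown as one generic mitigation, not as a description of what parallel-RDP actually does.

```python
def frame_gpu_time_us(num_passes, pass_us, submit_gap_us, passes_per_submit=1):
    """Toy model: total time = GPU work + one idle gap per submit."""
    submits = -(-num_passes // passes_per_submit)  # ceiling division
    return num_passes * pass_us + submits * submit_gap_us

if __name__ == "__main__":
    passes = 50        # "~50 render passes in a row" (from the comment above)
    pass_us = 30       # assumed: "a few tens of microseconds"
    gap_us = 300       # assumed delay between GPU submits

    one_by_one = frame_gpu_time_us(passes, pass_us, gap_us)
    batched = frame_gpu_time_us(passes, pass_us, gap_us, passes_per_submit=50)
    busy = passes * pass_us / one_by_one

    print(f"one submit per pass: {one_by_one} us (GPU busy {busy:.0%})")
    print(f"single batched submit: {batched} us")
```

With these assumed values the GPU spends most of the sequence idle between submits, which matches the shape of the problem described above even if the real figures differ.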

mudlord commented 4 years ago

That would be because Mario Tennis implements heat haze and blur on ball movements. Yes, heat haze by render-to-texture.

Themaister commented 4 years ago

Rewrote submission logic and Mario Tennis is essentially fixed now. Don't even notice a dip in performance anymore.

OtavioRaposo commented 4 years ago

> Rewrote submission logic and Mario Tennis is essentially fixed now. Don't even notice a dip in performance anymore.

Are these updates already available in the parallel-n64 libretro core?

Themaister commented 4 years ago

It's part of this PR update: https://github.com/libretro/parallel-n64/pull/664