Closed abaire closed 2 years ago
Oh, 20fps? Guess I should speed up my hardware based renderer work soon.
Yeah, hardware rendering would be excellent :)
What I'm seeing is suspiciously steady, so I'm wondering if it's a combination of the yield call and vsync causing it to be low. On xemu, if I remove the yield it jumps up to ~60 fps, but on hardware it seems to consistently stabilize at ~15 fps. I'll continue to poke around later to see if there's something obvious.
(Also just added #107 to make it easier to watch on hardware)
I think there's something more interesting going on with the threaded performance problem (I can't repro in a trivial test app), moving this to draft.
Did some more debugging and I think this is just thread starvation. My test app failed to reproduce because I was using pbkit; once I switched to using SDL (and left out the yield to simulate more costly rendering), I got the same behavior.
I'm going to drop this PR and instead have a temporary fix that just keeps the scanner on the main thread. It will mean longer startup times (on my XBOX with 160 XBEs to scan it takes ~8 seconds), but that's a much better user experience than having a navigable menu but having to wait ~2 minutes for everything to scan.
I'm optimistic that moving to HW accelerated rendering will allow us to turn threading back on and get the best of both worlds.
Testing on real hardware, I found that calling `FindFirstFile` on a secondary thread stalls for 20+ seconds (on my v1.0 box). Specifically, the underlying call to `NtOpenFile` blocks for a substantial amount of time. This seems to be consistent for any (existing) directory, regardless of the content. Changing to an explicit `CreateThread`-based approach instead of using `std::thread` did not resolve the problem, nor did changing the sync-related flags on the `NtOpenFile` call.

Until I can figure out why this is happening, I've converted `XBEScanner` to operate in a cooperative manner. The main loop polls the scanner and gives it an approximate time limit for the scan. Theoretically this timeout is based on framerate, but it seemed like I was only getting ~20 fps on hardware, so I've made it scan at least one file per frame and set the target to 15 fps.

I also added some additional debugging output so it's clear how much time it takes to scan. I suspect users with large libraries will be pretty unhappy with the current performance, so hopefully there are some easy improvements to be had.
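For readers who want a picture of the cooperative approach described above, here is a minimal sketch of a scanner that is polled from the main loop with a time budget. It is not the code from this PR: `CooperativeScanner`, `Poll`, `ScanOne`, and `ScanForOneFrame` are hypothetical names, and the per-file work is replaced by a stub that just records the path. The two properties it tries to show are the ones mentioned in the description: at least one file is processed per call, and the budget is derived from a 15 fps target.

```cpp
#include <chrono>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for XBEScanner; the real class will differ.
class CooperativeScanner {
 public:
  using Clock = std::chrono::steady_clock;

  explicit CooperativeScanner(std::deque<std::string> pending)
      : pending_(std::move(pending)) {}

  // Scans queued paths until the budget is spent. At least one path is
  // handled per call, so the scan makes progress even with a zero budget.
  // Returns true while work remains.
  bool Poll(std::chrono::milliseconds budget) {
    const auto deadline = Clock::now() + budget;
    do {
      if (pending_.empty()) return false;
      ScanOne(pending_.front());
      pending_.pop_front();
    } while (Clock::now() < deadline);
    return !pending_.empty();
  }

  const std::vector<std::string>& results() const { return results_; }

 private:
  // Stub: a real scanner would open the file and read its metadata here.
  void ScanOne(const std::string& path) { results_.push_back(path); }

  std::deque<std::string> pending_;
  std::vector<std::string> results_;
};

// Hypothetical per-frame hook: give the scanner roughly one frame's worth
// of time at a 15 fps target (1000 / 15 = 66 ms with integer division).
inline bool ScanForOneFrame(CooperativeScanner& scanner) {
  constexpr std::chrono::milliseconds kFrameBudget{1000 / 15};
  return scanner.Poll(kFrameBudget);
}
```

The main loop would call something like `ScanForOneFrame` once per iteration, before or after rendering, and stop calling it once it returns false. Note that the deadline is only checked between files, so a single slow file can still overrun the budget.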