Rebzzel / kiero

Universal graphical hook for D3D9-D3D12, OpenGL and Vulkan based games.
MIT License

Dx11 Present hook spikes GPU usage #6

Closed: lmoe closed this issue 5 years ago

lmoe commented 5 years ago

First of all, thank you very much for your library. I used to do the same thing roughly 6-8 years ago and have forgotten most of it. Writing my own base seems overkill for my tiny project, so I'm happily using yours.

It works as expected and I got off to a good start. However, I just found out that it spikes my GPU usage up to 25% higher than without the hook. I'm currently trying to figure out why, because I've never experienced this before.

Back then I used a FindPattern/Detours approach for CS:Source, and the detour functions used to be rather small. Looking into MinHook, I see a lot of instructions there.

So my current guess is MinHook, but I also wouldn't be surprised if I did something really wrong myself.

I've tested it out on this sample Dx11 application: https://www.3dgep.com/introduction-to-directx-11/ (Binary on the bottom)

And to save time, used this injector: https://github.com/DarthTon/Xenos/

Running the application results in a typical usage of ~38-41% CPU / ~50-52% GPU on an i5-4690 / GTX 1080.

With a running hook that just calls the original function, I get around 40-43% CPU (which is negligible) and up to 78% GPU.
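For readers unfamiliar with the pattern, a "hook that just calls the original function" can be sketched roughly as below. This is a portable stand-in, not my actual code and not kiero's or MinHook's API: `PresentFn`, `RealPresent`, `HookedPresent` and the counters are hypothetical names, the signature only loosely imitates `IDXGISwapChain::Present`, and the "trampoline" is a plain function pointer rather than something a hooking library patched in.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Present signature (not the real DXGI types).
using PresentFn = long (*)(void* swapChain, unsigned syncInterval, unsigned flags);

// Pretend "original" Present. It only counts how often the real work runs.
static std::uint64_t g_realPresentCalls = 0;
long RealPresent(void* /*swapChain*/, unsigned /*syncInterval*/, unsigned /*flags*/) {
    ++g_realPresentCalls;
    return 0; // stands in for S_OK
}

// In a real hook, the hooking library would store the trampoline address here.
// Assumption for this sketch: we simply point it at RealPresent.
static PresentFn g_originalPresent = &RealPresent;

// Counts how often the hook itself is entered.
static std::uint64_t g_hookCalls = 0;

// Pass-through hook: does no work of its own and forwards every argument.
long HookedPresent(void* swapChain, unsigned syncInterval, unsigned flags) {
    ++g_hookCalls;
    return g_originalPresent(swapChain, syncInterval, flags);
}
```

Calling `HookedPresent` once runs the hook once and the original once, so any extra cost comes from the added indirection per call, multiplied by how often the application calls Present.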

I've stripped most of my own code out of my sample: https://pastebin.com/bq4YnJH9

I've found that switching from stdcall to fastcall seems to improve performance just a bit. Compiling in release mode does not.

I'm currently tinkering with a different hooking mechanism and different kinds of diagnostics. Time to open up IDA again. But maybe you have some input on how to improve the performance. Maybe my guesses are plain wrong. I wouldn't be surprised at all.

Thanks in advance!

/Edit:

After switching the library to Detours 4.0 (https://github.com/microsoft/Detours) I was able to bring the CPU back down to a constant 38%, and the GPU now stays pretty smooth at ~66%, so that's something. I still don't understand the problem at all. It's not like your library mines Bitcoin. I seem to be missing something very fundamental.

lmoe commented 5 years ago

So. I have some results. I've learned a big lesson today: don't use cheap Dx11 demos for hook performance tests. It turns out that the demo is calling Present ~8000 times per second. As the detour places a jmp, and my hook jmps back into the original Present function, I estimate ~24000 calls per second for the render function alone. I had also assumed that this overhead would only put load on the CPU, because the real Present function still only gets called 8000 times per second, like before. Well, that was a mistake as well.

I've just tested a really demanding game to verify, and it turns out that neither Detours 4 nor MinHook had any considerable impact at all. Who would have guessed. However, Detours seemed to handle the load/overhead a bit better than MinHook, as the GPU load was ~10% less.

Sigh. Talking about fundamentals. It was basically right in front of my eyes. Maybe this is a lesson for someone else too.
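If you want to check the call rate before trusting a benchmark, a small counter ticked from inside the hooked function is enough. The sketch below is only an illustration: `CallRateCounter` is a hypothetical helper, not part of kiero, MinHook or Detours, and it assumes you call `tick()` once per Present from your own hook.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: measures how many times per second tick() is called.
class CallRateCounter {
public:
    using Clock = std::chrono::steady_clock;

    // Record one call at time `now`. The first call only starts the window.
    // Returns true once at least one second has elapsed in the current
    // window; the measured rate is then available via lastRate().
    bool tick(Clock::time_point now = Clock::now()) {
        if (!started_) {
            started_ = true;
            windowStart_ = now;
            return false;
        }
        ++count_;
        const std::chrono::duration<double> elapsed = now - windowStart_;
        if (elapsed.count() < 1.0) {
            return false;
        }
        lastRate_ = static_cast<double>(count_) / elapsed.count();
        count_ = 0;          // start a fresh window
        windowStart_ = now;
        return true;
    }

    // Calls per second measured over the most recently completed window.
    double lastRate() const { return lastRate_; }

private:
    bool started_ = false;
    std::uint64_t count_ = 0;
    double lastRate_ = 0.0;
    Clock::time_point windowStart_{};
};
```

At ~8000 calls per second each Present has a budget of only about 125 microseconds (1 s / 8000), so even a small fixed cost per call becomes visible. A demanding game that presents a few hundred times per second at most leaves far more time per call for the same overhead.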