
[Runtime Issue]: Framerates in outdoor areas drop to half or a third of indoor framerates, and GPU usage spikes. #560

Open 0xFADDAD opened 2 months ago

0xFADDAD commented 2 months ago

Build Version

2db85ca6ecd46ff31035f59cd409477e81d4e0e7

Operating System Environment

CPU Environment

Game Modes Affected

Game Environment

No response

Description

Looking at the skybox with no terrain in view, or returning to an indoor area, brings the framerate back to normal. Possibly the terrain is not being culled?

Regression Status

No response

Steps to Reproduce

Enter an outdoor area; the framerate halves and GPU usage triples.

https://github.com/user-attachments/assets/4c99ac6a-5288-47c6-8161-175733dad533

tophyr commented 2 months ago

Thanks for the report. @0xFADDAD, if you are able, would you mind re-running the test with a build from https://github.com/DescentDevelopers/Descent3/actions/runs/10566626808 ? This will help determine if this problem is caused by the recent renderer modernization or if it was introduced by something prior. Thanks!

0xFADDAD commented 2 months ago

https://github.com/DescentDevelopers/Descent3/actions/runs/10566626808 actually managed to work even worse; framerates are now down in the high teens.

0xFADDAD commented 2 months ago

I'm glad I decided to try all the levels in the Bedlam set. The last level, 'Polaris', has outdoor areas, but these render mostly correctly, with only a 5-10% framerate cut or so. 'Plutonium' and 'Apparition' have the severe framerate cuts. I'll try a few more levels with outdoor sections to see if I can make out a pattern.

UPDATE: Dementia's 'Geodomes' is a good test for just how low the framerate can get. The terrain has a few sections of long flat surfaces in the far distance that really crater the performance.

tophyr commented 2 months ago

> https://github.com/DescentDevelopers/Descent3/actions/runs/10566626808 actually managed to work even worse; framerates are now down in the high teens.

Oh wow! I'm glad I asked. We can rule out the modernized renderer as a cause then, at least.

KynikossDragonn commented 2 months ago

I've experienced the same problem running git main builds. I haven't measured the GPU usage through intel_gpu_top on my NUC, but the CPU usage is extremely high, and according to htop the brunt of it is kernel time.

Is there a way to profile what's happening and try to narrow down what's causing the thrashing in rendering?

Lgt2x commented 2 months ago

> I've experienced the same problem running git main builds. I haven't measured the GPU usage through intel_gpu_top on my NUC, but the CPU usage is extremely high, and according to htop the brunt of it is kernel time.
>
> Is there a way to profile what's happening and try to narrow down what's causing the thrashing in rendering?

We'll need more precise CPU profiling to identify and mitigate bottlenecks. I recommend the perf (record) set of tools on Linux to get precise CPU sampling; its output can be processed with other utilities to find the biggest time consumers.
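For reference, a minimal perf session might look like this (the binary name/path is illustrative, and the flame-graph step uses the external FlameGraph scripts; any build with debug info will do):

```sh
# Record samples with call graphs while playing an affected level
perf record -g --call-graph dwarf -- ./Descent3

# Summarize the hottest libraries and functions
perf report --sort=dso,symbol

# Optionally fold the samples into a flame graph
# (stackcollapse-perf.pl / flamegraph.pl come from the FlameGraph repo)
perf script | stackcollapse-perf.pl | flamegraph.pl > d3-profile.svg
```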

pzychotic commented 2 months ago

I took a quick look on Windows at the beginning of Retribution Level 15. We spent 66% of CPU time in the graphics driver (Intel integrated graphics), 27% in the Windows kernel and just shy of 5% in our own code.

[screenshot: profiler capture, 2024-09-10]

Depending on where I look, I get between 75 and 15 FPS. This correlates with roughly 1000 to 5000 draw calls and scales pretty linearly. The scary part is that, on average, we only render 2 triangles per draw call. That is complete overkill given the overhead each draw call comes with (state changes, etc.). Ideally we would want to batch as much geometry as possible with the same state into a single draw call, which might be a challenge with the current architecture.
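To illustrate the batching idea (a hypothetical sketch, not the actual D3 renderer; `BindTexture` and `DrawTriangles` stand in for whatever the backend exposes):

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Vertex { float x, y, z, u, v; };

// Hypothetical backend hooks, for illustration only.
void BindTexture(int texture);
void DrawTriangles(const Vertex *verts, std::size_t count);

struct Batcher {
  // Key on whatever defines a state change (here, just the texture id).
  std::unordered_map<int, std::vector<Vertex>> batches;

  void AddPoly(int texture, const Vertex *verts, int n) {
    std::vector<Vertex> &buf = batches[texture];
    // Fan-triangulate the n-gon into the batch buffer.
    for (int i = 1; i + 1 < n; ++i) {
      buf.push_back(verts[0]);
      buf.push_back(verts[i]);
      buf.push_back(verts[i + 1]);
    }
  }

  void Flush() {
    for (auto &kv : batches) {
      BindTexture(kv.first);                           // one state change...
      DrawTriangles(kv.second.data(), kv.second.size()); // ...one draw call
      kv.second.clear();
    }
  }
};
```

With something like this, thousands of 2-triangle submissions collapse into one draw call per distinct render state.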

Lgt2x commented 2 months ago

Very interesting; indeed, we need to optimize draw calls. @InsanityBringer, any tips for that?

winterheart commented 2 months ago

@pzychotic could you please run the same benchmark on revision 3cb1e8911a1afcc273433db69d843aa51b0203fc (before the render changes)?

tophyr commented 2 months ago

> The scary part is that, on average, we only render 2 triangles per draw call.

This, in particular, is unsurprising: the D3 renderer is set up in terms of drawing polygons (usually quads), not objects, so to draw a cube, for example, it would perform six g3_DrawPoly calls, one for each face. We need to transform the renderer so that it thinks primarily about drawing objects, but doing this transformation will require "lifting" the draw operation up to each callsite of g3_DrawPoly - about 65 callsites. Not prohibitive, but not a light job either.
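Roughly, the shape of that refactor would be the following (only g3_DrawPoly exists today, and its real signature is richer; the g3_StartObject/g3_AddPoly/g3_DrawObject API and the Face/Object types here are hypothetical):

```cpp
// Approximate/hypothetical declarations, for illustration only.
struct g3Point;
void g3_DrawPoly(int nv, g3Point **pointlist, int bm);
void g3_StartObject();
void g3_AddPoly(int nv, g3Point **pointlist, int bm);
void g3_DrawObject();

struct Face { int nv; g3Point **verts; int bitmap; };
struct Object { int num_faces; Face *faces; };

// Today: one draw call per face.
void DrawObjectToday(Object *obj) {
  for (int f = 0; f < obj->num_faces; ++f)
    g3_DrawPoly(obj->faces[f].nv, obj->faces[f].verts, obj->faces[f].bitmap);
}

// After lifting: the callsite describes the whole object, and the
// renderer is free to batch it into as few GPU submissions as it likes.
void DrawObjectLifted(Object *obj) {
  g3_StartObject();
  for (int f = 0; f < obj->num_faces; ++f)
    g3_AddPoly(obj->faces[f].nv, obj->faces[f].verts, obj->faces[f].bitmap);
  g3_DrawObject();
}
```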

pzychotic commented 2 months ago

> could you please run the same benchmark on revision 3cb1e89 (before the render changes)?

Interesting changes: we spend a lot more time in our own code and not much in the Windows kernel, while the graphics driver's share is a bit lower.

[screenshot: profiler capture, 2024-09-13]

InsanityBringer commented 2 months ago

I don't really have a good solution for the legacy renderer. Terrain is the worst because it combines the worst of everything: expensive objects, expensive rooms, and the terrain triangles themselves, all in a very open environment not conducive to culling. But during my attempts to improve the legacy renderer in Piccu, I found that actually drawing the terrain itself is probably the smallest cause of lag (though the vastly increased limits of the terrain renderer in 1.5 aren't helping in the slightest).

To some degree, pursuing things like stripification of polygons could lead to some gains, but I feel that at that point you're better off pursuing a meshing solution using newer (even OpenGL 2-era) features like GPU-side vertex buffers.
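As a rough sketch of what GPU-side vertex buffers buy (assumes an OpenGL 2.x context with the buffer-object entry points loaded, via GLEW here; the terrain layout is invented):

```cpp
#include <GL/glew.h>
#include <vector>

static GLuint terrain_vbo = 0;
static GLsizei terrain_vertex_count = 0;

// Upload the terrain mesh once; it lives GPU-side from then on.
void UploadTerrain(const std::vector<float> &xyz) {  // 3 floats per vertex
  glGenBuffers(1, &terrain_vbo);
  glBindBuffer(GL_ARRAY_BUFFER, terrain_vbo);
  glBufferData(GL_ARRAY_BUFFER, xyz.size() * sizeof(float),
               xyz.data(), GL_STATIC_DRAW);
  terrain_vertex_count = GLsizei(xyz.size() / 3);
}

// Redraw every frame without re-sending any vertex data over the bus.
void DrawTerrain() {
  glBindBuffer(GL_ARRAY_BUFFER, terrain_vbo);
  glEnableClientState(GL_VERTEX_ARRAY);
  glVertexPointer(3, GL_FLOAT, 0, nullptr);  // source from the bound VBO
  glDrawArrays(GL_TRIANGLES, 0, terrain_vertex_count);
  glDisableClientState(GL_VERTEX_ARRAY);
}
```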

KynikossDragonn commented 2 months ago

> you're better off pursuing a meshing solution using newer (even OpenGL 2-era) features like GPU-side vertex buffers.

I actually agree with this, because even the VBO implementation in UA_source really sped up rendering there, and that's already a low-poly game.

0xFADDAD commented 2 months ago

Kind of late, and it might already be obvious to some, but I forgot the Fusion engine was two engines in one. So I went looking for the game's post-mortem and found an interesting excerpt from Jason Leighton, one of the programmers.

"The terrain engine actually began as a prototype for another game that Jason was interested in developing. Unfortunately, Bungie’s Myth beat us to the idea, but the terrain technology was solid enough to be incorporated into Descent 3. It was based on a great paper by Peter Lindstrom and colleagues entitled Real-Time, Continuous Level of Detail Rendering of Height Fields (from Siggraph 1996 Computer Graphics Proceedings, Addison Wesley, 1996). Of course, it was bastardized heavily to fit the needs of Descent 3, but the overall concept was the same — create more polygonal detail as you get closer to the ground and take away polygons when you are farther away. After implementing the real-time LOD technology, our frame rates quadrupled."

Perhaps the LOD scaling is broken or non-functional after many of the limits were expanded? Might be worth investigating.
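For intuition, the scheme the post-mortem describes boils down to something like this (a toy sketch with discrete distance bands standing in for the continuous Lindstrom-style refinement; thresholds and names are invented):

```cpp
// Level 0 = full detail; each level roughly halves triangle density.
int TerrainLodLevel(float dist_to_camera) {
  if (dist_to_camera < 100.0f)  return 0;
  if (dist_to_camera < 400.0f)  return 1;
  if (dist_to_camera < 1600.0f) return 2;
  return 3;  // distant cells keep only a handful of triangles
}

// A terrain cell at LOD n is tessellated with a coarser grid step, so
// an LOD computation stuck at 0 would explode the total triangle count.
int GridStep(int base_step, float dist_to_camera) {
  return base_step << TerrainLodLevel(dist_to_camera);
}
```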

KynikossDragonn commented 2 months ago

> Perhaps the LOD scaling is broken or non-functional after many of the limits were expanded? Might be worth investigating.

Well, the LOD scaling is definitely doing something in release 1.5, but it's still a lot of draw calls...

Try setting the "Terrain Detail" slider all the way to the lowest setting. Though I don't recall off the top of my head how 1.4 behaved.

0xFADDAD commented 2 months ago

Took your advice and tried ticking down the slider. With 28 being the max, 27 gave some but not much improvement, while 26 seems to be a huge improvement. It's a solution, but the common reasoning would be "this is a 25-year-old game, it 'should' run completely maxed out"; if we're increasing the max polycount beyond what the engine is capable of putting out, it might just be best to leave the limits where they were.

KynikossDragonn commented 2 months ago

The path to rendering optimization is probably going to involve gutting the engine down the middle; as stated above:

> the D3 renderer is set up in terms of drawing polygons (usually quads), not objects, so to draw a cube, for example, it would perform six g3_DrawPoly calls, one for each face. We need to transform the renderer so that it thinks primarily about drawing objects

With modern OpenGL and Vulkan, a lot of the work is carried out on the GPU rather than the CPU, so we need less CPU-bound rendering code. It's not going to be a very easy task, I imagine...

I wouldn't have a clue how one would, for example, have the GPU compute the procedural textures in hardware, versus computing them on the CPU and constantly uploading a new texture every frame.
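In principle it looks something like the sketch below: the pattern is evaluated per pixel in a fragment shader, and the only per-frame CPU-to-GPU traffic is a single uniform. The uniform names and the plasma pattern are invented; D3's actual procedural effects would each need their own port.

```cpp
// GLSL 1.20 fragment shader embedded as a C++ string, sketch only.
const char *kProceduralFrag = R"(
  #version 120
  uniform float u_time;   // the only per-frame CPU -> GPU traffic
  varying vec2  v_uv;
  void main() {
    float v = sin(v_uv.x * 40.0 + u_time)
            + sin((v_uv.x + v_uv.y) * 25.0 - u_time * 1.3);
    gl_FragColor = vec4(0.5 + 0.5 * sin(v + u_time),
                        0.5 + 0.5 * sin(v * 1.7),
                        0.6, 1.0);
  }
)";
// Per frame, instead of a glTexSubImage2D upload, the CPU just does:
//   glUseProgram(prog);
//   glUniform1f(glGetUniformLocation(prog, "u_time"), seconds);
```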