godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
91.1k stars 21.18k forks source link

Performance drop in 3.3 RC 7 due to change in shadow cubemap size to match resolution #48036

Closed cybereality closed 3 years ago

cybereality commented 3 years ago

Godot version:

v3.3 RC9

OS/device including version:

Windows 10, RTX 2080 Ti, Driver 466.11, GLES3

Issue description:

In several small test projects, the performance has dropped significantly. Typically I would get around 160 fps with V-Sync, now it has dropped to around 60 - 70 fps (and this is with test apps with only a few cubes/planes, not even a full game). I have went back and tested the same projects in v3.2.4 Beta6 and performance is fine. Note, I have a few other Godot projects where performance is still great, so I don't think it's my machine. But something there is causing a performance issue.

Steps to reproduce: I've attached a test project (on Google Drive, too big to upload here). Try opening it in the latest RC, and see if you can reproduce my issue. This is not a demanding scene, just a few simple models.

Minimal reproduction project:

https://drive.google.com/file/d/1hAv2FRAHCVT3p25Upduhm_auqcB5vudT/view?usp=sharing

lawnjelly commented 3 years ago

I get 36fps in beta6 and 21fps in RC9. It's not the 2D (I tried even turning off the fps label, and no change).

Also RC3 gets 36fps, so it is fairly recent. RC4 has 26fps. So the bulk of the change is between RC3 and RC4. RC6 has 26fps.

akien-mga commented 3 years ago

Note that there's a baked lightmap in the MRP, so it needs to be rebaked when switching versions (at least between 3.2.3 and 3.2.4 beta 6+ which added the new CPU lightmapper).

My tests results on Linux with:

System specs ``` $ inxi -CSG System: Host: cauldron Kernel: 5.11.15-desktop-1.mga9 x86_64 bits: 64 Desktop: KDE Plasma 5.21.4 Distro: Mageia 9 mga9 CPU: Info: Quad Core model: Intel Core i7-8705G bits: 64 type: MT MCP cache: L2: 8 MiB Speed: 3101 MHz min/max: 800/3100 MHz Core speeds (MHz): 1: 3101 2: 3100 3: 3101 4: 3100 5: 3100 6: 3100 7: 2674 8: 3100 Graphics: Device-1: Intel HD Graphics 630 driver: i915 v: kernel Device-2: Advanced Micro Devices [AMD/ATI] Polaris 22 XL [Radeon RX Vega M GL] driver: amdgpu v: kernel Device-3: Cheng Uei Precision Industry (Foxlink) HP Wide Vision FHD Camera type: USB driver: uvcvideo Display: x11 server: Mageia X.org 1.20.11 driver: loaded: intel,v4l resolution: 1920x1080~60Hz OpenGL: renderer: Mesa Intel HD Graphics 630 (KBL GT2) v: 4.6 Mesa 21.0.2 ```
GPU 3.2.3-stable 3.2.4 beta 1 3.2.4 RC 3 3.2.4 RC 4 3.3 RC 6 3.3 RC 7 3.3 RC 9
Intel HD 630 47 FPS 47 FPS 46 FPS 45 FPS 45 FPS 38 FPS 38 FPS
Radeon RX Vega M 160 FPS 160 FPS 155 FPS 155 FPS 155 FPS 137 FPS 137 FPS

So I can't reproduce @lawnjelly's performance drop between RC 3 and RC 4. The main difference between the two is that RC 4 spams errors because the MRP is using invalid input mappings. This could cause a FPS drop (in debug builds only) when spamming the stdout (especially on Windows):

ERROR: get_action_strength: Request for nonexistent InputMap action 'look_right'.
   At: main/input_default.cpp:142.
ERROR: get_action_strength: Request for nonexistent InputMap action 'look_left'.
   At: main/input_default.cpp:142.
ERROR: get_action_strength: Request for nonexistent InputMap action 'look_down'.
   At: main/input_default.cpp:142.
ERROR: get_action_strength: Request for nonexistent InputMap action 'look_up'.
   At: main/input_default.cpp:142.

I do reproduce a performance drop between 3.3 RC 6 and RC 7 however, on both Intel and Radeon on Linux.

And I seem to have a tiny performance drop somewhere between 3.2.4 beta 1 and 3.2.4 RC 4, but not sure it's significant enough to be worth looking into.

lawnjelly commented 3 years ago

I've tracked down the major change I think between RC3 and RC4. In RC4 the error message is spamming about using an input map that isn't set, and in RC3 that error message isn't there.

Oo beaten to it!

When I remove those inputs, the fps is the same in RC3 and RC4.

I agree it seems possibly related to the big changes in lightmapping and the project use of lightmap, and if this is happening between RC6 and RC7 this does match in the timeline.

akien-mga commented 3 years ago

@cybereality Could you try to fix the InputMap error locally and see if you still get a performance drop?

If so, can you try intermediate RCs to see if you can pinpoint which one(s) cause a performance drop? From my test it seems to be between RC 6 and RC 7 but maybe it would be different for your hardware.

@lawnjelly Can you reproduce the drop between RC 6 and RC 7?

Relevant changes to look into: https://github.com/godotengine/godot/compare/15ff752737c53a1727cbc011068afa15683509be...cca2637b9b9dcb16070eb50a69c601a5f076c683

Possible "culprit": https://github.com/godotengine/godot/commit/8d156d9e5c3ac1d8988de406b77f82b32bed946e

cybereality commented 3 years ago

Ah, noob mistake. I didn't look at the console window (it was hidden behind the editor). After adding those input mappings, now performance is back to what it was before (60fps to 400fps). So it was definitely the standard out that was slowing it down.

lawnjelly commented 3 years ago

Yes I get from 79fps (RC6) -> 55fps (RC7). I decreased the size of the window a bit so I could see the output, that's why it is faster than the previous figures.

akien-mga commented 3 years ago

Yes I get from 79fps (RC6) -> 55fps (RC7). I decreased the size of the window a bit so I could see the output, that's why it is faster than the previous figures.

...

Possible "culprit": 8d156d9

Confirmed that this commit explains the slight performance drop experienced on my hardware.

To my untrained eyes the shadow quality in 3.3 RC 7 might be slightly better indeed, but I'm not sure it's significant enough to warrant having default settings which lead to a performance reduction. CC @clayjohn @puchik

If I read this correctly with this commit we're going from a cubemap sampler size of 512 to 4096, i.e. 8 times bigger (or is that 8²?). Maybe we should reduce the default values?

However if I set rendering/quality/shadow_atlas/size to 512 then I do get awful looking shadows. So maybe the cubemap sample size should get its own config value that could default to e.g. 512 instead of reusing the value of rendering/quality/shadow_atlas/size?

akien-mga commented 3 years ago

I retitled the issue to track the new issue that was identified with this MRP (regression from #47160 in 3.3 RC 7), since the other one was just the consequence of error spam.

akien-mga commented 3 years ago

For comparison:

3.3 RC 6, default settings (shadow atlas size 4096, cubemap sampler size 512): Screenshot_20210420_094452

3.3 RC 7, default settings (shadow atlas size 4096, cubemap sampler size 4096): Screenshot_20210420_094622

3.3 RC 7, shadow atlas size and cubemap sampler size 2048: Screenshot_20210420_161820

3.3 RC 7, shadow atlas size and cubemap sampler size 1024: Screenshot_20210420_162008

3.3 RC 7, shadow atlas size and cubemap smapler size 512: Screenshot_20210420_094721

So lowering the common setting to get back the 3.3 RC 6 performance is definitely not an option.

Zireael07 commented 3 years ago

How does 1024 look?

clayjohn commented 3 years ago

I'm a little surprised that #47160 is causing such a performance regression. In my testing the difference was about 0.1 to 0.5 ms when using 32 dynamic lights with shadows.

That being said, we should consider reducing the default shadow atlas size to 2048 now. The quality is better than 4096 without #47160 and speed should be improved.

See also for further discussion. https://github.com/puchik/godot/pull/1

akien-mga commented 3 years ago

I updated https://github.com/godotengine/godot/issues/48036#issuecomment-823058246 to include screenshots using 3.3 RC 7 with shadow atlas size 2048 and 1024.

The quality on this MRP does suffer quite a few from reducing to 2048 if you compare to 3.3 RC 6:

RC6: Shadow atlas 4096, cubemap sampler 512: Screenshot_20210420_162207

RC7: Shadow atlas and cubemap sampler 2048: Screenshot_20210420_162223

clayjohn commented 3 years ago

@akien-mga thanks. That does indeed look like an unacceptable tradeoff.

I don't want want to revert #47160 entirely as it makes it possible to have much better shadow quality (at the expense of speed) than was possible in 3.2.

I will make a PR tonight that introduces a project setting for max_shadow_cubemap_sampler_size: https://github.com/godotengine/godot/blob/2ece8c44b8d26098570b8bf1b7727fad1e58159d/drivers/gles3/rasterizer_scene_gles3.cpp#L5113

We can default to 512 so that the quality/performance is the same in 3.3 as it was in 3.2 and then users who want more quality can increase it as they need.

Calinou commented 3 years ago

That being said, we should consider reducing the default shadow atlas size to 2048 now. The quality is better than 4096 without #47160 and speed should be improved.

Sharper shadows aren't always better. Using a lower resolution introduces blurriness, but shadows in real life aren't perfectly sharp, especially when the shadow is far away from its caster.

In fact, I tend to find the default OmniLight/SpotLight shadows too sharp right now (when used at the typical OmniLight/SpotLight sizes). This is especially the case when you compare them to the default DirectionalLight shadow quality, which makes them look inconsistent.

Anyway, I think @clayjohn's solution is the way to go for 3.3.

Cykyrios commented 3 years ago

I can confirm as well that #47160 caused a significant framerate drop on my project, which made it to go from roughly 50-ish fps (4 instances as I test multiplayer) to literally single digit fps (on a GTX 1060). I have however not touched any shadow settings, so if decreasing the default size results in roughly the same performance and quality as 3.2, this is probably fine.

Below is a quick comparison between the latest 3.2 commit vs 8d156d9, no project settings changed:

https://user-images.githubusercontent.com/53737317/115436292-1b391d80-a20b-11eb-9b90-bdeb770550e1.mp4

https://user-images.githubusercontent.com/53737317/115436422-4459ae00-a20b-11eb-94d1-9fe4c1b676fd.mp4

akien-mga commented 3 years ago

That's a very massive performance hit, that sounds surprising compared to what we were discussing above. Could you test 3.3 RC 6 and RC 7 to confirm that you get good performance in RC 6 and the drop in RC 7?

Cykyrios commented 3 years ago

I just tried with RC6 and RC7, I can confirm I get the same performance drop on RC7 only. For what it's worth, I notice some really high CPU usage on a single thread with RC7 (I do have quite a few Tweens at work), which doesn't happen with RC6. My system specs below just in case:

OS: Linux Mint 20.1 (kernel is 5.4.0) CPU: Intel i7-8700K GPU: Nvidia GTX 1060 6GB RAM: 16 GB, most likely irrelevant here

Below is a screenshot of the framerate monitor in RC7 (the two dips to ~15 fps correspond to the animations). It doesn't seem to go down quite as low as what the above video shows, probably because of averaged values. In RC6, framerate drops to 30-40 during the animations. 3.3 RC7 FPS Monitor

My issue may be a somewhat extreme case where high CPU usage causes framerate to drop more than it should, but RC7 definitely did impact framerate.

puchik commented 3 years ago

We can default to 512 so that the quality/performance is the same in 3.3 as it was in 3.2 and then users who want more quality can increase it as they need.

This is the way to go. It's possible the large frame drops are a combination of things but the shadow resolution would definitely contribute to it. For my project having the cubemap capped to 512 looked terrible (indoor environment with low lighting) and having it increased caused minimal drops.

clayjohn commented 3 years ago

Funny enough, running on my hardware I can't reproduce a performance drop between RC 6 and RC 7. In both cases performance is highly variable for the first 10 seconds or so and then plateaus at about 125 FPS. Note, I am not re-baking the lightmap between runs which shouldn't matter if 8d156d9 is indeed the culprit.

On desktop Windows 10 AMD RX 5600 XT _RC6_ ![Screenshot (139)](https://user-images.githubusercontent.com/16521339/115497981-8857ad80-a221-11eb-94c2-50fbf4b6a8ca.png) _RC7_ ![Screenshot (140)](https://user-images.githubusercontent.com/16521339/115497996-8ee62500-a221-11eb-95f7-1823af60f87c.png)

I am going to test on my laptop using integrated graphics to see if I can reproduce there.

On laptop (intel integrated) Windows 10 UHD Graphics 620 _RC6_ ![Screenshot (1)](https://user-images.githubusercontent.com/16521339/115500913-15513580-a227-11eb-9530-bcfe3eb40724.png) _RC7_ ![Screenshot (10)](https://user-images.githubusercontent.com/16521339/115500919-184c2600-a227-11eb-8e97-aa12d02988b1.png)
I am seeing a performance drop on my laptop with intel integrated graphics. PR incoming with the changes described above.
akien-mga commented 3 years ago

Fixed by #48059.

@Cykyrios I'm not sure yet if your performance regression was exactly the same as the one fixed by #48059, or if it's that one + something else that also happened in RC 7. I'll make you a Linux build quickly to test.

lawnjelly commented 3 years ago

It could be that the shadow map resolution problem is affecting @Cykyrios much more, because he is using split screen. 4 screens would seem to be 4x as many shadow map renders, for the same final screen area. That said there still could be other changes in RC7.

akien-mga commented 3 years ago

@Cykyrios I'm not sure yet if your performance regression was exactly the same as the one fixed by #48059, or if it's that one + something else that also happened in RC 7. I'll make you a Linux build quickly to test.

Could you test this build? It contains #48059 so if you use default shadow map settings, it should behave the same as 3.2.3 / 3.3 RC 6.

https://downloads.tuxfamily.org/godotengine/testing/Godot_v3.3-rc.d91b780_x11.64.zip

Cykyrios commented 3 years ago

@akien-mga This build does have the same performance as RC6 for me, and I also compiled from the latest commit just in case, same result. Thanks a lot to you and @clayjohn!

Edit: Actually I just increased the cubemap size from the default 512 to 2048 (shadow atlas size was 8192 already) and I still get the same performance as RC6.