godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

Performance is worse with Batching ON #46122

Closed thes3m closed 1 year ago

thes3m commented 3 years ago

Godot version: 3.2.4 RC2 (also tested on 3.2.3 GLES2)

OS/device including version: OS: Windows 10; GPU: AMD Radeon R7; GLES3, but also tested on GLES2

Issue description: We have a UI-heavy game, and I noticed that when drawing a large number of UI objects on screen, performance can get worse if Project Settings -> Rendering -> Batching -> Use Batching is ON. This does not seem right, because batching should improve performance, not degrade it. Batching probably causes enough extra CPU or memory consumption that it starts to affect the frame rate. I cannot provide a sample project, but I have monitored performance in the same project with different settings.

batching off: draw-calls-no-batching

batching on: draw-calls-batching

You can see that without batching, performance is steady at 60 FPS (considering the middle part of the chart) with almost 800 draw calls. With batching on, the draw calls drop significantly to 142, which is good, but the actual frame rate is around 45 FPS. That does not seem right. We have experienced this on different GPUs, so it is not hardware related.
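To put the two readings on the same scale, it can help to convert them to frame times. The following is a minimal Python sketch using only the figures quoted above; the helper name `frame_time_ms` is made up for illustration.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate (frames per second) to milliseconds per frame."""
    return 1000.0 / fps


# Figures reported in the comment above.
off_ms = frame_time_ms(60)  # batching off, almost 800 draw calls
on_ms = frame_time_ms(45)   # batching on, 142 draw calls
extra_ms = on_ms - off_ms

print(f"batching off: {off_ms:.2f} ms per frame")
print(f"batching on:  {on_ms:.2f} ms per frame")
print(f"extra cost:   {extra_ms:.2f} ms per frame")
```

This works out to roughly 16.67 ms versus 22.22 ms, so about 5.56 ms more per frame with batching on. If the 60 FPS reading is capped by V-Sync, the real gap could be larger than this estimate.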

Calinou commented 3 years ago

@thes3m Please upload a minimal reproduction project to make this easier to troubleshoot. There's nothing we can do about this otherwise.

thes3m commented 3 years ago

@Calinou I cannot share my main project. However, I've tried to create a simple project, but it does not behave the same way. If I manage to create a small reproduction project, I will share it.

lawnjelly commented 3 years ago

Also useful would be to let us know what you are drawing (rects? lines? polys? anti aliased? normal mapping? some combination?), and / or select rendering->batching->debug->diagnose_frame, and post the log in here. Although I'm unlikely to be able to solve it without a min reproduction project.

For pinning it down to a minimum reproduction project, it's highly likely to be a non-rect primitive, or a custom _draw that you are using, simply because it has now had quite a bit of testing in the more regular usage. Just hide branches of your scene tree, rerun it, and try and find if there is a point where performance rapidly drops off, if so, that will likely be where the problem lies.

thes3m commented 3 years ago

@lawnjelly The game is basically created only with Control nodes and their subclasses. There are no custom _draw methods; I am only using controls from Godot. So the main elements that draw things are TextureRects, Panels (NinePatch StyleBox) and Labels. We use custom shaders on labels, but that is almost everything. (Disabling our shaders does not affect performance.)

jamie-pate commented 3 years ago

I'm seeing similar issues with a mostly 3D game. I thought it was a regression in 3.3 vs 3.2, but disabling batching returns performance to the levels seen in 3.2.

Platform: Windows, Intel 620/630

Callstack timings for 3.3 on the left and 3.2 on the right. image

Benchmarks (~2 minutes of simulated gameplay)

Slowness measured as percentage difference in framerate (vs the highest sample for each GPU) image

Raw frame rates from benchmark image

More detail from the profiler image

3.3 batching: image. Whole draw() call, 3.3: image. 3.2: image.

batching/debug/diagnose_frame=true batchdiag.txt

lawnjelly commented 3 years ago

I'll need a minimum reproduction project to investigate.

The CPU use (measured in a profiler) is expected to be higher with batching than without. That is what it does: it trades more CPU use to batch items together in return for fewer draw calls, which speeds up the transfer / GPU side.
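The trade-off can be sketched in a few lines of Python. This is only an illustration of the general idea, not Godot's batching code: the function `count_draw_calls` and the `(texture, rect)` item tuples are invented for the example, and a real renderer looks at more state than just the texture.

```python
from itertools import groupby


def count_draw_calls(items, batching):
    """Count draw calls for items given in draw order.

    items: list of (texture_id, rect_id) tuples (hypothetical format).
    """
    if not batching:
        # Without batching, every item is its own draw call.
        return len(items)
    # With batching, the CPU does an extra pass over the items and merges
    # runs of adjacent items that share a texture. Only adjacent items are
    # merged, because the draw order has to be preserved.
    return sum(1 for _ in groupby(items, key=lambda item: item[0]))


items = [("atlas", 0), ("atlas", 1), ("font", 2), ("font", 3), ("atlas", 4)]
print(count_draw_calls(items, batching=False))  # 5
print(count_draw_calls(items, batching=True))   # 3
```

The grouping pass is the extra CPU work. The payoff is the lower number of draw calls handed to the driver.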

From the diagnostic, it looks as though it is batching stuff reasonably ok, so if you are getting a drop in frame rate that is curious, maybe there is something pathological happening.

Comparisons of different versions don't tell us all that much unfortunately, because there have been thousands of changes in Godot between versions. The only really useful comparison is frame rates / times with batching on and off in the same build. With your graphs I think there may be some labels missing: you have several readings for a version but it is not clear which is which. I'm just guessing one of the slower ones is the batching...

From your profile it is also possible you are getting a stall from the driver. flush_render_batches, apart from render_batches which is called within, doesn't actually do that much, it translates FVFs (which should be pretty cheap) and it uploads the data to OpenGL. You could possibly be getting a stall during the uploads, if it doesn't like the vertex buffer use. There are some settings in rendering/2d/opengl which can be used to alter this behaviour, this might help you pin down whether this is a problem.

jamie-pate commented 3 years ago

In my application the CPU is the bottleneck by the time you are at this detail level. I've disabled almost everything, including removing all lights. The Intel drivers are very CPU heavy too. The test hardware has a pretty weak laptop CPU in it.

Do the Godot docs have a 'tuning' section that has a list of 'benchmark these' settings that will depend on your target platform and workload?

lawnjelly commented 3 years ago

Well in practice, the CPU use is usually pretty small from batching, unless you are really caning things with e.g. 100,000 items or so. Much of the time in your profile is not spent in the batching code itself, it is in the graphics driver, likely from the uploads.

What can happen is that on certain hardware / drivers, they don't cope well with orphaning vertex buffers, and buffer reuse, or specifically certain types of use. This can vary from platform to platform, and there don't appear to be any rules to this, OpenGL spec is quite vague in this regard and we have had many problems in the past trying to get it to work well with all hardware / drivers. We have had particular problems on Macs. This is usually down to drivers.

This kind of thing is why it is crucial to fill out an issue template, with the platform details - the OS, GPU, drivers etc, and a minimum reproduction project.

We attempt to get things working as well as possible on the widest selection of hardware, but it is possible there will be problems on particular systems and driver versions, we investigate these as they crop up.

jamie-pate commented 3 years ago

Comparisons of different versions don't tell us all that much unfortunately, because there have been thousands of changes in Godot between versions. The only really useful comparison is frame rates / times with batching on and off in the same build

I did a bunch more benchmarks with batching disabled and the frame time is now identical for both versions. You could label them with/without batching enabled and get the same graph.

The profiling is using both versions for the same reason. I was initially trying to diagnose the apparent regression before we commit to updating to 3.3 in our next release. It took hours to line up the two profiles well enough that I could eyeball the discrepancy.

I don't think there's a way to enable/disable batching from the command line without a custom script, is there? My benchmark bash script is specifically written to find any performance regression on these low-end platforms, since that is a large section of our player base. It needs a command line flag to vary the parameters. I'll have to add a command line flag I guess to enable/disable the setting in GDScript to make it work.

From your profile it is also possible you are getting a stall from the driver.

The Windows Intel drivers are constantly stalling :)

lawnjelly commented 3 years ago

If your game is primarily 3D, and only contains a few 2D elements on screen, you might not lose out too much by having batching off. Just bear in mind that without batching, every item is going to be a drawcall.

I'll have to add a command line flag I guess to enable/disable the setting in GDScript to make it work.

This is something that could be added. There hasn't been much call for it so far, as most people either commit to batching or not. It's not easy to turn on / off during runtime, as it reserves a bunch of resources etc, so you need to decide at startup somehow, whether through project setting or a command line option.

I did a bunch more benchmarks with batching disabled and the frame time is now identical for both versions. You could label them with/without batching enabled and get the same graph.

Yes, I can see that it is quite difficult to pinpoint what might have changed in a new version to affect your performance. There may be some changes that are positive, some negative. Some bug fixes for rigour can have negative effects on performance. Testing betas as they come out is a good way to keep on top of any negative performance issues, although I understand that can be hard to do as it takes time.

This reminds me, something I've been meaning to suggest to the team is that we have a repository / forum or something specifically for people to get help with performance issues. The bug tracker isn't really suited for this, but there's often pretty simple things users can do to vastly increase performance. This is particularly useful in cases like yours when you have invested a lot of time making something, rather than just a game jam game.

jamie-pate commented 3 years ago

Just bear in mind that without batching, every item is going to be a drawcall

Yes, this is the part that still confuses me. I guess the Intel drivers just prefer the brute force approach and sending buffers is somehow worse than draw calls?

If I could find time to create a minimal test case I'd send a bug report to the Intel driver team :D

I should also run the benchmark on the Linux driver because that driver seems to run a lot smoother...

lawnjelly commented 3 years ago

Yes, this is the part that still confuses me. I guess the Intel drivers just prefer the brute force approach and sending buffers is somehow worse than draw calls?

In batching (and some legacy primitives) we use a technique called buffer orphaning. This involves reusing the same vertex buffer many times per frame (perhaps as much as 1000 times). If the driver has been written in a way to be able to deal with this situation, it works great. If not, things can go horribly wrong.
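As a rough mental model of why orphaning matters, here is a toy Python simulation. It is not real OpenGL code and not Godot's implementation. `ToyDriver`, `run_frame` and the stall counting are all invented for illustration, and the model assumes a driver that handles orphaning well. In OpenGL itself, orphaning is commonly done by calling `glBufferData` with a null data pointer before uploading new data.

```python
class ToyDriver:
    """Toy model of a graphics driver that counts CPU stalls (hypothetical)."""

    def __init__(self):
        self.stalls = 0
        self.allocations = 0
        self._data = None
        self._in_flight = False  # is the GPU still reading the current storage?

    def orphan(self):
        # Detach the old storage (the GPU can keep reading it) and hand the
        # application fresh storage that nothing is reading yet.
        self.allocations += 1
        self._in_flight = False

    def upload(self, vertices):
        if self._in_flight:
            # Overwriting storage the GPU still reads forces the CPU to wait.
            self.stalls += 1
            self._in_flight = False
        self._data = list(vertices)

    def draw(self):
        # The queued draw keeps the current storage busy.
        self._in_flight = True


def run_frame(driver, batches, orphaning):
    """Reuse one vertex buffer for every batch in the frame."""
    for vertices in batches:
        if orphaning:
            driver.orphan()
        driver.upload(vertices)
        driver.draw()


batches = [[0.0, 1.0, 2.0]] * 1000  # same buffer reused 1000 times per frame

with_orphaning = ToyDriver()
run_frame(with_orphaning, batches, orphaning=True)

without_orphaning = ToyDriver()
run_frame(without_orphaning, batches, orphaning=False)

print(with_orphaning.stalls, with_orphaning.allocations)
print(without_orphaning.stalls, without_orphaning.allocations)
```

In this model, orphaning replaces every stall with an allocation. A driver that makes those allocations expensive, or that does not recognise the pattern at all, would lose the benefit, which is the "horribly wrong" case described above.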

What makes it worse is that there are various flags / methods that can be used for such dynamic buffers, and OpenGL specification is not clear about which should be used in any situation. This makes life difficult for both engine developers, and driver developers. You can easily end up with the situation like ANGLE, where translating OpenGL to DirectX results in better performance than using OpenGL natively, simply because the specification is more clearly defined in DirectX.

This vagueness is partly the reason for the move to Vulkan. By making the driver more low level, and leaving more in the hands of the developer, it hopefully leaves fewer opportunities for misunderstandings.

There are some things that could potentially be done to decrease the number of buffer uploads, to work around these problematic drivers, but it is a matter of finding the time and the will. We should probably liaise more with the GPU manufacturers to sort out some of these problems. I know we have been having some success with ARM recently for the Vulkan work.

jamie-pate commented 3 years ago

FWIW the Linux driver behaves (better anyways) :D

The same i7-8650U (UHD 620) on Linux 5.8.0:

fps
ON 119.900915
ON 121.396055
ON 120.989332
OFF 122.096272
OFF 120.357799
OFF 122.275449
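A quick way to compare the two groups is to average them. This is a small Python sketch over the six readings above; the variable names are arbitrary, and with only three samples per setting it is a rough comparison rather than a statistical test.

```python
from statistics import mean

# The readings posted above, as label/value pairs.
raw = (
    "ON 119.900915 ON 121.396055 ON 120.989332 "
    "OFF 122.096272 OFF 120.357799 OFF 122.275449"
)

tokens = raw.split()
samples = {"ON": [], "OFF": []}
for label, value in zip(tokens[0::2], tokens[1::2]):
    samples[label].append(float(value))

mean_on = mean(samples["ON"])
mean_off = mean(samples["OFF"])
diff_pct = (mean_off - mean_on) / mean_off * 100

print(f"batching ON:  {mean_on:.2f} fps")
print(f"batching OFF: {mean_off:.2f} fps")
print(f"difference:   {diff_pct:.2f} %")
```

The averages differ by less than 1%, and the ON and OFF ranges overlap, so on this Linux driver the batching setting makes no measurable difference in this benchmark.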

The "U" means the chip is designed for laptops and mobile devices, as "U" chips are Intel's "ultra-low power" models. They're "low power" because they use even less power than the "T" models and have slower clock speeds than their full-size, non "U" equivalents.

Calinou commented 1 year ago

@thes3m Can you (or anyone else) still reproduce this bug in Godot 3.5.1 or any later release?

If yes, please ensure that an up-to-date Minimal Reproduction Project (MRP) is included in this report (an MRP is a zipped Godot project with the minimal elements necessary to reliably trigger the bug). You can upload ZIP files in an issue comment with a drag and drop.

Calinou commented 1 year ago

Closing due to lack of response. Please comment if you can still reproduce this bug on the latest Godot version.

PS: Since no minimal reproduction project was included in the original bug report, please upload one as well to ease troubleshooting.