godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

CanvasItem clip children is extremely slow #79439

Open djrain opened 1 year ago

djrain commented 1 year ago

Godot version

4.1 stable

System information

macOS, Android, iOS - all renderers

Issue description

We're making a mobile game, and we were getting poor performance on Android. After some testing we found that having even a small number of nodes with clipping enabled was the culprit, making this feature borderline unusable. Turning clipping off brought the game back up to acceptable FPS.

The slowness happens in all renderers and not just on mobile. Having more children being clipped does not seem to matter much - only the total number of nodes with clipping enabled.

I would normally expect clipping to have a significant performance cost, but this blog post stated that clipping happens "at literally no cost". That sounded too good to be true, but given that claim, this result is very unexpected.

If a notable performance hit is indeed expected when using this, it would be nice to have that documented.

Steps to reproduce

Run the MRP's main scene. Press Enter to toggle clipping on and off, and observe the huge FPS change (note that V-Sync is off). On my M1 Mac it drops from around 1300 FPS to about 75.

Minimal reproduction project

ClipChildrenSlow.zip

Calinou commented 1 year ago

I can confirm this on 4.1.stable (Linux, GeForce RTX 4090 with NVIDIA 535.54.03).

| Resolution and stretch mode | Clipping disabled | Clipping enabled |
|---|---|---|
| 1152×548 | 11471 FPS (0.09 mspf) | 1902 FPS (0.53 mspf) |
| 3840×2160 (disabled stretch mode) | 5711 FPS (0.18 mspf) | 1478 FPS (0.68 mspf) |
| 3840×2160 (canvas_items stretch mode, makes clipped sprites larger) | 4114 FPS (0.24 mspf) | 844 FPS (1.18 mspf) |
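FPS ratios can exaggerate a difference at very high frame rates, so it may help to look at the absolute time added per frame. The following is a minimal sketch that converts the figures above to milliseconds per frame and subtracts them; the helper names `frame_ms` and `added_cost_ms` are made up for this illustration.

```python
def frame_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


def added_cost_ms(fps_off: float, fps_on: float) -> float:
    """Extra time per frame when going from clipping off to clipping on."""
    return frame_ms(fps_on) - frame_ms(fps_off)


# (label, FPS with clipping disabled, FPS with clipping enabled),
# copied from the measurements reported above.
measurements = [
    ("1152x548", 11471, 1902),
    ("3840x2160, stretch disabled", 5711, 1478),
    ("3840x2160, canvas_items stretch", 4114, 844),
]

for label, fps_off, fps_on in measurements:
    print(f"{label}: +{added_cost_ms(fps_off, fps_on):.2f} ms per frame")
```

On these numbers the added cost works out to roughly 0.4 to 0.9 ms per frame on this desktop GPU, compared with a budget of about 16.7 ms per frame at 60 FPS.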

What's interesting is that while GPU usage goes up with clipping enabled, power consumption goes down (despite both cases running at the same GPU core and memory clocks):

[Screenshots: clipping disabled (Screenshot_20230714_000037), clipping enabled (Screenshot_20230714_000050)]

djrain commented 1 year ago

It's quite a bummer to have such a useful feature hindered by poor performance :( @Calinou can we tag someone who might have some ideas?

Alexfox22 commented 1 year ago

@Calinou hi! Are there any updates on this topic? Will it be fixed?

clayjohn commented 1 year ago

This highlights an important difference between mobile and desktop GPUs. Due to a difference in architecture, mobile GPUs pay a much larger performance penalty for every pixel touched and for switching render targets.

CanvasItem clipping requires switching render targets twice (once to back buffer, then back to front buffer) and the way it is implemented now requires a full screen copy (it touches a lot of pixels).

On desktop this is essentially free as the cost of switching render targets and copying pixels is super low.

Ultimately the render target switching can't be reduced so clipping will always be somewhat expensive on mobile.

Right now we always copy the full front buffer when doing the clipping, but we really only need to do that when mipmaps are enabled. We can use a tighter clipping rect to reduce the cost of copying the pixels, but I doubt even that optimization will be enough to make this efficient on mobile devices.
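To get a feel for how much a smaller copy region could save, here is a rough back-of-the-envelope model, not the engine's code. It assumes a 4-byte-per-pixel back buffer and one copy per clipping node per frame; the `Rect` class, the node bounds, and the count of 50 clipping nodes are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

BYTES_PER_PIXEL = 4  # assumption: 8-bit RGBA back buffer


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def area(self) -> int:
        return max(self.w, 0) * max(self.h, 0)

    def intersect(self, other: "Rect") -> "Rect":
        """Overlap of two rects; zero-sized if they do not overlap."""
        x1 = max(self.x, other.x)
        y1 = max(self.y, other.y)
        x2 = min(self.x + self.w, other.x + other.w)
        y2 = min(self.y + self.h, other.y + other.h)
        return Rect(x1, y1, max(x2 - x1, 0), max(y2 - y1, 0))


def copy_bytes(region: Rect, clip_nodes: int) -> int:
    """Bytes copied per frame if every clipping node copies `region` once."""
    return region.area() * BYTES_PER_PIXEL * clip_nodes


screen = Rect(0, 0, 3840, 2160)
node_bounds = Rect(3800, 100, 128, 128)  # hypothetical node, partly off-screen
tight = node_bounds.intersect(screen)    # only the visible part needs copying

full = copy_bytes(screen, clip_nodes=50)  # 50 clipping nodes is a made-up count
small = copy_bytes(tight, clip_nodes=50)
print(f"full-screen copies: {full / 1e6:.1f} MB per frame")
print(f"tight-rect copies:  {small / 1e6:.1f} MB per frame")
```

The model only counts copied bytes. It says nothing about the cost of switching render targets, which the comment above describes as the part that cannot be reduced.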

djrain commented 1 year ago

> On desktop this is essentially free

But that doesn't seem to be the case: both Calinou and I observed a significant FPS drop on desktop.

PickleJesus123 commented 1 year ago

My performance issues were coming specifically from CanvasGroup, not CanvasItem.

Perhaps there should be a 'warning label' in the Godot documentation about using CanvasGroups in mobile projects?

As a new transplant from Unity, I started 'intuitively' using CanvasGroups all throughout my project. It wasn't until significantly later that I realized it was killing my app's performance. I worry that a lot of other newbies will do the same thing, and that may lead to doubts about Godot's capabilities.

clayjohn commented 1 year ago

Thinking more about this, I can think of two things to explore to improve performance:

  1. As mentioned above, we need to ensure we use the smallest clipping rect possible to reduce the number of pixels being copied to the backbuffer
  2. We might be able to do some smart caching of CanvasGroups and cache the results within the CanvasGroup if none of the child nodes is changed. Right now it is totally dynamic, which saves on memory, but ends up doing the same calculations every frame
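The caching idea in point 2 can be sketched with a simple dirty flag. This is only an illustration of the general pattern, not how Godot implements or plans to implement it; the `CachedGroup` class and its `render` callback are hypothetical stand-ins for a group node and its expensive clipped draw.

```python
from typing import Callable, Optional


class CachedGroup:
    """Caches the result of an expensive render until a child changes."""

    def __init__(self, render: Callable[[], str]) -> None:
        self._render = render               # expensive function producing the clipped result
        self._cache: Optional[str] = None   # last rendered result, kept in memory
        self._dirty = True                  # nothing cached yet, so the first draw renders
        self.render_count = 0

    def mark_dirty(self) -> None:
        """Call whenever a child is moved, changed, added, or removed."""
        self._dirty = True

    def draw(self) -> Optional[str]:
        """Return the cached result, re-rendering only if something changed."""
        if self._dirty:
            self._cache = self._render()
            self._dirty = False
            self.render_count += 1
        return self._cache


group = CachedGroup(render=lambda: "clipped-texture")
for _ in range(100):   # 100 frames with no changes: rendered once
    group.draw()
group.mark_dirty()     # a child changed
group.draw()           # rendered a second time
print(group.render_count)
```

The trade-off matches the one described above: the cached result occupies memory for as long as the group exists, in exchange for skipping the repeated work on frames where nothing changed.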

In the short term we may indeed want a clear warning in the documentation, as the current design is very bad for mobile and that won't change without drastic intervention.

saletrak commented 1 year ago

@clayjohn Hi, tell me if I'm wrong, but what do you think about making a clipping inside SubViewport, to generate clipped image only once and display it as a texture?

clayjohn commented 1 year ago

> @clayjohn Hi, tell me if I'm wrong, but what do you think about making a clipping inside SubViewport, to generate clipped image only once and display it as a texture?

That's definitely an optimization you can do today. It is slightly more cumbersome than using clip_children directly, but it allows you to cache the results of clip_children, which will be a net win for performance. In cases where the clip_children node and its child nodes don't need to change every frame, it is definitely best to cache the results in a SubViewport and apply the SubViewport's texture directly.

insomniacUNDERSCORElemon commented 4 months ago

Could this be done by only updating when needed? For instance, only updating:

? Assuming there isn't something blocking this that I'm missing, it would make static art the default (and updating based on what is added) solving proposal 8747.

Also mentioned in the PR (cascaded) above:

  1. instancing optimization would be very useful particularly for tilemaps and spawning, which need it due to rendering the same scene multiple times over (perhaps animations making that a bit more tricky).

  2. MSAA really gives a massive limitation on clipping instances w/the PR (only rendering 22-25% of the instances compared to 4.2.2's implementation, at least for me w/a 1050Ti where MSAA is a major limit there) though I have not tested if downscaling could be a better alternative to get AA.


I would say that perhaps clipping could be skipped in certain scenarios, though that would probably be mostly used for eyes (when the iris is not near the edge) and maybe a few other character/dynamic things and probably not much else (most interesting art done with this will always need clipping). Though in cases where that is viable, it could be turned off by animating/scripting the value.

In a similar vein, could partial/incremental updates be a thing, particularly for less complex setups/untextured polygons?