godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License

`RD::texture_create_shared_from_slice` becomes very slow when used extensively on a `Texture2DArrayRD` #98733

Open ze2j opened 4 hours ago

ze2j commented 4 hours ago

Tested versions

System information

Ubuntu 22.04.5 LTS 22.04 - X11 - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 1070 (nvidia; 535.183.01) - Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (8 Threads)

Issue description

I noticed a performance issue in RenderingDevice::texture_create_shared_from_slice. In my game I use a Texture2DArrayRD and create 9 shared textures for each layer (the mipmaps). When I end up with 2000 shared textures, my game starts jittering a lot.

I profiled my game using Tracy and the issue is with this line of code: Texture texture = *src_texture; in RenderingDevice::texture_create_shared_from_slice. 80% of the duration of RenderingDevice::texture_create_shared_from_slice can be spent there.

The Texture class has a member named slice_trackers which is a hashmap tracking the shared textures (correct me if I am wrong). It's the copy of this hashmap which slows down the Texture copy. On my system the copy takes about 50ns when the hashmap is almost empty and can reach 200µs or more when it contains 2500+ elements:

[Image: slice_trackers copy]

As I can create 100 shared textures per frame, I get frame times of 40 ms or more...
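To illustrate why the copy cost grows with the number of shared textures, here is a minimal standalone sketch. It is not Godot code: `Texture` and `SliceTracker` are simplified stand-ins, `std::unordered_map` replaces Godot's own `HashMap`, and `g_tracker_copies` and `tracker_copies_for_texture_copy` are hypothetical names introduced only for the demonstration. It counts how many tracker entries are copied by a single whole-struct copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical global counter: number of SliceTracker copy constructions.
std::size_t g_tracker_copies = 0;

// Simplified stand-in for a per-slice tracker (not the real Godot type).
struct SliceTracker {
    std::uint32_t slice_id;
    explicit SliceTracker(std::uint32_t id) : slice_id(id) {}
    SliceTracker(const SliceTracker &other) : slice_id(other.slice_id) {
        ++g_tracker_copies;
    }
};

// Simplified stand-in for the Texture struct (assumed layout).
struct Texture {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::unordered_map<std::uint32_t, SliceTracker> slice_trackers;
};

// Returns how many SliceTracker objects are copied by one
// `Texture copy = src;` when src holds `tracker_count` trackers.
std::size_t tracker_copies_for_texture_copy(std::size_t tracker_count) {
    Texture src;
    for (std::uint32_t i = 0; i < tracker_count; ++i) {
        src.slice_trackers.emplace(i, SliceTracker(i));
    }
    g_tracker_copies = 0; // ignore copies made while filling the map
    Texture copy = src;   // same pattern as `Texture texture = *src_texture;`
    assert(copy.slice_trackers.size() == tracker_count);
    return g_tracker_copies;
}
```

In this sketch every tracker in the map is copied once per struct copy, so the work per call is linear in the number of existing shared textures. That matches the shape of the measurements above, although the absolute timings depend on Godot's actual `HashMap` implementation.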

IMO this can be solved by not copying the slice_trackers member. slice_trackers is only used by the owner texture and not by the shared ones. In the current implementation we have Texture texture = *src_texture; and then texture->slice_trackers.clear(); comes later in RD::_texture_make_mutable.

I am considering creating a PR that replaces the line Texture texture = *src_texture; with something like Texture texture = src_texture->duplicate_as_shared_texture(); where Texture duplicate_as_shared_texture() const copies every member except slice_trackers.
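A rough sketch of what such a method could look like, on a simplified stand-in struct rather than the real `RenderingDevice::Texture`. The fields `width`, `height`, `layers` and `mipmaps`, the `SliceTracker` type, the use of `std::unordered_map`, and the helper `make_owner_texture` are all assumptions made for illustration; only `duplicate_as_shared_texture` and `slice_trackers` come from the proposal itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Simplified stand-in for a per-slice tracker (not the real Godot type).
struct SliceTracker {
    std::uint32_t slice_id = 0;
};

// Simplified stand-in for the Texture struct (assumed fields).
struct Texture {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::uint32_t layers = 0;
    std::uint32_t mipmaps = 0;
    std::unordered_map<std::uint32_t, SliceTracker> slice_trackers;

    // Copies every member except slice_trackers, which stays empty
    // in the returned object because only the owner texture uses it.
    Texture duplicate_as_shared_texture() const {
        Texture dup;
        dup.width = width;
        dup.height = height;
        dup.layers = layers;
        dup.mipmaps = mipmaps;
        // slice_trackers is deliberately not copied.
        return dup;
    }
};

// Hypothetical helper: builds an owner texture with `shared_count` trackers.
Texture make_owner_texture(std::size_t shared_count) {
    Texture owner;
    owner.width = 1024;
    owner.height = 1024;
    owner.layers = 256;
    owner.mipmaps = 9;
    for (std::uint32_t i = 0; i < shared_count; ++i) {
        SliceTracker tracker;
        tracker.slice_id = i;
        owner.slice_trackers[i] = tracker;
    }
    return owner;
}
```

The trade-off in this form is that every member added to the struct later must also be added to `duplicate_as_shared_texture()`, otherwise shared textures silently lose it.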

I tested this fix and got the expected results (these are the durations of RenderingDevice::texture_create_shared_from_slice):

[Image: slice_trackers no copy]

I am worried about maintainability, and about whether you would want to support such a use case in the first place. So before creating an MRP (which requires some work) and a PR, I would like to get some feedback.

Steps to reproduce

Minimal reproduction project (MRP)

Will do if my use case is not too project specific.

clayjohn commented 2 hours ago

Performance improvements are almost always welcome. I would have to see a PR to properly judge if the additional complexity/maintenance burden is justifiable though.