I noticed a performance issue in `RenderingDevice::texture_create_shared_from_slice`. In my game I use a `Texture2DArrayRD` and create 9 shared textures for each layer (one per mipmap). When I reach 2000 shared textures, my game starts jittering a lot.
I profiled my game using Tracy, and the bottleneck is this line in `RenderingDevice::texture_create_shared_from_slice`: `Texture texture = *src_texture;`. Up to 80% of the duration of `RenderingDevice::texture_create_shared_from_slice` can be spent on it.
The `Texture` class has a member named `slice_trackers`, which is a hashmap tracking the shared textures (correct me if I am wrong). Copying this hashmap is what slows down the `Texture` copy. On my system the copy takes about 50 ns when the hashmap is almost empty and can reach 200 µs or more when it contains 2500+ elements.
Since I can create 100 shared textures per frame, some frames take 40 ms or more.
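To illustrate why the copy cost scales with the number of entries, here is a minimal C++ sketch that times copying a struct which holds a hash map. It is only a stand-in under stated assumptions: `FakeTexture`, `make_texture`, `average_copy_ns` and `g_copy_sink` are hypothetical names, and `std::unordered_map` replaces Godot's own `HashMap`, so absolute numbers will differ from the Tracy measurements.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for RenderingDevice's internal Texture: a few plain
// fields plus a hash map playing the role of `slice_trackers`.
// Assumption: std::unordered_map is used instead of Godot's HashMap.
struct FakeTexture {
	std::uint32_t width = 0;
	std::uint32_t height = 0;
	std::uint32_t layers = 0;
	std::unordered_map<std::uint64_t, std::uint64_t> slice_trackers;
};

// Written after each measurement so the compiler cannot discard the copies.
volatile std::size_t g_copy_sink = 0;

// Builds a texture whose map holds `tracker_count` entries.
FakeTexture make_texture(std::size_t tracker_count) {
	FakeTexture tex;
	tex.width = 256;
	tex.height = 256;
	tex.layers = 1024;
	for (std::size_t i = 0; i < tracker_count; i++) {
		tex.slice_trackers[i] = i;
	}
	return tex;
}

// Average wall-clock time, in nanoseconds, of one `FakeTexture copy = src;`.
double average_copy_ns(const FakeTexture &src, int iterations) {
	assert(iterations > 0);
	std::size_t sink = 0;
	const auto start = std::chrono::steady_clock::now();
	for (int i = 0; i < iterations; i++) {
		FakeTexture copy = src; // copies every member, the map included
		sink += copy.slice_trackers.size();
	}
	const auto end = std::chrono::steady_clock::now();
	g_copy_sink = sink;
	return std::chrono::duration<double, std::nano>(end - start).count() / iterations;
}
```

For example, comparing `average_copy_ns(make_texture(0), 1000)` with `average_copy_ns(make_texture(2500), 1000)` should show the copy getting much slower as the map grows, because copying a hash map allocates storage for and copies every entry.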
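A rough check with the numbers above, assuming all 100 copies in one frame hit the worst-case figure of about 200 µs each, and taking 60 FPS as the frame budget (an assumption; the target frame rate is not stated):

$$
100 \times 200\ \mu\text{s} = 20\ \text{ms} > \frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms}
$$

So under these assumptions the `Texture` copies alone can exceed a 60 FPS frame budget, which fits the jitter described above.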
IMO this can be solved by not copying the `slice_trackers` member. `slice_trackers` is only used by the owner texture, not by the shared ones. In the current implementation we have `Texture texture = *src_texture;`, and then `texture->slice_trackers.clear();` comes later in `RD::_texture_make_mutable`.
I am considering opening a PR that replaces the line `Texture texture = *src_texture;` with something like `Texture texture = src_texture->duplicate_as_shared_texture();`, where `Texture duplicate_as_shared_texture() const` copies every member except `slice_trackers`.
I tested this fix and got the expected results (these are the durations of `RenderingDevice::texture_create_shared_from_slice`).
I am worried about maintainability, and about whether you want to support such a use case in the first place. So before creating an MRP (which requires some work) and a PR, I would like to get some feedback.
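A minimal sketch of what `duplicate_as_shared_texture()` could look like, assuming a heavily simplified `Texture`: only `slice_trackers` comes from the issue, while `width`, `height`, `layers` and `mipmaps` are hypothetical placeholders for the real struct's other members, and `std::unordered_map` stands in for Godot's `HashMap`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Heavily simplified, hypothetical model of RenderingDevice's internal Texture.
// Only `slice_trackers` is taken from the issue; the other members are
// placeholders for everything else the real struct holds.
struct Texture {
	std::uint32_t width = 0;
	std::uint32_t height = 0;
	std::uint32_t layers = 0;
	std::uint32_t mipmaps = 0;
	std::unordered_map<std::uint64_t, std::uint64_t> slice_trackers; // owner-only bookkeeping

	// Copies every member except `slice_trackers`, which starts empty, so the
	// cost no longer depends on how many shared textures the owner tracks.
	Texture duplicate_as_shared_texture() const {
		Texture copy;
		copy.width = width;
		copy.height = height;
		copy.layers = layers;
		copy.mipmaps = mipmaps;
		// slice_trackers is intentionally not copied.
		assert(copy.slice_trackers.empty());
		return copy;
	}
};
```

The maintainability concern shows up directly here: any member added to `Texture` later must also be copied by hand in this function, or shared textures silently get a default value. One possible mitigation, again only an idea, is to move the copyable members into a hypothetical base struct so the body reduces to assigning that base part.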
Steps to reproduce
Create a `Texture2DArrayRD` with 1024 layers
Call `RD::texture_create_shared_from_slice` 10 times for each layer
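The reproduction steps above can be modelled in plain C++ to show how the total work grows. This sketch rests on an assumption taken from the issue's own description rather than from Godot's source: every shared texture adds one entry to the owner's `slice_trackers`, and every creation copies the owner's map before it is cleared. `ModelTexture`, `ReproStats` and `simulate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical model of the owner texture; only the map matters here.
struct ModelTexture {
	std::unordered_map<std::uint64_t, std::uint64_t> slice_trackers;
};

struct ReproStats {
	std::size_t shared_created = 0;
	std::size_t tracker_entries_copied = 0; // total map entries copied over all calls
};

// Mimics calling texture_create_shared_from_slice `calls_per_layer` times for
// each of `layers` layers. With `skip_trackers` true, the map is not copied,
// as with the proposed duplicate_as_shared_texture().
ReproStats simulate(std::size_t layers, std::size_t calls_per_layer, bool skip_trackers) {
	ModelTexture owner;
	ReproStats stats;
	for (std::size_t layer = 0; layer < layers; layer++) {
		for (std::size_t call = 0; call < calls_per_layer; call++) {
			ModelTexture shared;
			if (!skip_trackers) {
				// Current behaviour: full copy, then the copied map is cleared.
				stats.tracker_entries_copied += owner.slice_trackers.size();
				shared = owner;
				shared.slice_trackers.clear();
			}
			// Assumption: the owner records one tracker entry per shared texture.
			owner.slice_trackers[stats.shared_created] = layer;
			stats.shared_created++;
		}
	}
	assert(owner.slice_trackers.size() == stats.shared_created);
	return stats;
}
```

In this model, `simulate(4, 10, false)` copies 0 + 1 + ... + 39 = 780 entries. With the issue's parameters (1024 layers, 10 calls per layer) it creates N = 10240 shared textures and copies N(N-1)/2 = 52,423,680 entries in total, so the work grows quadratically, while the `skip_trackers` variant copies none.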
Minimal reproduction project (MRP)
I will make one if my use case is not too project-specific.
Performance improvements are almost always welcome. I would have to see a PR to properly judge if the additional complexity/maintenance burden is justifiable though.
Tested versions
System information
Ubuntu 22.04.5 LTS 22.04 - X11 - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 1070 (nvidia; 535.183.01) - Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (8 Threads)