ConfettiFX / The-Forge

The Forge Cross-Platform Rendering Framework PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
Apache License 2.0

One descriptor pool per DescriptorSet causes high CPU and GPU memory consumption in version 1.51 #242

Closed: yaoyao-cn closed this issue 2 years ago

yaoyao-cn commented 2 years ago

After I updated to v1.51, CPU and GPU memory consumption became extremely high, and vkCreateDescriptorPool returns VK_ERROR_OUT_OF_DEVICE_MEMORY.

I found the difference here: https://github.com/ConfettiFX/The-Forge/blob/da36bb40f6ea11df76167f528032c52999912d82/Common_3/Renderer/Vulkan/Vulkan.cpp?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L4892

Am I missing something?

manas-kulkarni commented 2 years ago

How many DescriptorSet structs do you have?

yaoyao-cn commented 2 years ago

About 20k DescriptorSets, used for the per-draw world matrix and texture parameters. I use The-Forge for a CAD program, where 20k draw calls is very common. With version 1.5 there seemed to be no problem, but now the 8 GB of shared GPU memory can be used up, and vkCreateDescriptorPool returns VK_ERROR_OUT_OF_DEVICE_MEMORY.

I know I can use a storage buffer for my matrices and a dynamic descriptor set, but what about the texture parameters if bindless is not supported on the target device?

Thank you (^_^)

manas-kulkarni commented 2 years ago

If all 20k descriptor sets use the same root signature, you can allocate just one DescriptorSet struct with mMaxSets=20k. We will also take a look on our side if this doesn't resolve your issue.
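The difference between the two allocation strategies can be sketched with a small model. It assumes, as the issue title reports for v1.51, that every DescriptorSet struct gets its own descriptor pool. All `Mock*` types and the two helper functions below are hypothetical stand-ins written for illustration; they are not The-Forge's real headers or API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins that only mimic the shape of the API discussed above.
struct MockRootSignature { int id; };

struct MockDescriptorSetDesc {
    const MockRootSignature* pRootSignature;
    uint32_t                 mMaxSets; // number of set slots this struct provides
};

struct MockRenderer {
    uint32_t poolsCreated = 0; // counts simulated descriptor pool creations
};

// Assumption taken from the issue title: one pool is created per DescriptorSet struct,
// regardless of how many slots (mMaxSets) it holds.
void mockAddDescriptorSet(MockRenderer& renderer, const MockDescriptorSetDesc& desc) {
    assert(desc.pRootSignature != nullptr && desc.mMaxSets > 0);
    renderer.poolsCreated += 1;
}

// Strategy A: one DescriptorSet struct (mMaxSets = 1) for every draw call.
uint32_t poolsWithOneStructPerDraw(uint32_t drawCount) {
    MockRenderer      renderer;
    MockRootSignature rootSig{0};
    for (uint32_t i = 0; i < drawCount; ++i) {
        mockAddDescriptorSet(renderer, MockDescriptorSetDesc{&rootSig, 1});
    }
    return renderer.poolsCreated;
}

// Strategy B: a single DescriptorSet struct with mMaxSets = drawCount,
// where draw i later uses slot i.
uint32_t poolsWithOneSharedStruct(uint32_t drawCount) {
    MockRenderer      renderer;
    MockRootSignature rootSig{0};
    mockAddDescriptorSet(renderer, MockDescriptorSetDesc{&rootSig, drawCount});
    return renderer.poolsCreated;
}
```

Under this model, 20k draws cost 20k pools with strategy A and one pool with strategy B. In real code the per-draw slot would then be selected by its index when updating and binding the set; check the exact function signatures in the headers of the version you use.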

yaoyao-cn commented 2 years ago

@manas-kulkarni it works!

Thank you very much! Your idea makes GPU memory usage even lower than in version 1.5; it now takes only about 100 MB of shared GPU memory. I think I should maintain, per root signature, a list of descriptor sets created with a large mMaxSets, and create a new descriptor set on demand.
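The "list of large descriptor sets per root signature, grown on demand" idea could look roughly like the bookkeeping sketch below. It only tracks which chunk and slot each draw gets; the class name, `SetSlot`, the integer root signature id, and the chunk size are all hypothetical, and the place where a real DescriptorSet with a large mMaxSets would be created is marked with a comment.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical handle: which chunk (one DescriptorSet struct) and which slot inside it.
struct SetSlot {
    uint32_t chunk;
    uint32_t index;
};

class DescriptorSetChunks {
public:
    explicit DescriptorSetChunks(uint32_t setsPerChunk) : mSetsPerChunk(setsPerChunk) {
        assert(setsPerChunk > 0);
    }

    // Returns the next free slot for a root signature, opening a new chunk
    // when none exists yet or the current one is full.
    SetSlot acquire(int rootSignatureId) {
        PerSignature& s = mPerSignature[rootSignatureId];
        if (s.chunkCount == 0 || s.usedInLastChunk == mSetsPerChunk) {
            // Real code would create one DescriptorSet with mMaxSets = mSetsPerChunk here.
            ++s.chunkCount;
            s.usedInLastChunk = 0;
        }
        return SetSlot{s.chunkCount - 1, s.usedInLastChunk++};
    }

    // Number of chunks opened so far for a root signature (0 if never used).
    uint32_t chunkCount(int rootSignatureId) const {
        auto it = mPerSignature.find(rootSignatureId);
        return it == mPerSignature.end() ? 0u : it->second.chunkCount;
    }

private:
    struct PerSignature {
        uint32_t chunkCount      = 0;
        uint32_t usedInLastChunk = 0;
    };
    uint32_t                    mSetsPerChunk;
    std::map<int, PerSignature> mPerSignature;
};

// Helper for illustration: how many chunks does one root signature need for `draws` slots?
uint32_t chunksNeeded(uint32_t draws, uint32_t setsPerChunk) {
    DescriptorSetChunks chunks(setsPerChunk);
    for (uint32_t i = 0; i < draws; ++i) {
        chunks.acquire(0);
    }
    return chunks.chunkCount(0);
}

// Helper for illustration: the slot handed out to the last of `draws` acquisitions.
SetSlot lastSlot(uint32_t draws, uint32_t setsPerChunk) {
    DescriptorSetChunks chunks(setsPerChunk);
    SetSlot slot{0, 0};
    for (uint32_t i = 0; i < draws; ++i) {
        slot = chunks.acquire(0);
    }
    return slot;
}
```

With a chunk size of 4096, for example, 20k draws sharing one root signature would need five chunks instead of 20k separate structs. The chunk size is a tuning choice: larger chunks mean fewer pools but more slots reserved up front.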

Also, here: https://github.com/ConfettiFX/The-Forge/blob/master/Common_3/Renderer/Vulkan/Vulkan.cpp#L4877 I use tf_malloc instead of alloca, to avoid a stack overflow when mMaxSets is too big.
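The stack-versus-heap concern can be illustrated with a scratch buffer that stays on the stack for small counts and moves to the heap for large ones. This is a generic sketch, not the code at the linked line: `ScratchArray`, the inline capacity of 64, and both helper functions are hypothetical, and standard `new[]` stands in for tf_malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Temporary array that uses inline (stack) storage for small counts and
// heap storage for large ones, so a huge count cannot overflow the stack.
template <typename T, size_t InlineCount>
class ScratchArray {
public:
    explicit ScratchArray(size_t count) : mCount(count) {
        if (count > InlineCount) {
            mHeap.reset(new T[count]()); // stand-in for tf_malloc in The-Forge
            mData = mHeap.get();
        } else {
            mData = mInline;
        }
    }

    // Non-copyable: mData may point into this object's own inline storage.
    ScratchArray(const ScratchArray&) = delete;
    ScratchArray& operator=(const ScratchArray&) = delete;

    T*     data() { return mData; }
    size_t size() const { return mCount; }
    bool   onHeap() const { return mHeap != nullptr; }

private:
    size_t               mCount;
    T                    mInline[InlineCount] = {};
    std::unique_ptr<T[]> mHeap;
    T*                   mData = nullptr;
};

// Helper for illustration: does a scratch array of `count` elements go to the heap?
bool usesHeap(size_t count) {
    ScratchArray<uint32_t, 64> scratch(count);
    return scratch.onHeap();
}

// Helper for illustration: fill the scratch array with 1..count and sum it.
uint64_t fillAndSum(size_t count) {
    ScratchArray<uint32_t, 64> scratch(count);
    uint64_t sum = 0;
    for (size_t i = 0; i < scratch.size(); ++i) {
        scratch.data()[i] = static_cast<uint32_t>(i + 1);
        sum += scratch.data()[i];
    }
    return sum;
}
```

The heap path costs an allocation per call, which is usually negligible next to descriptor set creation, while the inline path keeps the common small case as cheap as alloca.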

Thank you again, and Merry Christmas!

manas-kulkarni commented 2 years ago

Glad that it worked. We will fix that alloca call in the next release. Happy Holidays!