Wumpf / blub

3D fluid simulation experiments in Rust, using WebGPU-rs (WIP)
MIT License

Look into swapping some volume textures with 2d texture arrays for optimization #35

Open · Wumpf opened this issue 4 years ago

Wumpf commented 4 years ago

Various sources, old and new, claim that on Nvidia hardware 3D textures are actually stored as 2D slices!
http://www-ppl.ist.osaka-u.ac.jp/research/papers/201405_sugimoto_pc.pdf
https://www.sciencedirect.com/science/article/pii/S2468502X1730027X#fig1
https://forum.unity.com/threads/improving-performance-of-3d-textures-using-texture-arrays.725384/#post-4849571
For Intel this is directly documented: https://www.x.org/docs/intel/BYT/intel_os_gfx_prm_vol5_-_memory_views.pdf
I wasn't able to find anything on AMD, but there are sources implying the layered nature: "Two bilinear fetches are required when sampling from a volume texture with bilinear filtering".
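To make the "stack of 2D slices" claim concrete, here is a simplified CPU model, assuming a plain slice-major linear layout. Real drivers additionally tile/swizzle texels inside each slice, which this sketch ignores; the function names and the 64 x 64 slice size are hypothetical. It only illustrates that x/y neighbors stay close in memory while the +/-z neighbors of a 3D stencil sit a whole slice away.

```rust
// Simplified model: a volume stored as contiguous W * H slices, one per z.
// Real hardware also tiles/swizzles within a slice; that is ignored here.
const W: usize = 64; // hypothetical slice width
const H: usize = 64; // hypothetical slice height

/// Linear texel index in a slice-major layout (x fastest, then y, then z).
fn sliced_index(x: usize, y: usize, z: usize) -> usize {
    (z * H + y) * W + x
}

/// Memory distance (in texels) from (x, y, z) to its six face neighbors,
/// in the order -x, +x, -y, +y, -z, +z. Requires x, y, z >= 1.
fn neighbor_distances(x: usize, y: usize, z: usize) -> [usize; 6] {
    let c = sliced_index(x, y, z);
    [
        sliced_index(x - 1, y, z),
        sliced_index(x + 1, y, z),
        sliced_index(x, y - 1, z),
        sliced_index(x, y + 1, z),
        sliced_index(x, y, z - 1),
        sliced_index(x, y, z + 1),
    ]
    .map(|i| i.abs_diff(c))
}

fn main() {
    // x neighbors are adjacent, y neighbors one row away, z neighbors one slice away.
    println!("{:?}", neighbor_distances(5, 5, 5)); // [1, 1, 64, 64, 4096, 4096]
}
```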

It stands to reason that by making the layering explicit with 2D texture arrays, there should be some room for optimization!
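One consequence of moving to a 2D texture array: the sampler only filters within a layer, so a filtered 3D lookup has to become two bilinear fetches on adjacent layers plus a manual lerp along z, which mirrors the AMD quote above. The CPU sketch below emulates that; `TextureArray`, its methods, and the coordinate convention (texel space with texel centers at integers, clamp-to-edge) are assumptions for illustration and differ from how GPU samplers address normalized coordinates.

```rust
/// Hypothetical CPU model of a 2D texture array: each layer is a row-major
/// `width * height` block of f32 texels.
struct TextureArray {
    width: usize,
    height: usize,
    layers: Vec<Vec<f32>>,
}

fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

impl TextureArray {
    fn texel(&self, layer: usize, x: usize, y: usize) -> f32 {
        self.layers[layer][y * self.width + x]
    }

    /// Bilinear fetch inside one layer. Coordinates are in texel space
    /// (texel centers at integers) and clamped to the edge.
    fn bilinear(&self, layer: usize, x: f32, y: f32) -> f32 {
        let x = x.clamp(0.0, (self.width - 1) as f32);
        let y = y.clamp(0.0, (self.height - 1) as f32);
        let (x0, y0) = (x.floor() as usize, y.floor() as usize);
        let (x1, y1) = ((x0 + 1).min(self.width - 1), (y0 + 1).min(self.height - 1));
        let (fx, fy) = (x - x0 as f32, y - y0 as f32);
        let top = lerp(self.texel(layer, x0, y0), self.texel(layer, x1, y0), fx);
        let bottom = lerp(self.texel(layer, x0, y1), self.texel(layer, x1, y1), fx);
        lerp(top, bottom, fy)
    }

    /// Trilinear lookup emulated as two bilinear fetches on adjacent
    /// layers, followed by a manual lerp along z.
    fn trilinear(&self, x: f32, y: f32, z: f32) -> f32 {
        let last = self.layers.len() - 1;
        let z = z.clamp(0.0, last as f32);
        let z0 = z.floor() as usize;
        let z1 = (z0 + 1).min(last);
        let fz = z - z0 as f32;
        lerp(self.bilinear(z0, x, y), self.bilinear(z1, x, y), fz)
    }
}

fn main() {
    // 2 x 2 x 2 volume holding f(x, y, z) = x + 2y + 4z, split into two layers.
    let tex = TextureArray {
        width: 2,
        height: 2,
        layers: vec![vec![0.0, 1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0, 7.0]],
    };
    // Trilinear interpolation reproduces a linear function exactly: 0.5 + 1.0 + 2.0.
    println!("{}", tex.trilinear(0.5, 0.5, 0.5)); // 3.5
}
```

The extra fetch and lerp are the price of explicit layering, so this only pays off where filtering in z is rare.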

Wumpf commented 4 years ago

Switching to 2D texture arrays in the solver did, as expected, not influence perf much. Choosing workgroup sizes based on the "layering suspicion" has been beneficial, though. Interestingly, switching to z-up has so far made perf worse; I don't understand why. Went back to 3D textures since they still express the intent more accurately and better describe the sampling pattern (most of the time I am sampling all 3D neighbors!). It would be a different story if I started using textureGather 🤔 Feels like more investigation can be done here, keeping the ticket open.
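A rough way to reason about workgroup shapes under the layering hypothesis is to count how many distinct z-slices one workgroup touches for a 7-point stencil. The sketch below does that for a few 64-invocation shapes; the shapes are hypothetical examples, not the sizes actually used in the solver, and fewer slices does not by itself guarantee better performance (the 4x4x4 cube reads fewer texels overall but spans twice as many slices).

```rust
use std::collections::HashSet;

/// Footprint of one workgroup of shape (gx, gy, gz) running a 7-point stencil
/// (each invocation reads itself and its six face neighbors).
/// Returns (distinct texels read, distinct z-slices touched).
fn stencil_footprint(gx: i32, gy: i32, gz: i32) -> (usize, usize) {
    const OFFSETS: [(i32, i32, i32); 7] = [
        (0, 0, 0), (-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1),
    ];
    let mut texels = HashSet::new();
    for z in 0..gz {
        for y in 0..gy {
            for x in 0..gx {
                for (dx, dy, dz) in OFFSETS {
                    texels.insert((x + dx, y + dy, z + dz));
                }
            }
        }
    }
    // Collect the z coordinate of every texel read to count the slices involved.
    let slices: HashSet<i32> = texels.iter().map(|&(_, _, z)| z).collect();
    (texels.len(), slices.len())
}

fn main() {
    for (gx, gy, gz) in [(8, 8, 1), (16, 4, 1), (4, 4, 4)] {
        let (texels, slices) = stencil_footprint(gx, gy, gz);
        println!("{gx}x{gy}x{gz}: {texels} texels across {slices} slices");
    }
}
```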

Wumpf commented 3 years ago

A fairly recent tweet again confirms that Nvidia stores 3D textures as 2D slices in memory: https://twitter.com/adamjmiles/status/1118884832455659521
Worth noting: Seb points out that the RenderTarget setting definitely broke it for him, which makes sense, since render targets are rendered slice by slice. That raises the question of whether making my textures write targets (all of them are) breaks the tiling. (Who knows, maybe it always does?)

Maybe I should consider going all-buffer for everything and just identify where I actually do filtered fetches.
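With a storage buffer the memory layout is fully under our control, but there is no sampler: point fetches (as used by neighbor stencils) map directly to indexed loads, while filtered fetches would need manual interpolation. The CPU sketch below shows the point-fetch side; `Grid`, its x-fastest indexing, and the clamp-to-edge behavior of `load` are hypothetical choices for illustration.

```rust
/// Hypothetical buffer-backed scalar grid: a flat Vec indexed x-fastest,
/// then y, then z, the way a storage buffer would be addressed in a shader.
struct Grid {
    dims: [usize; 3],
    data: Vec<f32>,
}

impl Grid {
    fn new(dims: [usize; 3]) -> Self {
        Grid { dims, data: vec![0.0; dims[0] * dims[1] * dims[2]] }
    }

    fn index(&self, x: usize, y: usize, z: usize) -> usize {
        (z * self.dims[1] + y) * self.dims[0] + x
    }

    /// Point fetch with clamp-to-edge addressing (mimics an unfiltered texture load).
    fn load(&self, x: i64, y: i64, z: i64) -> f32 {
        let c = |v: i64, n: usize| v.clamp(0, n as i64 - 1) as usize;
        self.data[self.index(c(x, self.dims[0]), c(y, self.dims[1]), c(z, self.dims[2]))]
    }

    /// 7-point Laplacian with unit grid spacing: only point fetches, no filtering.
    fn laplacian(&self, x: i64, y: i64, z: i64) -> f32 {
        self.load(x - 1, y, z) + self.load(x + 1, y, z)
            + self.load(x, y - 1, z) + self.load(x, y + 1, z)
            + self.load(x, y, z - 1) + self.load(x, y, z + 1)
            - 6.0 * self.load(x, y, z)
    }
}

fn main() {
    let mut g = Grid::new([3, 3, 3]);
    let i = g.index(1, 1, 1);
    g.data[i] = 1.0; // single spike in the center cell
    println!("{}", g.laplacian(1, 1, 1)); // -6
}
```

Only the passes that genuinely need filtering would then need an interpolation routine on top of `load`.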