Closed by Try 2 years ago.
Meshlet packing:
Note: non-landscape objects are black
Main pass: 1.81 ms; shadows: 0.32 + 0.71 ms. Draw-call count decreased by ~700 calls.
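The thread doesn't say which packing algorithm is used. As a rough illustration only, a minimal greedy packer could look like this; the 64-vertex / 126-triangle defaults are illustrative assumptions, and `pack_meshlets` / `Meshlet` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Meshlet:
    vertices: list = field(default_factory=list)   # global vertex ids referenced by this meshlet
    triangles: list = field(default_factory=list)  # triangles as local (meshlet-relative) index triples


def pack_meshlets(indices, max_vertices=64, max_triangles=126):
    """Greedily pack an indexed triangle list into meshlets.

    Triangles are consumed in input order; a new meshlet is started whenever
    adding the next triangle would exceed either limit (assumes max_vertices >= 3).
    """
    meshlets = []
    current = Meshlet()
    local = {}  # global vertex id -> local index inside `current`
    for t in range(0, len(indices), 3):
        tri = indices[t:t + 3]
        new_verts = [v for v in dict.fromkeys(tri) if v not in local]
        if (len(current.vertices) + len(new_verts) > max_vertices
                or len(current.triangles) + 1 > max_triangles):
            meshlets.append(current)
            current = Meshlet()
            local = {}
            new_verts = list(dict.fromkeys(tri))
        for v in new_verts:
            local[v] = len(current.vertices)
            current.vertices.append(v)
        current.triangles.append(tuple(local[v] for v in tri))
    if current.triangles:
        meshlets.append(current)
    return meshlets
```

A real packer would usually also try to keep meshlets spatially compact (for tighter culling bounds) rather than following input order.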
With mesh shaders, the execution barrier becomes an issue: the shadow -> main pass barrier takes 1 ms.
For now, it runs slightly slower than CPU culling.
Note:
It seems gl_PrimitiveIndicesNV can be a limiting factor:
per the spec, gl_PrimitiveIndicesNV[] must have a size that exactly matches max_primitives × vertices per primitive (3 for triangles), and unfortunately never more.
For a mesh block-compression scheme it would be much better to allow out-of-bounds access on this array (or an option to redeclare it with a bigger size).
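To make the size constraint concrete: the GL_NV_mesh_shader extension sizes gl_PrimitiveIndicesNV as max_primitives times the vertex count of one primitive. The sketch below assumes a hypothetical decoder that writes indices in whole blocks of `block_size` entries (the block size is an assumption, not something the thread specifies) and counts how many entries the last block would spill past the end of the array.

```python
import math

# Vertices per output primitive, as used to size gl_PrimitiveIndicesNV.
VERTICES_PER_PRIMITIVE = {"points": 1, "lines": 2, "triangles": 3}


def primitive_indices_size(max_primitives, output_type="triangles"):
    """Declared length of gl_PrimitiveIndicesNV for a given max_primitives."""
    return max_primitives * VERTICES_PER_PRIMITIVE[output_type]


def overflow_for_block_writes(max_primitives, block_size, output_type="triangles"):
    """Entries a decoder writing whole blocks of `block_size` would write out of bounds."""
    size = primitive_indices_size(max_primitives, output_type)
    written = math.ceil(size / block_size) * block_size
    return written - size
```

For example, with max_primitives = 126 the array holds 378 indices; a decoder writing 4 indices at a time would need 380 slots, i.e. 2 out-of-bounds writes, which is exactly the kind of slack the note above asks for.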
With the packing-algorithm fix: main pass 1.36 ms; shadows 0.34 + 0.67 ms.
Some progress on the task shader:
The main pass actually became slower, but the good part is the draw-call count: the whole frame now takes 85 draw calls instead of 2.3k.
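The culling test itself isn't described here. A minimal CPU model of what one task workgroup might do, assuming per-meshlet bounding spheres and frustum planes with normals pointing into the frustum, is plain per-meshlet frustum culling; `task_stage` and `Sphere` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sphere:
    center: tuple  # (x, y, z) in the same space as the planes
    radius: float


def sphere_outside_plane(s, plane):
    """plane = (nx, ny, nz, d); points with nx*x + ny*y + nz*z + d >= 0 are inside."""
    nx, ny, nz, d = plane
    x, y, z = s.center
    return nx * x + ny * y + nz * z + d < -s.radius


def task_stage(meshlet_bounds, frustum_planes):
    """Return ids of meshlets that survive culling.

    On the GPU these ids would go into the task payload, and one mesh
    workgroup would be launched per surviving meshlet.
    """
    return [i for i, s in enumerate(meshlet_bounds)
            if not any(sphere_outside_plane(s, p) for p in frustum_planes)]
```

Spheres that straddle a plane are kept, so the test is conservative: it can draw a few invisible meshlets but never drops a visible one.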
Is this an optimisation only for newer hardware like Nvidia Turing, Ampere and AMD RDNA2?
@Coockie For this iteration, yes: only the latest hardware. Once the culling algorithm is more or less complete and the numbers are good, we will expand to older hardware.
The way I'm planning to backport mesh shaders to Vulkan 1.0 is based on this idea: https://tellusim.com/mesh-shader-emulation/
Basically a set of compute shaders + draw-indirect + a software binner.
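The linked article's exact scheme isn't reproduced here. As a generic sketch of the "compute + draw-indirect + binner" idea, the function below models, on the CPU, a compute pass that takes culling results, bins visible meshlets (e.g. by pipeline or material) and emits one indexed indirect command per meshlet. The struct mirrors the field order of Vulkan's `VkDrawIndexedIndirectCommand`; everything else (`bin_visible_meshlets`, the bin ids, the per-meshlet index ranges) is a hypothetical assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DrawIndexedIndirect:
    # Same field order as VkDrawIndexedIndirectCommand.
    indexCount: int
    instanceCount: int
    firstIndex: int
    vertexOffset: int
    firstInstance: int


def bin_visible_meshlets(visible, bins, first_index, index_count):
    """Group visible meshlets into per-bin lists of indirect draw commands.

    On the GPU, appending to a bin would be an atomicAdd on that bin's counter.
    firstInstance carries the meshlet id, so a vertex shader can recover it
    (in Vulkan, gl_InstanceIndex includes firstInstance).
    """
    out = defaultdict(list)
    for i, is_visible in enumerate(visible):
        if is_visible:
            out[bins[i]].append(
                DrawIndexedIndirect(index_count[i], 1, first_index[i], 0, i))
    return dict(out)
```

Each bin's command list can then be submitted with a single indirect draw call, which is what keeps the CPU-side draw-call count low.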
Who needs task shading? Just draw more meshlets:
Best iteration so far:
Added meshlets for skinned meshes:
Timings are not great, not worse; only the draw-call count improved. Basically looking forward to HiZ support.
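The thread doesn't say how skinned meshlets are bounded. One concern with skinning is that bounds computed from the rest pose go stale once bones move, so a simple approach (an assumption here, not the described implementation) is to recompute a conservative sphere from the skinned positions each frame; `meshlet_bounding_sphere` is a hypothetical name.

```python
import math


def meshlet_bounding_sphere(positions):
    """Conservative sphere: AABB center plus distance to the farthest vertex.

    For skinned meshes, `positions` must be the skinned positions for the
    current frame, so this runs after skinning instead of once at load time.
    """
    xs, ys, zs = zip(*positions)
    center = ((min(xs) + max(xs)) / 2,
              (min(ys) + max(ys)) / 2,
              (min(zs) + max(zs)) / 2)
    radius = max(math.dist(center, p) for p in positions)
    return center, radius
```

The AABB-center sphere is not the minimal one, but it is cheap and always encloses every vertex, which is all culling needs.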
Initial HiZ: only the 32x32-tile level is generated, with no mip-map chain for now. Main pass: 1.1 ms.
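A single 32x32-tile HiZ level stores one conservative depth per tile, namely the farthest depth in that tile. A minimal CPU sketch, assuming a row-major depth buffer and either a conventional (larger = farther) or reversed-Z depth range; `build_hiz_tiles` is a hypothetical name.

```python
def build_hiz_tiles(depth, width, height, tile=32, reverse_z=False):
    """Reduce a row-major depth buffer to one farthest-depth value per tile.

    Conventional depth (0 = near, 1 = far): farthest is max.
    Reversed Z (1 = near, 0 = far): farthest is min.
    Partial tiles at the right/bottom edges are handled.
    """
    reduce = min if reverse_z else max
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    hiz = []
    for ty in range(tiles_y):
        row = []
        for tx in range(tiles_x):
            row.append(reduce(
                depth[y * width + x]
                for y in range(ty * tile, min((ty + 1) * tile, height))
                for x in range(tx * tile, min((tx + 1) * tile, width))))
        hiz.append(row)
    return hiz
```

Storing the farthest value keeps the test conservative: an object is only rejected if it lies behind everything already drawn in the tile.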
Progress on HiZ. Overdraw:
Without:
With:
The small valley behind the farm (center) and geometry occluded by the mountain (left) were removed.
There is still a performance issue: reading HiZ with textureGather (mip 0) or 4× texelFetch is slow.
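For context on why reading mip 0 hurts: with a full mip chain, a common HiZ test picks the mip where an object's screen rectangle covers at most 2x2 texels, so four reads (one textureGather or 4× texelFetch) cover the whole footprint. A CPU sketch under assumptions: conventional depth (larger = farther), each mip storing farthest depth, and rectangles already clipped to non-negative pixel coordinates; `is_occluded` is a hypothetical name.

```python
import math


def is_occluded(hiz_mips, rect_min, rect_max, nearest_depth):
    """Conservative HiZ occlusion test for a screen-space rectangle in pixels.

    hiz_mips[k] is a 2-D list of farthest depths at 1/2^k resolution.
    """
    w = max(rect_max[0] - rect_min[0], 1)
    h = max(rect_max[1] - rect_min[1], 1)
    # Smallest mip whose texels are at least as large as the rectangle,
    # clamped to the coarsest mip that exists.
    level = min(math.ceil(math.log2(max(w, h))), len(hiz_mips) - 1)
    mip = hiz_mips[level]
    rows, cols = len(mip), len(mip[0])
    x0 = min(int(rect_min[0]) >> level, cols - 1)
    y0 = min(int(rect_min[1]) >> level, rows - 1)
    x1 = min(int(rect_max[0]) >> level, cols - 1)
    y1 = min(int(rect_max[1]) >> level, rows - 1)
    # At most 2x2 texels here unless the level was clamped.
    farthest = max(mip[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1))
    # Occluded only if the object's nearest point is behind the whole footprint.
    return nearest_depth > farthest
```

With only the 32x32-tile level available, large objects span many tiles and need many reads, which is one plausible reason the current sampling is expensive.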
Wrapping up for now: the system-trace profiler shows very odd numbers. Rendering is faster, but barriers are terribly large (~1 ms, see above). The frame profiler shows a different picture: according to it, meshlets are 12% faster.
At the end of the day: baseline fps = 170, meshlet fps = 204. So meshlets are a win.
Meshlets vs. draw calls:

| | Meshlets | Draw calls (baseline) |
|---|---|---|
| FPS | 204 | 170 |
| Triangles | 102316 | 3184884 |
| Draw calls | 19 | 4766 |
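Converting the FPS row to frame time gives a more direct sense of the saving. This is only arithmetic on the two FPS values in the table, not a separate measurement.

```python
def frame_time_ms(fps):
    """Average frame time implied by a frame rate."""
    return 1000.0 / fps


baseline_fps, meshlet_fps = 170, 204

saved_ms = frame_time_ms(baseline_fps) - frame_time_ms(meshlet_fps)
speedup = meshlet_fps / baseline_fps

print(f"{frame_time_ms(baseline_fps):.2f} ms -> {frame_time_ms(meshlet_fps):.2f} ms "
      f"({saved_ms:.2f} ms saved, {speedup:.2f}x)")
# prints "5.88 ms -> 4.90 ms (0.98 ms saved, 1.20x)"
```

So the 34 fps gain corresponds to roughly 1 ms per frame.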
System trace (note: no idea what Nsight displays as the draw-call count; it's definitely not 19, probably pipeline switches):
Overdraw visualization (baseline vs. meshlets):
The goal of this ticket is to prototype, implement and test GPU-driven rendering.
GPU-driven rendering allows better culling of invisible geometry and can potentially reduce CPU load.
Current idea:
Implementation detail (so far):
Baseline:
Main pass 1.51 ms; shadows 0.35 + 0.85 ms; ray-tracing pass 2.27 ms.