Try / OpenGothic

Reimplementation of Gothic 2 Notr
MIT License

GPU driven rendering #266

Closed Try closed 2 years ago

Try commented 2 years ago

The goal of this ticket is to prototype, implement, and test GPU-driven rendering.

GPU-driven rendering allows better culling of invisible geometry and can potentially reduce CPU load.

Current idea:

  1. Take task/mesh shaders as the baseline
  2. Test on native hardware
  3. Implement a backport for Vulkan 1.0 hardware, if the idea works

Implementation details (so far):

Baseline: [image]

Main pass: 1.51 ms. Shadows: 0.35 + 0.85 ms. Ray-tracing pass: 2.27 ms.

Try commented 2 years ago

Meshlet packing: [image]

Note: non-landscape objects appear black in this screenshot.
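
For readers unfamiliar with meshlet packing, here is a minimal sketch of a greedy packer. It is not the packing code from this ticket: the limits of 64 vertices and 64 triangles, and the names `Meshlet`, `localIndex`, `countNew` and `packMeshlets`, are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical limits; a real engine may pick different numbers.
constexpr std::size_t MaxVertices  = 64;
constexpr std::size_t MaxTriangles = 64;

struct Meshlet {
  std::vector<uint32_t> vertices;  // unique indices into the original vertex buffer
  std::vector<uint8_t>  triangles; // 3 meshlet-local indices per triangle
};

// Returns the meshlet-local index of vertex v, appending v if it is new.
static uint8_t localIndex(Meshlet& m, uint32_t v) {
  auto it = std::find(m.vertices.begin(), m.vertices.end(), v);
  if(it == m.vertices.end()) {
    m.vertices.push_back(v);
    return uint8_t(m.vertices.size() - 1);
  }
  return uint8_t(it - m.vertices.begin());
}

// Counts how many vertices of the triangle are not yet stored in the meshlet.
static std::size_t countNew(const Meshlet& m, const uint32_t* tri) {
  std::size_t n = 0;
  for(std::size_t k = 0; k < 3; ++k) {
    bool seen = std::find(m.vertices.begin(), m.vertices.end(), tri[k]) != m.vertices.end();
    for(std::size_t j = 0; j < k; ++j)
      seen = seen || tri[j] == tri[k]; // repeated vertex inside a degenerate triangle
    if(!seen)
      ++n;
  }
  return n;
}

// Greedy packing: walk triangles in index-buffer order and start a new
// meshlet whenever the next triangle would exceed one of the limits.
std::vector<Meshlet> packMeshlets(const std::vector<uint32_t>& indices) {
  assert(indices.size() % 3 == 0);
  std::vector<Meshlet> out;
  Meshlet cur;
  for(std::size_t i = 0; i < indices.size(); i += 3) {
    const uint32_t* tri = &indices[i];
    const bool verticesFull  = cur.vertices.size() + countNew(cur, tri) > MaxVertices;
    const bool trianglesFull = cur.triangles.size() / 3 >= MaxTriangles;
    if(verticesFull || trianglesFull) {
      out.push_back(cur);
      cur = Meshlet();
    }
    for(std::size_t k = 0; k < 3; ++k)
      cur.triangles.push_back(localIndex(cur, tri[k]));
  }
  if(!cur.triangles.empty())
    out.push_back(cur);
  return out;
}
```

Calling `packMeshlets` on an ordinary triangle-list index buffer yields meshlets whose local indices fit in 8 bits. How well vertices are reused depends on the triangle order, so a production packer would likely reorder triangles for locality first.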

Try commented 2 years ago

[image]

Main pass: 1.81 ms. Shadows: 0.32 + 0.71 ms. Draw-call count decreased by ~700 calls.

With mesh shaders, the execution barrier becomes an issue: the shadow -> main pass barrier takes 1 ms. For now, this works slightly slower than CPU culling.

Try commented 2 years ago

Note: It seems gl_PrimitiveIndicesNV can be a limiting factor:

According to the spec, gl_PrimitiveIndicesNV[] must have a size that exactly matches max_primitives multiplied by the number of vertices per primitive (3 for triangles), and unfortunately never more.

For a mesh block-compression scheme it would be much better to have out-of-bounds access on this array (or an option to redeclare it with a bigger size).

Try commented 2 years ago

[image]

With the packing-algorithm fixup: Main pass: 1.36 ms. Shadows: 0.34 + 0.67 ms.

Try commented 2 years ago

Some progress on the task shader: [image]

The main pass actually became slower, but the good part is the draw-call count: the whole frame now takes 85 draw calls instead of 2.3k.

Coockie commented 2 years ago

Is this an optimisation only for newer hardware like Nvidia Turing and Ampere and AMD RDNA2?

Try commented 2 years ago

@Coockie this iteration: yes, only for the latest hardware. Once the culling algorithm is more or less complete and the numbers are good, we will expand to older hardware.

The way I'm planning to backport mesh shaders to Vulkan 1.0 is based on this idea: https://tellusim.com/mesh-shader-emulation/

Basically, a set of compute shaders + draw-indirect + a software binner.
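
To make the "compute shaders + draw-indirect" idea more concrete, here is a CPU-side sketch of what such a culling pass could compute. It is not taken from the linked article or from OpenGothic: the binner is interpreted here, as an assumption, as compacting the indices of surviving meshlets into one buffer, and the names `Plane`, `MeshletRef`, `sphereVisible` and `cullAndBin` are hypothetical. `DrawIndexedIndirect` mirrors the field layout of `VkDrawIndexedIndirectCommand`.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Same field order as VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirect {
  uint32_t indexCount    = 0;
  uint32_t instanceCount = 1;
  uint32_t firstIndex    = 0;
  int32_t  vertexOffset  = 0;
  uint32_t firstInstance = 0;
};

// Plane with inward-facing normal: a point p is inside when dot(n, p) + d >= 0.
struct Plane { float nx, ny, nz, d; };

// Hypothetical per-meshlet record: bounding sphere plus its index range.
struct MeshletRef {
  float    cx, cy, cz, radius;
  uint32_t firstIndex, indexCount;
};

// Conservative sphere-vs-frustum test: reject only if fully behind some plane.
bool sphereVisible(const MeshletRef& m, const std::array<Plane, 6>& frustum) {
  for(const Plane& p : frustum)
    if(p.nx * m.cx + p.ny * m.cy + p.nz * m.cz + p.d < -m.radius)
      return false;
  return true;
}

// On the GPU this loop would be one compute thread per meshlet, appending
// with atomics; here it is sequential to keep the sketch runnable.
DrawIndexedIndirect cullAndBin(const std::vector<MeshletRef>& meshlets,
                               const std::vector<uint32_t>&   indices,
                               const std::array<Plane, 6>&    frustum,
                               std::vector<uint32_t>&         compacted) {
  compacted.clear();
  for(const MeshletRef& m : meshlets) {
    if(!sphereVisible(m, frustum))
      continue;
    compacted.insert(compacted.end(),
                     indices.begin() + m.firstIndex,
                     indices.begin() + m.firstIndex + m.indexCount);
  }
  DrawIndexedIndirect cmd;
  cmd.indexCount = uint32_t(compacted.size());
  return cmd;
}
```

The point of the design is that the CPU issues a single indirect draw regardless of how many meshlets survive; the draw arguments are produced by the culling pass itself.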

Try commented 2 years ago

Who needs task shading? Just draw more meshlets: [image]

Best iteration so far:

Try commented 2 years ago

Added meshlets for skinned meshes: [image]

Timings are not great, but not worse either; the exception is the draw-call count. Basically, looking forward to HiZ support.

Try commented 2 years ago

Initial HiZ: only a 32x32 tile size is generated, with no mip-map chain for now. Main pass: 1.1 ms [image]

Try commented 2 years ago

Progress on HiZ. Overdraw:

The small valley behind the farm (center) and the geometry occluded by the mountain (left) were removed.

There is still a performance issue: reading HiZ with textureGather(mip=0)/texelFetch*4 is not performant.
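
As an illustration of the single-level, 32x32-tile HiZ described in this thread, here is a CPU-side sketch. It assumes a depth buffer where 0 is near and 1 is far, and the names `HiZ`, `buildHiZ` and `isOccluded` are hypothetical; the actual shaders in the engine may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t Tile = 32; // tile size mentioned in the thread

struct HiZ {
  uint32_t           tilesX = 0, tilesY = 0;
  std::vector<float> maxDepth; // farthest depth per tile (assumes 0 = near, 1 = far)
};

// Reduces a full-resolution depth buffer to one "farthest depth" value per tile.
HiZ buildHiZ(const std::vector<float>& depth, uint32_t w, uint32_t h) {
  assert(depth.size() == std::size_t(w) * h);
  HiZ hiz;
  hiz.tilesX = (w + Tile - 1) / Tile;
  hiz.tilesY = (h + Tile - 1) / Tile;
  hiz.maxDepth.assign(std::size_t(hiz.tilesX) * hiz.tilesY, 0.f);
  for(uint32_t y = 0; y < h; ++y)
    for(uint32_t x = 0; x < w; ++x) {
      float& t = hiz.maxDepth[std::size_t(y / Tile) * hiz.tilesX + x / Tile];
      t = std::max(t, depth[std::size_t(y) * w + x]);
    }
  return hiz;
}

// Conservative test for a screen-space rectangle (inclusive pixel bounds):
// the object is occluded only if its nearest depth lies behind the farthest
// depth of every tile it overlaps.
bool isOccluded(const HiZ& hiz, uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1,
                float nearestDepth) {
  const uint32_t tx1 = std::min(x1 / Tile, hiz.tilesX - 1);
  const uint32_t ty1 = std::min(y1 / Tile, hiz.tilesY - 1);
  for(uint32_t ty = y0 / Tile; ty <= ty1; ++ty)
    for(uint32_t tx = x0 / Tile; tx <= tx1; ++tx)
      if(nearestDepth <= hiz.maxDepth[std::size_t(ty) * hiz.tilesX + tx])
        return false;
  return true;
}
```

With a single level, a large object touches many tiles, so the test loop grows with screen coverage. A mip chain is the usual remedy: pick a level where the rectangle covers only a few texels.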

Try commented 2 years ago

Wrapping up for now: the system-trace profiler shows very odd numbers: rendering is faster, but barriers are terribly large (~1 ms, see above). The frame profiler shows a different picture. According to the frame profiler, meshlets are 12% faster.

At the end of the day: base fps = 170, mesh fps = 204. So meshlets are good.
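
Since the other measurements in this thread are in milliseconds, it can help to convert the fps figures to frame times. The helper below is only a convenience sketch; `frameTimeMs` is a hypothetical name.

```cpp
#include <cassert>

// Converts frames per second to milliseconds per frame.
double frameTimeMs(double fps) {
  assert(fps > 0.0);
  return 1000.0 / fps;
}
```

`frameTimeMs(170.0)` is about 5.88 ms and `frameTimeMs(204.0)` is about 4.90 ms, so the whole frame is roughly 0.98 ms shorter with meshlets. That whole-frame difference is a separate measurement from the 12% reported by the frame profiler.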

Meshlets: [image] Drawcalls: [image]

|           | Meshlets | Drawcalls |
|-----------|----------|-----------|
| Fps       | 204      | 170       |
| Triangles | 102316   | 3184884   |
| Drawcalls | 19       | 4766      |

Sys-trace: [image] (Note: no idea what Nsight displays as the draw-call count. It's definitely not 19. Probably it's pipeline switches.)

Try commented 2 years ago

Overdraw visualization: Baseline: [image] Meshlets: [image]