GFX10 and GFX10.3 (Navi, RDNA) support in ACO

Venemo commented 5 years ago

This issue is for tracking ACO's progress on Navi.

What works, what doesn't

All shader stages should work. Every Vulkan game should work.

If you find issues, please file a bug in the upstream Mesa bug tracker.

Tested hardware

[x] Navi 10: Radeon RX 5700
[x] Navi 10: Radeon RX 5700 XT
[x] Navi 10: Radeon RX 5600 XT tested and benchmarked by Phoronix
[x] Navi 14: Radeon RX 5500 XT tested and benchmarked by Phoronix
[ ] Navi 12 should work, but not tested
[x] Navi 2x: RX 6800 and 6800 XT [tested and benchmarked by Phoronix](https://www.phoronix.com/scan.php?page=article&item=rx6800-more-performance&num=1)

Not tested with unreleased Navi cards as we don't have those. If you test with hardware that is not on the list yet, please let us know.

How to test

We suggest using the latest stable Mesa, where ACO is the default compiler of the RADV Vulkan driver.

ACO has been in Mesa since version 19.3, but older Mesa releases required setting the RADV_PERFTEST=aco environment variable.
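
For anyone scripting tests across several Mesa versions, here is a minimal sketch of how the environment could be prepared. The helper name `radv_env` is hypothetical, and the `(20, 2)` cutoff for "ACO is the default" is an assumption of this sketch rather than something stated in this issue, so check the release notes of your Mesa version.

```python
import os


def radv_env(mesa_version, base_env=None):
    """Return a copy of the environment for testing RADV with ACO.

    mesa_version is a (major, minor) tuple. The (20, 2) cutoff below is an
    assumption of this sketch; newer releases need no variable at all.
    """
    env = dict(os.environ if base_env is None else base_env)
    if (19, 3) <= mesa_version < (20, 2):
        # RADV_PERFTEST takes a comma-separated list, so keep existing flags.
        flags = [f for f in env.get("RADV_PERFTEST", "").split(",") if f]
        if "aco" not in flags:
            flags.append("aco")
        env["RADV_PERFTEST"] = ",".join(flags)
    return env
```

The result can be passed as `env=` to `subprocess.run` when launching the Vulkan application under test. Existing flags are preserved so that other RADV_PERFTEST options are not silently dropped.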

New hardware features support in Navi 1x

[x] Wave32 (support for 32 lanes rather than 64)
[x] NGG (Next Generation Geometry)

New hardware features support in Navi 2x

[ ] Hardware accelerated ray tracing
[ ] Mesh shaders
[ ] Variable rate shading

Possible optimizations

[ ] use round-robin register allocation to avoid WAR hazards (and help any post-RA scheduling)
[ ] schedule ALU instructions (after RA for easier/faster scheduling?)
[ ] choose registers to avoid bank conflicts (either as a reassignment pass or during RA)
See GCNRegBankReassign.cpp in LLVM
[ ] NGG shader based primitive culling
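
To illustrate the first and third ideas in the list above, here is a toy sketch in Python. It is not ACO code. The class and function names are hypothetical, and the bank model (4 VGPR banks, bank = register index modulo 4) is an assumption made for illustration only.

```python
NUM_VGPR_BANKS = 4  # assumption for this sketch, not taken from ACO


class RoundRobinAllocator:
    """Hands out registers in rotating order instead of lowest-free-first.

    A just-freed register is reused as late as possible, so a new write is
    less likely to land on a register that an earlier instruction still reads
    (a write-after-read hazard).
    """

    def __init__(self, num_regs):
        self.num_regs = num_regs
        self.next = 0
        self.live = set()

    def alloc(self):
        for i in range(self.num_regs):
            reg = (self.next + i) % self.num_regs
            if reg not in self.live:
                self.live.add(reg)
                self.next = (reg + 1) % self.num_regs
                return reg
        raise RuntimeError("out of registers")

    def free(self, reg):
        self.live.discard(reg)


def bank(vgpr):
    return vgpr % NUM_VGPR_BANKS


def conflicts(srcs):
    """Count source operands that share a bank with an earlier operand."""
    seen, count = set(), 0
    for reg in srcs:
        if bank(reg) in seen:
            count += 1
        seen.add(bank(reg))
    return count


def pick_register(free_regs, other_srcs):
    """Pick the free register causing the fewest conflicts (lowest index on ties)."""
    return min(free_regs, key=lambda r: (conflicts(other_srcs + [r]), r))
```

For example, after allocating and freeing register 0, the round-robin allocator returns register 1 next rather than reusing 0 immediately. A real implementation would also have to deal with register classes, alignment and live ranges, which this sketch ignores.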

Venemo commented 4 years ago

Some recent Navi progress: ACO now supports NGG (Next Generation Geometry) for vertex and tessellation evaluation shaders. Our implementation is based on a few ideas from RadeonSI, and is slightly more efficient than what RADV/LLVM does. That being said, I haven't observed a noticeable performance benefit from NGG yet. However, we still need to support it if we want to be future-proof.

With this, and our recent addition of tessellation shaders, the only shader stages missing are the merged NGG geometry shaders ngg_vertex_geometry_gs and ngg_tess_eval_geometry_gs. These are in my plans, but they are not a priority right now.

shmerl commented 4 years ago

Any new config keys to enable those?

Venemo commented 4 years ago

@shmerl You don't need to do anything to enable them. They will be used automatically, except for NGG GS, which is not supported yet, so it will fall back to legacy GS.

shmerl commented 4 years ago

From the above and some tests I ran, there is indeed no noticeable performance impact. Is the NGG path itself envisioned by AMD as something faster, or simply as a different hardware path to replace the old one, without a focus on better performance? Maybe it has some other benefits, like better power efficiency?

pendingchaos commented 4 years ago

NGG is more flexible and potentially faster if culling is implemented (though afaik this isn't useful for games)

NGG GS can be faster than legacy GS because it eliminates the GSVS ring

Venemo commented 4 years ago

@shmerl Speaking of the traditional Vulkan or OpenGL pipeline, NGG is not going to do anything revolutionary. For vertex and tess eval shaders, the same thing happens as before, but the shader program is more explicitly responsible for some hw details (allocating GS space, exporting primitives, etc). For geometry shaders, NGG eliminates a copy to and from VMEM (this is the GSVS ring that Rhys mentioned), which might give a slight improvement.

However, consider new use cases like mesh shaders, along with features like primitive culling: basically any kind of usage which needs the shader program to have more fine-grained control over the vertices and primitives emitted. Those are made very easy to implement with NGG, while the legacy GS stage is not really feasible for this kind of application.

So, in my opinion, you should think of NGG as an enabler for new and more efficient geometry features, rather than a silver bullet to make your games fast.

Venemo commented 4 years ago

I now have a branch which adds support for ACO NGG GS, and works with the sample app and a few games.

Currently missing support for streamout (aka. transform feedback) and shader queries, but those are going to be next.

shmerl commented 3 years ago

What is the current state of enabling cswave32 and gewave32? Is it recommended, or maybe they are on by default?

Venemo commented 3 years ago

@shmerl They are still off by default. They may or may not give a performance advantage in some games. We currently don't have a good way to predict whether Wave32 is advantageous for a given shader or not. I still plan to work on this but it's not high priority. (There are other things we can do which give more measurable benefits.)
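
As a rough illustration of why the answer depends on the shader, here is a small sketch that only looks at lane utilization. This is just one of many factors and is not a heuristic used by ACO or RADV. The function name `wave_usage` is hypothetical.

```python
import math


def wave_usage(invocations, wave_size):
    """Return (number of waves, idle lanes) for a given invocation count."""
    waves = math.ceil(invocations / wave_size)
    idle_lanes = waves * wave_size - invocations
    return waves, idle_lanes


# A workgroup of 96 invocations:
#   wave_usage(96, 64) gives 2 waves with 32 idle lanes
#   wave_usage(96, 32) gives 3 waves with no idle lanes
```

With 256 invocations neither mode leaves lanes idle, and Wave32 simply needs twice as many waves, so lane utilization alone cannot tell which mode is better.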
