Open Venemo opened 5 years ago
Some recent Navi progress: ACO now supports NGG (Next Generation Geometry) for vertex and tessellation evaluation shaders. Our implementation is based on a few ideas from RadeonSI, and is slightly more efficient than what RADV/LLVM does. That said, I haven't observed a noticeable performance benefit from NGG yet. However, we still need to support it if we want to be future-proof.
With this, and our recent addition of tessellation shaders, the only shader stages missing are the merged NGG geometry shaders ngg_vertex_geometry_gs and ngg_tess_eval_geometry_gs. These are in my plans, but they are not a priority right now.
Any new config keys to enable those?
@shmerl You don't need to do anything to enable them. They will be used automatically, except for NGG GS, which is not supported, so it will fall back to legacy GS.
From the above and some tests I ran, there is indeed no noticeable performance impact. Is the NGG path itself envisioned by AMD as something faster, or simply as a different hardware path to replace the old one, without a focus on better performance? Maybe it has some other benefits, like better power efficiency?
NGG is more flexible and potentially faster if culling is implemented (though afaik this isn't useful for games)
NGG GS can be faster than legacy GS because it eliminates the GSVS ring
@shmerl Speaking of the traditional Vulkan or OpenGL pipeline, NGG is not going to do anything revolutionary. For vertex and tess eval shaders, the same thing happens as before, but the shader program is more explicitly responsible for some hw details (allocating GS space, exporting primitives, etc). For geometry shaders, NGG eliminates a copy to and from VMEM (this is the GSVS ring that Rhys mentioned), which might give a slight improvement.
However, consider new use cases like mesh shaders, along with features like primitive culling: basically any kind of usage which needs the shader program to have more fine-grained control over the vertices and primitives emitted. Those are made very easy to implement with NGG, while the legacy GS stage is not really feasible for these kinds of applications.
So, in my opinion, you should think of NGG as an enabler for new and more efficient geometry features, rather than a silver bullet to make your games fast.
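To make "primitive culling" a bit more concrete, here is a minimal sketch of the kind of per-triangle test a shader could run before exporting a primitive. It is only an illustration in Python, not how ACO or the hardware does it: the function names are hypothetical, and it assumes 2D screen-space positions, a y-up coordinate convention and counter-clockwise front faces.

```python
def signed_area(a, b, c):
    """Signed area of a 2D triangle; positive when a, b, c wind counter-clockwise (y-up)."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def should_cull(a, b, c, front_ccw=True):
    """Return True if the triangle can be dropped before rasterization.

    Assumption for this sketch: back-facing and zero-area triangles
    never produce visible pixels, so they are safe to discard.
    """
    area = signed_area(a, b, c)
    if area == 0:
        return True  # degenerate triangle, covers no pixels
    return area < 0 if front_ccw else area > 0


front = should_cull((0, 0), (1, 0), (0, 1))       # counter-clockwise, kept
back = should_cull((0, 0), (0, 1), (1, 0))        # clockwise, culled
degenerate = should_cull((0, 0), (1, 1), (2, 2))  # collinear points, culled
```

A real implementation would also have to handle things like frustum and small-primitive tests, which this sketch leaves out.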
I now have a branch which adds support for ACO NGG GS, and works with the sample app and a few games.
Currently missing support for streamout (aka. transform feedback) and shader queries, but those are going to be next.
What is the current state of cswave32 and gewave32? Is enabling them recommended, or maybe they are on by default?
@shmerl They are still off by default. They may or may not give a performance advantage in some games. We currently don't have a good way to predict whether Wave32 is advantageous for a given shader or not. I still plan to work on this but it's not high priority. (There are other things we can do which give more measurable benefits.)
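For anyone who wants to compare Wave32 on their own games, a small helper along these lines can build the environment for a test run. This is a sketch, not an official tool: radv_env is a hypothetical name, and it assumes that RADV_PERFTEST takes a comma-separated list of option names, with cswave32 and gewave32 being the two options mentioned above.

```python
import os


def radv_env(perftest_flags, base_env=None):
    """Return a copy of the environment with the given RADV_PERFTEST options added.

    Assumption: RADV_PERFTEST is a comma-separated list of option names.
    Options that are already present are kept and not duplicated.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [f for f in env.get("RADV_PERFTEST", "").split(",") if f]
    for flag in perftest_flags:
        if flag not in flags:
            flags.append(flag)
    if flags:
        env["RADV_PERFTEST"] = ",".join(flags)
    return env


# Example (the command is just a placeholder for whatever you want to benchmark):
#   import subprocess
#   subprocess.run(["vkcube"], env=radv_env(["cswave32", "gewave32"]))
```

Running the same workload once with the plain environment and once with the modified one gives a simple A/B comparison for that particular game.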
This issue is for tracking ACO's progress on Navi.
What works, what doesn't
All shader stages should work. Every Vulkan game should work.
If you find issues, please file a bug in the upstream Mesa bug tracker.
Tested hardware
Not tested with unreleased Navi cards as we don't have those. If you test with hardware that is not on the list yet, please let us know.
How to test
We suggest using the latest stable mesa, where ACO is the default compiler of the RADV Vulkan driver.
ACO has been in mesa since version 19.3, but on old mesa releases the RADV_PERFTEST=aco environment variable was needed.
New hardware features support in Navi 1x
New hardware features support in Navi 2x
Possible optimizations
[ ] use round-robin register allocation to avoid WAR hazards (and help any post-RA scheduling)
[ ] schedule ALU instructions (after RA for easier/faster scheduling?)
[ ] choose registers to avoid bank conflicts (either as a reassignment pass or during RA)
See GCNRegBankReassign.cpp in LLVM
[ ] NGG shader based primitive culling
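As an illustration of the first item, the toy model below compares "lowest free register first" with round-robin selection and counts write-after-read (WAR) hazards, i.e. cases where an instruction overwrites a register that one of the previous few instructions reads. This is a sketch under simplifying assumptions and not ACO's register allocator: all names are hypothetical, every temporary is defined once, and a register is freed at the last use of its temporary.

```python
def pick_lowest(free):
    """Always reuse the lowest-numbered free register."""
    return min(free)


class RoundRobin:
    """Pick the next free register after the one handed out most recently."""

    def __init__(self, num_regs):
        self.num_regs = num_regs
        self.next = 0

    def pick(self, free):
        for i in range(self.num_regs):
            reg = (self.next + i) % self.num_regs
            if reg in free:
                self.next = (reg + 1) % self.num_regs
                return reg
        raise RuntimeError("out of registers")


def allocate(program, num_regs, pick):
    """Assign registers to a list of (dst_temp, [src_temps]) instructions."""
    last_use = {}
    for idx, (_, srcs) in enumerate(program):
        for s in srcs:
            last_use[s] = idx
    free = set(range(num_regs))
    assignment = {}
    out = []
    for idx, (dst, srcs) in enumerate(program):
        src_regs = [assignment[s] for s in srcs]
        # Sources that die here are freed first, so dst may reuse them.
        for s in srcs:
            if last_use[s] == idx:
                free.add(assignment[s])
        reg = pick(free)
        free.remove(reg)
        assignment[dst] = reg
        out.append((reg, src_regs))
    return out


def count_war(allocated, window=2):
    """Count writes to a register read by one of the previous `window` instructions."""
    hazards = 0
    for i, (dst, _) in enumerate(allocated):
        for j in range(max(0, i - window), i):
            if dst in allocated[j][1]:
                hazards += 1
    return hazards


# t0 = load; t1 = load; t2 = t0 + t1; t3 = load; t4 = t2 * t3; t5 = load; t6 = t4 + t5
program = [(0, []), (1, []), (2, [0, 1]), (3, []), (4, [2, 3]), (5, []), (6, [4, 5])]
lowest = allocate(program, 4, pick_lowest)
round_robin = allocate(program, 4, RoundRobin(4).pick)
print("WAR hazards, lowest-first:", count_war(lowest))
print("WAR hazards, round-robin:", count_war(round_robin))
```

In this model, round-robin spreads values over more registers, so a freshly freed register is not immediately overwritten. Fewer such false dependencies leave a post-RA scheduler more freedom to reorder instructions, at the cost of touching more registers.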