GFX10 and GFX10.3 (Navi, RDNA) support in ACO

Venemo commented 5 years ago

This issue is for tracking ACO's progress on Navi.

What works, what doesn't

All shader stages should work. Every Vulkan game should work.

If you find issues, please file a bug in the upstream Mesa bug tracker.

Tested hardware

[x] Navi 10: Radeon RX 5700
[x] Navi 10: Radeon RX 5700 XT
[x] Navi 10: Radeon RX 5600 XT tested and benchmarked by Phoronix
[x] Navi 14: Radeon RX 5500 XT tested and benchmarked by Phoronix
[ ] Navi 12 should work, but not tested
[x] Navi 2x: RX 6800 and 6800XT [tested and benchmarked by Phoronix] (https://www.phoronix.com/scan.php?page=article&item=rx6800-more-performance&num=1)

Not tested with unreleased Navi cards as we don't have those. If you test with hardware that is not on the list yet, please let us know.

How to test

We suggest using the latest stable mesa, where ACO is the default compiler of the RADV Vulkan driver.

ACO is in mesa since version 19.3 but on old mesa releases, the RADV_PERFTEST=aco environment variable was needed.

New hardware features support in Navi 1x

[x] Wave32 (support for 32 lanes rather than 64)
[x] NGG (Next Generation Geometry)

New hardware features support in Navi 2x

[ ] Hardware accelerated ray tracing
[ ] Mesh shaders
[ ] Variable rate shading

Possible optimizations

[ ] use round-robin register allocation to avoid WAR hazards (and help any post-RA scheduling)
[ ] schedule ALU instructions (after RA for easier/faster scheduling?)
[ ] choose registers to avoid bank conflicts (either as a reassignment pass or during RA)
See GCNRegBankReassign.cpp in LLVM
[ ] NGG shader based primitive culling

Venemo commented 5 years ago

One thing I noticed though, that aco performance looks just a tiny bit lower than llvm in that case

I think 110.2 vs 108.9 FPS is well within an acceptable margin, esp. considering that we haven't yet started working on any Navi-specific optimizations yet.

Why are in general such "hazards" on Navi?

Every GPU has similar problems, but each generation has their own specific issues. Even though the hardware developers do their best to make sure the hardware works as intended, sometimes these sort of problems still slip in.

Is it due to the need to support backwards compatibility with GCN ISA in RDNA?

The RDNA ISA is not backwards compatible with GCN. It is similar for sure, but you can't run the same binary on them.

TacoDeBoss commented 5 years ago

Is there a specified way to report test results? I've been playing Tetris Effect (non-steam) with ACO on my Navi 10 and it works beautifully.

shmerl commented 5 years ago

You can use the main benchmarks issue (#36), or may be there should be a separate one for Navi for clarity?

Venemo commented 5 years ago

@TacoDeBoss Thanks, I appreciate your feedback! I think we currently don't have a better way to report your results, but my collegaues will correct me there if I'm wrong.

shmerl commented 5 years ago

The RDNA ISA is not backwards compatible with GCN. It is similar for sure, but you can't run the same binary on them.

Wasn't that the whole point why AMD still support Wave64 on Navi in less efficient way, while natively suggesting to use Wave32? I.e. I thought it's supposed to be able to run GCN targeted shaders fine, but also should work with ones that are RDNA only. May be that kind of combination could potentially increase the probability of such hazards.

But anyway, I got your point about all GPUs potentially having such hardware issues.

yshui commented 5 years ago

I tried SteamVR home with aco-navi, which does not hang. But I noticed the whole scene looks much dimmer (lower brightness).

Venemo commented 5 years ago

@yshui Is this a navi specific problem?

aqxa1 commented 5 years ago

Most GPU hangs look to be fixed by the hazards updates, at least the games I have tested. Dragon Quest XI looks to be completely hang free, at least in the places where LLVM still hangs.

Venemo commented 5 years ago

Wasn't that the whole point why AMD still support Wave64 on Navi in less efficient way, while natively suggesting to use Wave32? I.e. I thought it's supposed to be able to run GCN targeted shaders fine, but also should work with ones that are RDNA only.

Well, it depends on what you mean by "compatible". What I meant was that a compiled shader that could run on Polaris or Vega can not run unmodified on Navi, so they are not compatible in this sense. This is not a suprising fact though, every GCN generation has (sometimes subtle) differences that makes it necessary to adapt the compiler to those generations.

If you are asking how close are the RDNA and GCN ISA, I think the difference between last gen GCN and RDNA is not that big. AMD could have called it GCN 6th gen, but maybe the marketing department decided they could use a refreshed branding. In case you are curious about the differences, you can take a look at the GFX10 commits in mesa.

May be that kind of combination could potentially increase the probability of such hazards.

The hazards are usually a result of some rare corner case that the hardware designers forgot to take care about. Most of them fall into either of two categories:

The hardware implements an instruction in a buggy manner, so there are some use cases which don't produce the intended results
There are mistakes when it comes do dealing with dependencies between different hardware units that run in parallel, for example when one unit reads a register that is written to by the other, etc

But anyway, I got your point about all GPUs potentially having such hardware issues.

Modern hardware is very complicated. It's not only GPUs that have problems, but everything else, eg. also CPUs. You can see a news item almost every week that talks about how some complicated sequence of instructions can cause unintended side effects; or how some instructions are unreliable on some processors.

Venemo commented 5 years ago

@aqxa1 Thanks for the feedback, it's nice to know that I'm heading in the right direction. Unfortunately those hazard mitigations are not perfect yet, and they don't yet take care of every possible occourence of the hazards. I plan to continue improving it next week.

shmerl commented 5 years ago

Further testing TW3, I found some anomaly with shadows:

llvm:

shadows_llvm

aco:

shadows_aco

Save at that place: tw3_save_shadows_bug.zip

You can move forward using aco, and notice that shadows are missing in some distance.

yshui commented 5 years ago

@Venemo Sorry, but I am not sure. I don't have other hardware I can test on.

Venemo commented 5 years ago

@shmerl @yshui Since there have been other, similar reports from users on Polaris and Vega, I am inclined to think that these rendering glitches are not specific to aco-navi.

Since the discussion in this issue is starting to get very unwieldy, can I ask you guys to please file separate issues about these rendering glitches? (Make sure to still mention that you experienced the problem on aco-navi, it is useful to know.)

I would like to keep this issue about just "getting things to work" on Navi, in other words, ensure that most games can start and play without a hang or a crash. Once that's done, I will start looking at possible optimizations and individual rendering glitches.

Venemo commented 5 years ago

@aqxa1 Can you please take a look at the latest aco-navi and see if you still get any hangs?

aqxa1 commented 5 years ago

@aqxa1 I updated my tests. I didn't test all of the games that were already fixed as of your previous hazard update, so I can't say if there are any regressions.

I will say that everything looks good, and Retroarch (Mednafen PSX HW) is the only application that seems to still be hanging. And it hangs with LLVM as well.

See my previous comment.

Venemo commented 5 years ago

Thank you @aqxa1 I'm happy to hear that. The hazard mitigation series has now been reviewed by Daniel and Rhys, so I'm gonna land those upstream. With that taken care of, only the subgroups MR remains before we can enable the Navi support in ACO by default.

Atrosha commented 5 years ago

Can anyone post a guide or a link to guide of compile this branch for Steam on Ubuntu 19.04 x64? Steam needs 32-bit drivers/libs too.

baryluk commented 5 years ago

@Atrosha You can just wait a day or two, as it is going to be merged into mainline Mesa master - https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2471

baryluk commented 5 years ago

@Atrosha You can take a look at my trivial script to build both 64-bit and 32-bit mesa with amdgpu support for Debian and Ubuntu. I am not sure it will work on Ubuntu, but it should be easy to modify.

https://gist.github.com/baryluk/1041204eff4cc4fad6f1508afe67b562

shmerl commented 5 years ago

Compiling 32-bit on 64-bit is a huge mess in Debian testing at least, unless you want to do it on 32-bit system.

For regular 64-bit build (on Debian, but could work for Ubuntu possibly), see: https://gist.github.com/shmerl/f4e5f76871239158cf083e37c5da56f4

To build Mesa-aco-navi, use it like this:

mesa_branch='aco-navi' mesa_repo='https://gitlab.freedesktop.org/Venemo/mesa.git' src_dir="${HOME}/build/mesa-aco" mesa_dir='/opt/mesa-aco' mesa_debian_build.sh

shmerl commented 5 years ago

@baryluk : I looked at your scripts. Interesting idea with using gcc-i686 toolchain. I can try that, since multilib method is totally borked.

baryluk commented 5 years ago

@shmerl It totally does work.

It should work on both Debian and Ubuntu. It took some time to figure out exactly which packages to install, and how to use proper pkg-config for 32-bit builds, env variables and cross files, but once you know it, it is straight forward. The only missing part for 32-bit build is libclc / libclang, but it is really needed just for OpenCL and even if enabled it will not work (there are other bugs in OpenCL implementation in Mesa). Filled a bug upstream - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941755

Other than that I use it daily to build mesa and 64-bit and 32-bit applications works just fine. I didn't want to make the script "too configurable", use Makefiles or functions (I don't see a point of having a function if it is only going to be called once), to make it straight forward to mess around, understand how it works or modify quickly.

I updated a script a little to be much easier to select LLVM and GCC versions, and made some package installation for amd64 and i386 less redundant and share common package selection automatically.

https://gist.github.com/baryluk/1041204eff4cc4fad6f1508afe67b562

Building aco-navi branch is one line change.

shmerl commented 5 years ago

I already filed a similar bug before: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890801

Anyway, apparently cross compiler has better situation than multilib one, so I'll try to update my script to use the former.

Atrosha commented 5 years ago

@baryluk at first I change Debian to LLVM repo in Ubuntu. Script uninstalls glslang-dev both: i386 and amd64 while compiling. And it needs to install gcc(g++)-9-i686 manually, coz installs version 8 by default.

So after successful compile enable-new-mesa-opt.source doesn't work on Ubuntu.

baryluk commented 5 years ago

@Atrosha I see. I updated the script a bit to install explicitly specific versions of gcc now.

glslang-dev is not required to build Mesa. I don't have it installed on my OS actually. But the script shouldn't make it remove it. In Debian and Ubuntu glslang-dev and glslang-tools package essentially have no dependencies so should not conflict with anything, even weird LLVM versions, and apt should never decide to remove it. So I don't know how it is possible it did remove glslang-dev.

The issue is probably somewhere else.

Could you download a new version from the gist and try again?

@shmerl My script is still kind of not formally correct , because it actually uses 64-bit llvm-config for 32-bit build. But fortunately both 32-bit and 64-bit versions do output same linker and compiler flags, so it does work. If they would output different include paths, it would not work. But exactly same headers are used for both, so it is not a problem. AFAIK llvm-config doesn't understand multiarch, and it is not possible to install both 32-bit and 64-bit versions at the same time, and I wouldn't expect them to support that either.

shmerl commented 5 years ago

I just install one version at a time, and then remove it and install another one for different bitness build. But last time even doing that, co-installation of various stuff with impossible to get a consistent result using gcc-multilb. I'll try with cross compiler, to see how it fares.

You can specify 32-bit llvm in "native" file for meson.

baryluk commented 5 years ago

@shmerl I never tried multilib. I just install cross-compilers, way easier and reliable, not to mention you can build also windows and arm binaries this way too. No you can't specificy 32-bit llvm-config in native file for meson native file because there is no 32-bit llvm-config available and co-installable with 64-bit llvm-config. But it is not a big deal. meson does have some disctinction between native and cross files, but I don't really use native. In fact I only use cross files even when compiling natively. But not to select architecture specific things, but to select llvm10 instead of other llvm versions possibly installed on the system.

shmerl commented 5 years ago

No you can't specificy 32-bit llvm-config in native file for meson native file because there is no 32-bit llvm-config available and co-installable with 64-bit llvm-config

That's why I said, I remove one before installing the other. Not a big deal really. If co-installation of other stuff works, it should be OK.

baryluk commented 5 years ago

@shmerl Ah, yes. You can do that of course. I just find installing and removing one then another version constantly a bit annoying.

shmerl commented 5 years ago

Anyway, to avoid noise in this issue, I recommend anyone who needs help with building Mesa, to move the discussion here.

baryluk commented 5 years ago

@shmerl Agreed. Totally off topic.

Venemo commented 5 years ago

@shmerl @baryluk Sorry for my late reply, but if it's any help to you, here is a writeup about how I build mesa and run some apps using the built mesa: https://gist.github.com/Venemo/a9483106565df3a83fc67a411191edbd (without replacing the system package). This may need adjustments on Debian (such as /usr/lib64 and similar things), but should give you a good idea.

shmerl commented 5 years ago

@Venemo: Thanks, I'll take a look. The building part seems to be also using cross compiler for 32-bit, so I'll definitely go with that to finally have a script that builds both 64-bit and 32-bit. Using it without replacing the system packages looks quite similar to the approach I'm using: https://gist.github.com/shmerl/611b5b1670eb5963d0d99a4512ed8674

baryluk commented 5 years ago

@Venemo This is essentially what my scripts are doing, just fully automated plus both 64 and 32 bit build, opt and debug, plus not pollute system locations or even require root :D https://gist.github.com/baryluk/1041204eff4cc4fad6f1508afe67b562 Thanks anyway.

Zamundaaa commented 5 years ago

As ACO for Navi seems to be enabled with the AUR aco repo now I tested on my Red Devil 5700 XT: SteamVR Blade & Sorcery Gravity Lab Pavlov VR Witcher 3 I'll test some more in the next few days. When generating the shaders it was a little smoother (in VR that's quite noticable) and from there on not much difference, only a little stuttering in B&S was gone, which is nice. No graphics glitches or hangs.

shmerl commented 5 years ago

To clarify from where to get latest code for testing Navi with ACO. Should we still be using https://gitlab.freedesktop.org/Venemo/mesa/commits/aco-navi or more upstream https://github.com/daniel-schuermann/mesa/commits/master ?

Venemo commented 5 years ago

I recommend to just use upstream mesa which has everything that was in aco-navi, and almost everything that's in Daniel's repo. Depending on your preference with regards to stability, you can go with either 19.3 RC, or master.

EDIT: to reduce the confusion, I now deleted the old aco-navi branch.

shmerl commented 5 years ago

What about ACO bug tracking, progress tracking and etc. Will that remain on Github, or also will be moved to Mesa Giltab?

Venemo commented 5 years ago

I will respond to any bug you might find in ACO on Navi, regardless of which place you report it. If there is a bug report that I don't notice feel free to ping me.

Not sure how long Daniel wants to maintain this repo, I assume on the long run all of it will move to upstream Mesa. @daniel-schuermann what is your stance on this? Personally, I have a slight preference towards using Mesa Gitlab, because it makes it easier to track which commits fix which problem.

daniel-schuermann commented 5 years ago

As some packages are based off this repo, we will maintain it for a little longer. It also contains one last performance-improving patch series from @pendingchaos which is not upstream yet, but should land soon. That said, I think with mesa 20.0 we'll fade it out into stable. Bug tracking and progress tracking will probably stay even further than that, but hopefully, there won't be much left at that point.

baryluk commented 5 years ago

@daniel-schuermann Maybe announce a closure of accepting NEW issues on this repo (~Jan 2020?), and only deal with existing ones until they are closed. For all the other stuff redirect people to upstream mesa.

Venemo commented 4 years ago

We've recently landed Wave32 support in ACO. Those willing to test it can do so using RADV_PERFTEST. AMD recommends using this mode for compute shaders and the geometry engine. In case of pixel shaders, the old Wave64 mode is recommended since those shaders are more efficient in this mode due to how interpolation and the cache works in the hardware.

In my testing, cswave32 and gewave32 yield a couple of extra FPS in some games. This is not yet fully optimized, Rhys has an open MR which should give it a nice boost, once merged. The pswave32 on the other hand reduces overall performance and is not recommended.

If you want to test, you need to use upstream mesa with RADV_PERFTEST

# The recommended way is to use Wave32 mode for compute shaders and the geometry engine
RADV_PERFTEST=aco,cswave32,gewave32 ./Talos

Venemo commented 4 years ago

In the meantime I'm working on a post-RA scheduler (based on Daniel's old code) with the intention of taking advantage of Navi's ILP (instruction level parallelism). The idea is to reorder the instructions (if possible) in such a manner that allows to execute as many as possible in parallel.

There are some issues that I'd like to resolve before opening a MR, such as compilation speed, it currently slows down ACO by about 10% which is not acceptable. However I can already confirm that it has a positive effect, I managed to squeeze out some extra frames per second from a couple of games. Will update you guys with some more concrete benchmarks once the code gets closer to final.

Venemo commented 4 years ago

This issue has been quiet for a while, so here is a quick heads up:

Rhys has implemented geometry shaders support, which also works on Navi.
He also made a couple of optimizations and took care of VOP3 literals and such.
I'm now working on getting NGG working. So far I have some code which makes NGG VS work, next up on my TODO list is NGG GS, among other things.

shmerl commented 4 years ago

@Venemo: Do you see any performance improvements from NGG in your tests?

Venemo commented 4 years ago

@shmerl It's hard to say, because so far I've only got NGG VS covered. The "big fish" will be NGG GS. It will definitely provide some improvement, but I don't yet know how much.

However, considering your benchmark results so far, I wouldn't hold my breath that this will fix The Witcher 3 for you. One of us will have to take a close look at that game to determine where the bottleneck is. (I suspect that the problem exists in DXVK, because DXVK on Windows already performs 20% worse than D3D11, but I don't know what the problem is for sure.)

shmerl commented 4 years ago

@Venemo: It did get slightly better than llvm on Navi after recent batch of optimizations, but I suspect it's using a lot of tessellation shaders, so it will benefit more once those are implemented in ACO.

Venemo commented 4 years ago

@shmerl I ran some benchmarks yesterday with ACO NGG VS together with LLVM NGG VS+GS (as ACO doesn't support the merged VS+GS yet). It did not give me any noticable difference for any game that I tested with, compared to the legacy (non-NGG) pipeline. EDIT: I tested with Talos, RotTR and TW3.

Regardless of the performance implications, we'll probably eventually still need to support NGG, since it's not clear what will become of the legacy path in future chips. However it appears that NGG is not a silver bullet that would magically elevate our frame rates.

I suspect that the bottleneck lies with some very heavy "monster" pixel shaders rather than the geometry engine.

shmerl commented 4 years ago

Now that NGG is on in ACO for Navi, should this message still appear?

WARNING: disabling NGG because ACO is used.

Venemo commented 4 years ago

@shmerl As noted in the merge request and here, we concluded that NGG gives basically zero performance benefits, so we concluded that it's not worth to rush it into the 20.0 release. Instead we'll merge it in 20.1 when not only VS but also GS+VS is ready. There is already a LOT of improvements in ACO since 19.3 so still plenty to get excited about.

daniel-schuermann / mesa