Closed. PMunkes closed this 3 years ago
I would expect these changes to become unnecessary as shader compilers improve. I only tested this to check whether I could improve performance without major changes. I'm currently thinking about reimplementing this with a focus on RDNA2 (seeing as I cannot get any NVIDIA hardware for a reasonable price). I don't expect to have anything concrete for a couple of weeks.
This patch no longer has any effect on the newest driver. The driver's heuristics for choosing wave32 mode have been updated, and reordering the store no longer makes a difference.
This patch improves performance on RDNA2 cards by 19%, as documented in this comment (link). It does this in two ways: it switches the execution mode to wave32 (via the change in line 27), and it reduces the number of vector registers used in wave32 mode, which increases occupancy (via the imageStore in line 87).
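
To illustrate why using fewer vector registers can raise occupancy, here is a rough sketch of the usual register-limited occupancy estimate. It is not taken from the patch: the function name `waves_per_simd`, the register budget, the wave cap, and the before/after register counts are all illustrative assumptions, and the model ignores allocation granularity and other limits such as LDS usage.

```python
def waves_per_simd(vgprs_per_thread, vgpr_budget_per_lane, max_waves):
    """Upper bound on resident waves per SIMD when vector registers are the limit.

    vgprs_per_thread:     vector registers the compiled shader needs per thread
    vgpr_budget_per_lane: registers available per lane on one SIMD (assumed value)
    max_waves:            hardware cap on resident waves per SIMD (assumed value)
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    return min(max_waves, vgpr_budget_per_lane // vgprs_per_thread)


# Illustrative numbers only; these are assumptions, not measurements from this patch.
BUDGET = 1024
MAX_WAVES = 16

before = waves_per_simd(96, BUDGET, MAX_WAVES)  # hypothetical register count before
after = waves_per_simd(64, BUDGET, MAX_WAVES)   # hypothetical register count after
print(before, after)
```

With these made-up inputs the estimate goes from 10 to 16 resident waves, which gives the scheduler more waves to switch between while memory operations are in flight. The real numbers depend on the driver's compiler output for the shader in question.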