The following change solves the SSAO performance problems on M1 for me. With this change, I get a stable 120 fps on an M1 Max, versus roughly 30 fps without it.
Xcode's profiler tells me that we spend a lot of time initializing the sample_sphere value in the SSAO kernel. Maybe the compiler is reserving registers for it, and that destroys occupancy? Just a guess.