zhanglx13 closed 5 months ago
@zhanglx13 Do you know exactly what adding this argument achieves? (In terms of how it affects code generation)
@oplavsic
With this attribute, the firmware preloads the kernel argument into an SGPR at kernel launch time instead of leaving it in memory. Therefore, no s_load_dword is required at the beginning of the kernel to load kernel args.
As for 16, it is the maximum number of user SGPRs that can be used for preloading. Maybe I should not hard-code it.
@oplavsic So I removed the I<16 part.
When there are more than 16 args that want to be preloaded, the firmware will preload the first 16 and the rest still go to memory.
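A minimal sketch of the split described above, for illustration only. It assumes each argument occupies one preload slot, which is a simplification; the helper name `split_kernel_args` and the constant `MAX_PRELOAD_ARGS` are hypothetical and do not come from the PR or from the firmware.

```python
# Hypothetical model of the behavior described in this thread;
# this is not the actual compiler or firmware logic.
MAX_PRELOAD_ARGS = 16  # limit quoted above; assumed to be one slot per argument


def split_kernel_args(arg_names, limit=MAX_PRELOAD_ARGS):
    """Return (preloaded, in_memory).

    The first `limit` arguments are treated as preloaded into SGPRs;
    any remaining arguments are treated as still loaded from memory.
    """
    args = list(arg_names)
    return args[:limit], args[limit:]


# Example: a kernel with 20 arguments.
args = [f"arg{i}" for i in range(20)]
preloaded, in_memory = split_kernel_args(args)
print(len(preloaded), "preloaded;", len(in_memory), "still loaded from memory")
```

Under this model, a kernel with 16 or fewer arguments has nothing left to load from memory at kernel entry.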
@jayfurmanek No. Let me try MI250X.
@jayfurmanek As expected, no difference on MI250X. As long as it does not break anything on MI250, we should be good.
ok great! I'll approve
Also fixed some bugs in the tuning script
This helps small gemms (~10 us kernels) a lot. One gemm's execution time drops from 9.5 us to 7.5 us.