ROCm / triton

Development repository for the Triton language and compiler
MIT License

Add InReg attribute to function args #490

Closed zhanglx13 closed 5 months ago

zhanglx13 commented 5 months ago

Also fixed some bugs in the tuning script.

This helps small GEMMs (kernels of about 10 us) considerably. One GEMM's execution time drops from 9.5 us to 7.5 us, roughly a 21% reduction.

oplavsic commented 5 months ago

@zhanglx13 Do you know exactly what adding this attribute achieves, in terms of how it affects code generation?

zhanglx13 commented 5 months ago

@oplavsic With this attribute, the firmware preloads the kernel argument into an SGPR at kernel launch time instead of leaving it in memory. As a result, no s_load_dword is needed at the beginning of the kernel to load the kernel args. As for the 16, it is the maximum number of user SGPRs that can be used for preloading. Maybe I should not hard-code it.
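
To illustrate what the attribute looks like at the IR level, here is a minimal sketch that prints an LLVM IR kernel signature with `inreg` on every parameter. It only formats text and does not use the LLVM API; the helper `kernel_signature`, the kernel name `gemm_kernel`, and the parameter list are hypothetical and are not taken from this PR.

```python
def kernel_signature(name, params, inreg=True):
    """Format an LLVM IR 'define' line for an AMDGPU kernel.

    params is a list of (llvm_type, arg_name) pairs. When inreg is True,
    each parameter carries the 'inreg' parameter attribute, which is what
    marks it as a candidate for SGPR preloading in the discussion above.
    """
    attr = " inreg" if inreg else ""
    args = ", ".join(f"{ty}{attr} %{arg}" for ty, arg in params)
    return f"define amdgpu_kernel void @{name}({args})"


# Hypothetical GEMM-like argument list, for illustration only.
params = [("ptr addrspace(1)", "a"), ("ptr addrspace(1)", "b"), ("i32", "n")]
print(kernel_signature("gemm_kernel", params))
print(kernel_signature("gemm_kernel", params, inreg=False))
```

Comparing the two printed lines shows the only IR-level difference: the `inreg` keyword between each parameter's type and its name.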

zhanglx13 commented 5 months ago

@oplavsic So I removed the I<16 part. When more than 16 args are marked for preloading, the firmware preloads the first 16 and the rest still go to memory.
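
The behavior described here can be modeled with a small sketch. It is a simplification that counts whole arguments, as the comment does, rather than individual SGPR dwords; the function `split_preloaded` and the constant `MAX_PRELOAD` are hypothetical names used only for illustration.

```python
MAX_PRELOAD = 16  # limit quoted in the thread; assumed here to apply per argument


def split_preloaded(arg_names, limit=MAX_PRELOAD):
    """Split arguments into (preloaded, in_memory).

    The first `limit` arguments are treated as preloaded into SGPRs;
    any remaining arguments are still loaded from memory by the kernel.
    """
    return arg_names[:limit], arg_names[limit:]


args = [f"arg{i}" for i in range(20)]
preloaded, in_memory = split_preloaded(args)
print(len(preloaded), len(in_memory))  # prints "16 4"
```

Under this model, marking every argument is harmless: arguments beyond the limit simply fall back to the usual memory path.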

zhanglx13 commented 5 months ago

@jayfurmanek No. Let me try MI250X.

zhanglx13 commented 5 months ago

@jayfurmanek As expected, no difference on MI250X. As long as it does not break anything on MI250, we should be good.

jayfurmanek commented 5 months ago

OK, great! I'll approve.