Open ziyuhuang123 opened 1 month ago
Maybe it uses 1 load warpgroup and 2 MMA warpgroups,
then the register count is (1 × 40 + 2 × 232) / 3 = 168? I guess.
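To make the arithmetic in this guess easy to check, here is a minimal C++ sketch. The helper names are hypothetical (they are not CUTLASS APIs). It assumes 128 threads per warpgroup and a 64K-entry register file per SM on Hopper. It only reproduces the averaging idea from this comment; it does not claim to show how the compiler or NCU actually arrives at 168.

```cpp
#include <cassert>

// Hypothetical helper (not part of CUTLASS): per-thread register count
// averaged over all warpgroups in a CTA, as guessed in the comment above.
constexpr int average_regs_per_thread(int producer_groups, int producer_regs,
                                      int consumer_groups, int consumer_regs) {
    return (producer_groups * producer_regs + consumer_groups * consumer_regs) /
           (producer_groups + consumer_groups);
}

// Hypothetical helper: total registers held by the CTA after the
// per-warpgroup counts are applied. Assumes 128 threads per warpgroup.
constexpr int total_regs_in_cta(int producer_groups, int producer_regs,
                                int consumer_groups, int consumer_regs) {
    constexpr int kThreadsPerWarpgroup = 128;  // assumption: 4 warps x 32 threads
    return kThreadsPerWarpgroup *
           (producer_groups * producer_regs + consumer_groups * consumer_regs);
}

// 1 producer warpgroup at 40 registers, 2 consumer warpgroups at 232 registers.
static_assert(average_regs_per_thread(1, 40, 2, 232) == 168,
              "weighted average matches the number seen in NCU");

// Assumption: 65536 32-bit registers per SM, so the CTA still fits.
static_assert(total_regs_in_cta(1, 40, 2, 232) <= 65536,
              "redistributed registers stay within the register file");
```

Both checks pass at compile time: the weighted average is 168, and the redistributed total (64512) stays under the assumed 65536-register budget, which is consistent with one such CTA per SM.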
I agree. So the number of registers is determined by:
In part 2, the register count is num_2 = (K × B + C × T) / (B + C)
So the overall register count is max(A, num_2)?
setmaxnreg is a new feature introduced with Hopper. I noticed this in CUTLASS: https://github.com/NVIDIA/cutlass/blob/eee0cab26c8eedea447eb3b58b3498eeba2294da/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_cooperative.hpp#L446 From the above, the consumer register count is 232 and the producer register count is 40. Can different warps use different register counts? This would affect occupancy. Also, can we dynamically modify the register count while the kernel is running? This would affect occupancy as well. Using NCU, I see a static register count of 168, not 232 and not 40. Is anything wrong?