NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Inspect all IDs instead of just loop in ParallelDimensionMap #3376

Closed jacobhinkle closed 1 week ago

jacobhinkle commented 1 week ago

This is important for Hopper MMA (see #3278), where we parallelize TIDx only on the allocation domain of the MmaOp output. Currently, we generate a usable kernel in that case but cannot launch it properly, because we can't infer the x dimension of the block size from the loop domain alone. This PR fixes that by replacing tv->getLoopDomain() with tv->domain()->allIDs(), which inspects the root, logical, loop, and allocation domains, as well as intermediate IterDomains, to find parallelized dimensions.
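To illustrate the difference, here is a minimal standalone sketch. It does not use nvFuser's classes: `ToyIterDomain`, `ToyTensorDomain`, `inferExtent`, and `makeMmaLikeOutput` are hypothetical stand-ins, the toy `allIDs()` only concatenates the loop and allocation domains, and the extents are arbitrary. It only shows why a scan limited to the loop domain misses a TIDx binding that exists solely on the allocation domain.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the nvFuser types.
enum class ParallelType { Serial, TIDx, TIDy, BIDx };

struct ToyIterDomain {
  int64_t extent;
  ParallelType ptype;
};

struct ToyTensorDomain {
  std::vector<ToyIterDomain> loop;
  std::vector<ToyIterDomain> allocation;

  // Hypothetical analogue of allIDs(): here it is simply loop + allocation.
  std::vector<ToyIterDomain> allIDs() const {
    std::vector<ToyIterDomain> ids = loop;
    ids.insert(ids.end(), allocation.begin(), allocation.end());
    return ids;
  }
};

// Returns the largest extent bound to `pt` among `ids`, or nullopt if no
// IterDomain in `ids` is parallelized with `pt`.
std::optional<int64_t> inferExtent(
    const std::vector<ToyIterDomain>& ids,
    ParallelType pt) {
  std::optional<int64_t> result;
  for (const ToyIterDomain& id : ids) {
    if (id.ptype != pt) {
      continue;
    }
    assert(id.extent > 0);
    if (!result.has_value() || id.extent > *result) {
      result = id.extent;
    }
  }
  return result;
}

// A tensor whose only TIDx binding lives on the allocation domain
// (extents are arbitrary example values).
ToyTensorDomain makeMmaLikeOutput() {
  ToyTensorDomain td;
  td.loop.push_back(ToyIterDomain{64, ParallelType::Serial});
  td.loop.push_back(ToyIterDomain{256, ParallelType::Serial});
  td.allocation.push_back(ToyIterDomain{128, ParallelType::TIDx});
  td.allocation.push_back(ToyIterDomain{128, ParallelType::Serial});
  return td;
}
```

With this toy tensor, `inferExtent(td.loop, ParallelType::TIDx)` returns an empty optional, so no block x dimension can be derived, while `inferExtent(td.allIDs(), ParallelType::TIDx)` finds the binding on the allocation domain.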

jacobhinkle commented 1 week ago

!test

jacobhinkle commented 1 week ago

!build