NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Faulty `rf` flag added for non-sliced IterDomains. #1859

Open wujingyue opened 7 months ago

wujingyue commented 7 months ago
$ NVFUSER_DUMP=fusion_ir bin/nvfuser_tests --gtest_filter=*Noncancellable_CatOnlySubsetOfSplitOutputs
[ RUN      ] MoveSplitCatTest.Noncancellable_CatOnlySubsetOfSplitOutputs

%kernel {
T1_g[ iS2{4}rf, iS4{2}rf ]
   = slice( T0_g[ iS0{4}, iS1{10} ], { {0, 4, 1} {0, 2, 1} } )

TransformPrinter : 
T0_g[ iS0{4}, iS1{10} ]
 root domain : (iS0{4}, iS1{10})
 contiguity: t t
 leaf domain : (iS0{4}, iS1{10})
T1_g[ iS2{4}rf, iS4{2}rf ]
 root domain : (iS2{4}rf, iS3{10}rf)
 allocation domain : (iS2{4}rf, iS4{2}rf)
  Resize: iS3{10}rf by 0 and -8 -> iS4{2}rf
 rfactor domain : (iS2{4}rf, iS4{2}rf)
 contiguity: f t
 leaf domain : (iS2{4}rf, iS4{2}rf)
}

T1_g's first dimension isn't sliced, but it still has the rf flag set. This behavior is unintended according to https://github.com/NVIDIA/Fuser/blob/71d9b555b401655c0d664b94e8abb7021ce29637/csrc/ops/alias.cpp#L744. The sameAs check there returns false because a 4 of type int is not the same as a 4 of type index.
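
To make the failure mode concrete, here is a minimal toy sketch of a strict equality check rejecting two scalars that hold the same value under different dtypes. Everything in it (`Scalar`, `IterDomain`, `Range`, `sliceStrict`, `sliceByValue`) is a hypothetical stand-in, not nvFuser's actual code, and which side carries the int and which the index type is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for illustration only; these are NOT nvFuser's real classes.
enum class DataType { Int, Index };

struct Scalar {
  DataType dtype;
  int64_t value;
  // Strict equality: the dtype must match as well as the value.
  bool sameAs(const Scalar& other) const {
    return dtype == other.dtype && value == other.value;
  }
};

struct IterDomain {
  Scalar extent;
  bool is_rfactor;
};

struct Range {
  Scalar start;
  Scalar stop;
};

// Hypothetical slice helper: returns the input untouched only when the
// strict check says the range covers the whole extent.
IterDomain sliceStrict(const IterDomain& in, const Range& r) {
  assert(0 <= r.start.value && r.start.value <= r.stop.value &&
         r.stop.value <= in.extent.value);
  if (r.start.value == 0 && r.stop.sameAs(in.extent)) {
    return in;
  }
  // Otherwise a new, rfactor-flagged domain is produced.
  return IterDomain{Scalar{DataType::Index, r.stop.value - r.start.value}, true};
}

// Same helper, but the full-range test compares values and ignores dtype.
IterDomain sliceByValue(const IterDomain& in, const Range& r) {
  assert(0 <= r.start.value && r.start.value <= r.stop.value &&
         r.stop.value <= in.extent.value);
  if (r.start.value == 0 && r.stop.value == in.extent.value) {
    return in;
  }
  return IterDomain{Scalar{DataType::Index, r.stop.value - r.start.value}, true};
}
```

With an index-typed extent of 4 and an int-typed range {0, 4}, `sliceStrict` takes the "sliced" branch and returns a flagged domain, while `sliceByValue` returns the input unchanged. Comparing by value is only one conceivable way to avoid the mismatch; the sketch does not claim it is the right fix for nvFuser.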

jacobhinkle commented 7 months ago

Even worse than the flag is probably that resize is called on the id in that other branch. I can't check at the moment, but I wonder: does it wind up creating a Resize node, or is that function able to correctly notice there's no pad required? Or maybe it's removed later.

wujingyue commented 7 months ago

> is that function able to correctly notice there's no pad required

I'm pretty sure it's ^^^. SimplifyingIrBuilder is smart enough to tell that. (But we still end up with that wrong rf flag in the IR)
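
As a rough illustration of the kind of folding meant here, the toy sketch below adds expansions to an extent and hands back the original extent object when the expansion is zero, so no new node is created. The names (`Expr`, `constant`, `symbol`, `addSimplified`) are hypothetical, and this is not SimplifyingIrBuilder's real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Toy symbolic scalar: either a known constant or a named symbol.
// Hypothetical stand-in, NOT nvFuser's Val.
struct Expr {
  bool is_const;
  int64_t value;     // meaningful only when is_const is true
  std::string name;  // meaningful only when is_const is false
};
using ExprPtr = std::shared_ptr<const Expr>;

ExprPtr constant(int64_t v) {
  return std::make_shared<const Expr>(Expr{true, v, ""});
}

ExprPtr symbol(std::string n) {
  return std::make_shared<const Expr>(Expr{false, 0, std::move(n)});
}

// Simplifying add: returns an existing operand when the other is zero,
// folds two constants, and only otherwise builds a new symbolic node.
ExprPtr addSimplified(const ExprPtr& a, const ExprPtr& b) {
  assert(a && b);
  if (b->is_const && b->value == 0) {
    return a;  // x + 0 -> x, no new node
  }
  if (a->is_const && a->value == 0) {
    return b;  // 0 + x -> x, no new node
  }
  if (a->is_const && b->is_const) {
    return constant(a->value + b->value);
  }
  const std::string lhs = a->is_const ? std::to_string(a->value) : a->name;
  const std::string rhs = b->is_const ? std::to_string(b->value) : b->name;
  return symbol("(" + lhs + " + " + rhs + ")");
}
```

In this model, expanding an extent by 0 on both sides returns the very same object, whereas expanding a constant extent of 10 by 0 and -8 folds to 2, matching the `Resize: iS3{10}rf by 0 and -8 -> iS4{2}rf` line in the dump above.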