NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

The contiguity of a non-broadcast/reduction dimension must be true/false. #2555

Open wujingyue opened 3 months ago

wujingyue commented 3 months ago

To reproduce: check out wjy/slice, then run `_bn && bin/nvfuser_tests --gtest_filter=AliasTest.SliceOfExpandedBroadcast`.

The bug is somewhere in https://github.com/NVIDIA/Fuser/blob/7a6f19cce1cf0167700047ca7eb58f53d71bc731/csrc/alias_analysis.cpp#L305-L330.

The root ID is an expanded broadcast and therefore has contiguity none. The rfactor ID, the product of the slicing, should have contiguity true/false instead of inheriting none from the root ID.
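For readers unfamiliar with the rule in the issue title, here is a minimal sketch of it in plain Python. It is a toy model, not nvFuser code: `Dim`, `check_contiguity`, and `is_valid` are hypothetical names, and the model assumes that broadcast/reduction dimensions carry contiguity `None` while every other dimension carries `True` or `False`. It also assumes the sliced ID is no longer a broadcast, which is what the error message suggests.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Dim:
    """Hypothetical stand-in for an iteration domain; not an nvFuser type."""
    name: str
    is_broadcast: bool = False
    is_reduction: bool = False


def check_contiguity(dims: Sequence[Dim],
                     contiguity: Sequence[Optional[bool]]) -> None:
    """Raise ValueError if an entry breaks the assumed contiguity rule."""
    if len(dims) != len(contiguity):
        raise ValueError("one contiguity entry is required per dimension")
    for dim, c in zip(dims, contiguity):
        if dim.is_broadcast or dim.is_reduction:
            # Broadcast/reduction dimensions are expected to carry None.
            if c is not None:
                raise ValueError(f"{dim.name}: contiguity must be None")
        elif c is None:
            # Any other dimension needs an explicit True or False.
            raise ValueError(f"{dim.name}: contiguity must be true/false")


def is_valid(dims: Sequence[Dim],
             contiguity: Sequence[Optional[bool]]) -> bool:
    """Return True when check_contiguity accepts the pair."""
    try:
        check_contiguity(dims, contiguity)
    except ValueError:
        return False
    return True


root = Dim("root", is_broadcast=True)  # expanded broadcast
sliced = Dim("sliced")                 # product of the slice in this model

print(is_valid([root], [None]))     # root keeps None
print(is_valid([sliced], [None]))   # inheriting None is rejected
print(is_valid([sliced], [False]))  # an explicit boolean is accepted
```

In this model the second call fails for the same reason as the test: the sliced dimension inherits `None` from its root even though it is no longer a broadcast.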

wujingyue commented 3 months ago

> The rfactor ID, the product of the slicing, should have contiguity true/false instead of inheriting none from the root ID.

On second thought: ideally, a slice of an expanded broadcast should still be an expanded broadcast, with contiguity none and a smaller expanded extent. This way, the output is still an alias and therefore needn't be allocated. This will need some cooperation from https://github.com/NVIDIA/Fuser/blob/6c6f3a40e09e6f8bece80b9b79c543945846c71b/csrc/ops/alias.cpp#L787-L795.
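Why such a slice can remain an alias can be illustrated with a small stride-based model. This is only a sketch under stated assumptions, not how nvFuser represents tensors: `View1D`, `expand`, `slice_view`, and `to_list` are hypothetical names, and an expanded broadcast is modeled as a view with stride 0 over a single stored element.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class View1D:
    """Hypothetical 1-D view over shared storage; not an nvFuser type."""
    storage: List[float]
    offset: int
    extent: int
    stride: int
    contiguity: Optional[bool]  # None marks a broadcast dimension


def expand(storage: List[float], extent: int) -> View1D:
    """Model an expanded broadcast: stride 0, so all indices share one element."""
    return View1D(storage, offset=0, extent=extent, stride=0, contiguity=None)


def slice_view(view: View1D, start: int, stop: int) -> View1D:
    """Slice without copying: only the offset and extent change."""
    if not 0 <= start <= stop <= view.extent:
        raise ValueError("slice bounds out of range")
    return View1D(
        view.storage,                        # same storage object, no allocation
        view.offset + start * view.stride,   # unchanged when stride is 0
        stop - start,                        # smaller expanded extent
        view.stride,
        view.contiguity,                     # stays None for a broadcast
    )


def to_list(view: View1D) -> List[float]:
    """Materialize the view's elements for inspection."""
    return [view.storage[view.offset + i * view.stride]
            for i in range(view.extent)]


storage = [3.0]
expanded = expand(storage, 8)
sliced = slice_view(expanded, 2, 5)

print(sliced.extent, sliced.stride, sliced.contiguity)
print(sliced.storage is storage)
print(to_list(sliced))
```

With stride 0 the slice start does not move the offset, so in this model the sliced view differs from its input only in extent and still reads from the original storage.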