I found this while writing a test for something else. Most existing tests use contiguous input tensors, which is probably why this issue hasn't been caught so far.
This appears to be a limitation in `make_resharding_contiguous` or `insert_reshardings`. We could decompose the `set` into two steps: a non-resharding `set` that makes the tensor contiguous, followed by a resharding (all-gather) `set`.
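The proposed decomposition can be sketched with a toy model. `TensorDesc`, `SetOp`, and `decompose` are hypothetical names invented for illustration; they are not nvFuser types or functions. The idea is that a resharding `set` whose input is non-contiguous gets split so that the resharding step only ever sees a contiguous tensor.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class TensorDesc:
    # Hypothetical toy tensor description, NOT an nvFuser type.
    contiguous: bool
    sharded: bool  # True if some dimension is split across devices


@dataclass(frozen=True)
class SetOp:
    # Hypothetical toy `set`: copies `src` into a tensor laid out as `dst`.
    src: TensorDesc
    dst: TensorDesc

    def is_resharding(self) -> bool:
        # The op needs communication only if the sharding changes.
        return self.src.sharded != self.dst.sharded


def decompose(op: SetOp) -> List[SetOp]:
    """Split a resharding set on a non-contiguous input into
    (1) a local, non-resharding set that makes the tensor contiguous, and
    (2) a resharding set (e.g. all-gather) on the contiguous intermediate."""
    if not op.is_resharding() or op.src.contiguous:
        return [op]  # already fine: leave it alone
    # Intermediate keeps the input's sharding but is contiguous.
    mid = replace(op.src, contiguous=True)
    return [SetOp(op.src, mid), SetOp(mid, op.dst)]


if __name__ == "__main__":
    op = SetOp(TensorDesc(contiguous=False, sharded=True),
               TensorDesc(contiguous=True, sharded=False))
    for step in decompose(op):
        print(step, "resharding" if step.is_resharding() else "local")
```

In this sketch the first step is purely local (no communication), and the all-gather step receives a contiguous input, which is the case the existing tests already cover.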
Repro:
Apply https://github.com/NVIDIA/Fuser/pull/3070.
Running with `-np 1` reproduces it too, so you can run this on a single-GPU workstation.