NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Fix the legacy loop indexing traversal #3373

Closed naoyam closed 1 week ago

naoyam commented 2 weeks ago

This is a temporary workaround (WAR) for #3374. It is temporary because the repro has no problem with the IdModel-based indexer. The goal is to unblock @IvanYashchuk until the new indexer can be enabled by default.

The root cause is that, when we attempt to find a correct indexing path from the loop domain to the allocation domain of the indexed tensor, the algorithm fails to find a path that visits a backward merge if the indexed tensor has only one of the merge's inputs. That happens when the tensor is broadcast and gets inlined with broadcast forwarding. In that case, the current code just picks the first traversal option. I think that happens to work fine, but it is not necessarily the right choice, particularly because we are looking at all candidate next traversal targets that are permissively mapped.

The WAR simply picks a candidate as long as it has at least one mapped ID. I think this is good enough as a temporary measure.
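
As a rough illustration only (this is not the nvFuser code, which is C++), the difference between the two selection rules can be sketched in Python. Everything here is hypothetical: the `Candidate` type, the function names, the ID names, and the assumption that a candidate can be modeled as the set of IDs it would visit. The strict rule is a simplified guess at the legacy behavior described above, and returning `None` when nothing matches is also an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one possible next traversal target."""
    name: str
    ids: FrozenSet[str]  # IDs this candidate would visit (e.g. merge inputs)


def pick_all_mapped(candidates: Sequence[Candidate],
                    tensor_ids: FrozenSet[str]) -> Optional[Candidate]:
    """Strict rule (assumed): require every ID to be mapped to the tensor.

    If no candidate qualifies, fall back to the first one, mirroring the
    "just picks the first traversal option" behavior.
    """
    for c in candidates:
        if c.ids <= tensor_ids:
            return c
    return candidates[0] if candidates else None


def pick_any_mapped(candidates: Sequence[Candidate],
                    tensor_ids: FrozenSet[str]) -> Optional[Candidate]:
    """Relaxed rule: accept the first candidate with at least one mapped ID."""
    for c in candidates:
        if c.ids & tensor_ids:
            return c
    return None  # assumption: the sketch does not guess a fallback


# The indexed tensor has only i1; i0 is the broadcast input it lacks.
tensor_ids = frozenset({"i1"})
candidates = [
    Candidate("unrelated", frozenset({"i7", "i8"})),
    Candidate("backward_merge", frozenset({"i0", "i1"})),
]

strict_choice = pick_all_mapped(candidates, tensor_ids)
relaxed_choice = pick_any_mapped(candidates, tensor_ids)
```

In this toy setup the strict rule never accepts the backward merge because `i0` is missing from the tensor, so it falls back to whichever candidate is listed first. The relaxed rule selects the backward merge because `i1` is mapped.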

Fixes #3374

naoyam commented 2 weeks ago

!test --diff

naoyam commented 2 weeks ago

!test --pybench