NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

[WIP] Use the AlmostExact map when traversing across multiple TV ops #3317

Closed naoyam closed 3 weeks ago

naoyam commented 3 weeks ago

This was an attempt to fix #3299. I believe the error is caused by a particular combination of reshape and expanded broadcast domains. Expanded broadcast domains become non-broadcast domains before reshape here, and that seems to cause some unexpected effects in the indexing traversal. The use of the Permissive graph looked suspicious to me, and replacing it with AlmostExact does fix the error in #3299. Unfortunately, it results in a different error, e.g.:

00:13:48 terminate called after throwing an instance of 'nvfuser::nvfError'
00:13:48   what():   INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/device_lower/analysis/index_compute.cpp":727, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Could not find required iter domain in reference replay: iblockIdx.y172{108}

I think these are all due to rather lax use of permissive mappings. I don't understand why this particular part needs the Permissive graph, but apparently switching it to AlmostExact triggers the other error.
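
To make the "lax vs. strict mapping" point concrete, here is a toy sketch in plain Python. It is not nvFuser code and does not use any nvFuser API: the `DisjointSets` class, the `EDGES` list, the domain names such as `t0.b1`, and the three edge kinds are all hypothetical. It also assumes a simplified reading of the graphs, namely that Exact maps only identical domains, AlmostExact additionally maps through trivial (extent-1) transforms, and Permissive additionally maps broadcast domains to concrete ones.

```python
class DisjointSets:
    """Minimal union-find used to group iter domains into mapped sets."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)


# Hypothetical producer/consumer domain pairs, tagged with why they could map.
EDGES = [
    ("t0.i0", "t1.i0", "exact"),      # same extent, neither is broadcast
    ("t1.i0", "t2.i0", "trivial"),    # related only by an extent-1 transform
    ("t0.b1", "t1.i1", "broadcast"),  # broadcast producer, concrete consumer
]

# Assumed strictness ordering: each level accepts a superset of edge kinds.
LEVELS = {
    "exact": {"exact"},
    "almost_exact": {"exact", "trivial"},
    "permissive": {"exact", "trivial", "broadcast"},
}


def build_graph(level):
    """Union only the edges that the given strictness level accepts."""
    sets = DisjointSets()
    for producer, consumer, kind in EDGES:
        sets.find(producer)
        sets.find(consumer)
        if kind in LEVELS[level]:
            sets.union(producer, consumer)
    return sets


if __name__ == "__main__":
    for level in LEVELS:
        graph = build_graph(level)
        print(level, graph.same("t0.b1", "t1.i1"))
```

In this toy model only the permissive level puts the broadcast domain `t0.b1` in the same group as the concrete domain `t1.i1`. That is the kind of mapping that can become questionable once an expanded broadcast is treated as a non-broadcast domain ahead of a reshape.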

I had hoped that fixing the legacy indexer would be a simple change, but apparently that's not the case. I'll think about a workaround using the new IdModel-based indexer, which should not have these problems since it is much stricter about iter-domain mappings (although still not perfect).

naoyam commented 3 weeks ago

!build --diff