This was an attempt to fix #3299. I believe the error comes from a particular combination of reshape and expanded broadcast domains. Expanded broadcast domains become non-broadcast domains before reshape here. That seems to cause some unexpected effects in the indexing traversal. The use of the Permissive graph looked suspicious to me, and replacing it with AlmostExact does fix the error of #3299, but unfortunately it results in a different error, e.g.:
00:13:48 terminate called after throwing an instance of 'nvfuser::nvfError'
00:13:48 what(): INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/device_lower/analysis/index_compute.cpp":727, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Could not find required iter domain in reference replay: iblockIdx.y172{108}
I think these are all due to the rather lax use of permissive mappings. I don't understand why this particular part needs to use the Permissive graph, but apparently switching it to AlmostExact triggers the other error.
I thought fixing the legacy indexer might be a simple change, but apparently that's not the case. I'll think about a workaround using the new IdModel-based indexer, which should not have these problems as it's much stricter with iter-domain mappings (although still not perfect).
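For readers less familiar with why a reshape cannot treat an expanded broadcast like a plain broadcast, here is a toy sketch at the strided-tensor level. It is only an analogy and not nvFuser code: `mergeable_as_view` is a hypothetical helper, and the shapes and strides are made up for illustration. It checks the usual condition for flattening two adjacent dimensions without copying.

```python
def mergeable_as_view(outer_size, outer_stride, inner_size, inner_stride):
    """Return True if two adjacent dims can be flattened into one
    dimension without copying (hypothetical helper, illustration only)."""
    # A size-1 dimension never constrains the merge.
    if outer_size == 1 or inner_size == 1:
        return True
    # Otherwise stepping once along the outer dim must equal stepping
    # across the whole inner dim.
    return outer_stride == inner_stride * inner_size


# A contiguous [3, 4] tensor has strides (4, 1): flattening to [12] is a view.
contiguous_ok = mergeable_as_view(3, 4, 4, 1)

# A [1, 4] tensor expanded to [3, 4] has strides (0, 1): the expanded
# dimension has stride 0, so flattening needs the data materialized first.
expanded_ok = mergeable_as_view(3, 0, 4, 1)
```

With these inputs `contiguous_ok` is `True` and `expanded_ok` is `False`, which is the strided-tensor counterpart of an expanded broadcast domain having to become a non-broadcast domain before the reshape.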