PR #2999 adds a presegmentation pass that forces segmentation when an in-place update can cause a read-write (RW) race.
When an intermediate TensorView is aliased to a fusion input, a RW race can occur if the intermediate TensorView or the aliased input is in the path of a broadcast.
This preseg pass currently does not consider whether the size of the broadcasted tv differs from that of the aliased input/output. When the sizes do not differ, segmentation may not be required.
No, I need to investigate this issue and understand whether any changes are required to the current implementation of the 'SegmentInplaceUpdate' preseg pass.
Consider the test at https://github.com/NVIDIA/Fuser/blob/2ec2b926a0c589f7b72bd7a3abce7a49111c5620/tests/cpp/test_alias.cpp#L980-L1029. The fusion is already segmented such that the RW race does not occur, and the broadcasted size does not differ from that of the aliased input/output. However, this preseg pass still inserts a segment set and splits the fusion into 3 segments, even though 2 segments would be functionally correct.
This issue is to track how the preseg pass needs to be modified to identify such cases.
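
As a starting point for discussion, the missing condition can be sketched as a shape comparison between the aliased input/output and the broadcasted tv. This is a toy model under stated assumptions, not nvFuser code: `numElements` and `broadcastExpandsAliasedTensor` are hypothetical helpers, and shapes are modeled as concrete `std::vector<int64_t>` extents, whereas the real pass would have to reason about symbolic extents on the fusion IR.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not an nvFuser API): total number of elements
// described by a list of concrete extents.
int64_t numElements(const std::vector<int64_t>& shape) {
  for (int64_t extent : shape) {
    assert(extent > 0 && "toy model only handles positive concrete extents");
  }
  return std::accumulate(
      shape.begin(), shape.end(), int64_t{1}, std::multiplies<int64_t>());
}

// Hypothetical predicate (not an nvFuser API): returns true when the
// broadcasted tv holds a different number of elements than the aliased
// input/output. Under the assumption made in this issue, forcing
// segmentation may only be needed when this returns true.
bool broadcastExpandsAliasedTensor(
    const std::vector<int64_t>& aliased_shape,
    const std::vector<int64_t>& broadcast_shape) {
  return numElements(broadcast_shape) != numElements(aliased_shape);
}
```

For example, an aliased shape `{4, 1}` with a broadcasted shape `{4, 1}` yields `false` (sizes match, so the extra segment may be avoidable), while a broadcasted shape `{4, 8}` yields `true`.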