Closed rdspring1 closed 6 days ago
I am seeing changes from PR #3334, can you rebase to only include changes from this PR for easier review?
Oops. I think the merge of #3334 messed up the git history. You might have to resolve the conflicts by hand now.
I used git rebase to fix the conflicts.
!test
I renamed some variables to make things clearer. I hope it helps!!!
Overview:
- buildSegment creates the CPP Fusion for a given segment id, translates it to a python FusionDefinition, then returns a mapping from the segment fusion state indices to the original fusion state indices.
- FusionDefinition.segment calls setupSegmentation, buildSegment, and finalizeSegmentation to create python definitions for the sub-fusions and their index mappings.
Changes in this PR
This PR implements the buildSegment function for user-scheduler segmentation. It is the second PR in a stack, preceded by https://github.com/NVIDIA/Fuser/pull/3334 and followed by https://github.com/NVIDIA/Fuser/pull/3025.
- buildSegment function in csrc/python_frontend/segmentation.cpp.
- segment function in nvfuser/__init__.py.
Example:
Original Fusion: A reduction + broadcast + pointwise fusion.
After Segmentation: The reduction scheduler does not support fusing any operations with an inner reduction, so the original fusion is divided into two segments.
First Segment:
The first segment contains the reduction and broadcast operations, which correspond to [T0, T2, T3] in the original fusion. Therefore, the segment index to original index map has two entries.
Second Segment:
The second segment is the pointwise addition with the broadcasted reduction. It corresponds to [T1, T3, T4] in the original fusion.
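A rough sketch of how such index maps could be used to run the two segments against the original index space. Nested Python lists stand in for tensors, and the concrete entries of first_map and second_map are assumptions: the description only says that the first map has two entries and which original tensors each segment touches, so the choice of which segment indices are mapped (here, the segment's inputs and outputs) is a guess. The helper names are hypothetical.

```python
# Assumed maps: {segment-local index -> original fusion index}.
first_map = {0: 0, 2: 3}         # segment input T0 -> T0, segment output T2 -> T3
second_map = {0: 1, 1: 3, 2: 4}  # inputs T0 -> T1, T1 -> T3, output T2 -> T4


def run_first_segment(t0):
    """Sum over the inner dimension, then broadcast back to a column."""
    reduced = [sum(row) for row in t0]
    return [[value] for value in reduced]


def run_second_segment(t0, t1):
    """Pointwise add of t0 with the broadcast column t1."""
    return [[a + col[0] for a in row] for row, col in zip(t0, t1)]


# Values keyed by original fusion index; only the fusion inputs exist at first.
original = {
    0: [[1.0, 2.0], [3.0, 4.0]],
    1: [[10.0, 20.0], [30.0, 40.0]],
}

# First segment: read its input through the map, write its output back.
original[first_map[2]] = run_first_segment(original[first_map[0]])

# Second segment: both inputs are looked up in the original index space.
original[second_map[2]] = run_second_segment(
    original[second_map[0]], original[second_map[1]]
)

print(original[4])
```

The intermediate reduction result never gets an entry in the original index space here, which is consistent with the first map having only two entries.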