Segmentation decomposes a fusion into a directed acyclic graph (DAG) of sub-fusions. After applying the segmentation algorithm, we can translate the sub-fusions into their corresponding python definitions. Then, given the fusion's input arguments, the segments are run in the correct order to produce the output results.
The original FusionDefinition stores the sequence of sub-fusions and acts as an argument manager. It gathers the input arguments before running the sub-fusion and stores its results. To perform this function, it requires a map from the segment index space to the original index space. This mapping is generated while creating the python definition for each sub-fusion.
CPP functions:
Step 1: setupSegmentation runs the segmentation algorithm on the CPP Fusion to create the SegmentedFusion. Then, sub-fusions are ordered according to their dependencies by the prepareGroupOrder function. It returns the number of segments in SegmentedFusion.
Step 2: buildSegment creates the CPP Fusion for a given segment id, translates it to a python FusionDefinition, then returns a mapping from the segment fusion state indices to the original fusion state indices.
Step 3: finalizeSegmentation destroys any state stored in FusionDefinition.
Python functions:
setupSegmentation, buildSegment, and finalizeSegmentation are called together in FusionDefinition.segment.
If a python FusionDefinition has segments, call _execute_segments in the FusionDefinition.execute. The original FusionDefinition acts as argument manager, running the sub-fusions in topological order.
Example:
Original Fusion: A reduction + broadcast + pointwise fusion.
The reduction scheduler does not support fusing any operations with an inner reduction, so the original fusion is divided into two segments. Segment 2 depends on Segment 1, so there is a strict ordering of the segments.
The first segment contains the reduction and broadcast operations, which corresponds with [T0, T2, T3] in the original fusion.
The second segment is the pointwise addition with the broadcasted reduction. It corresponds with [T1, T3, T4] in the original fusion.
Create SegmentationState class that contains all segmentation logic for python-frontend.
All segmentation logic is contained in a separate file - csrc/python_frontend/segmentation.h
FusionDefinition contains an instantiation of SegmentationState and exposes its logic in a public interface. This interface is added to the python bindings.
Created test_segmentation_reduction_pointwise_epilogue to test functionality.
General Overview of Segmentation:
Segmentation decomposes a fusion into a directed acyclic graph (DAG) of sub-fusions. After applying the segmentation algorithm, we can translate the sub-fusions into their corresponding python definitions. Then, given the fusion's input arguments, the segments are run in the correct order to produce the output results.
The original FusionDefinition stores the sequence of sub-fusions and acts as an argument manager. It gathers the input arguments before running the sub-fusion and stores its results. To perform this function, it requires a map from the segment index space to the original index space. This mapping is generated while creating the python definition for each sub-fusion.
CPP functions:
Step 1:
setupSegmentation
runs the segmentation algorithm on the CPP Fusion to create theSegmentedFusion
. Then, sub-fusions are ordered according to their dependencies by theprepareGroupOrder
function. It returns the number of segments inSegmentedFusion
.Step 2:
buildSegment
creates the CPPFusion
for a given segment id, translates it to a pythonFusionDefinition
, then returns a mapping from the segment fusion state indices to the original fusion state indices.Step 3:
finalizeSegmentation
destroys any state stored inFusionDefinition
.Python functions:
setupSegmentation
,buildSegment
, andfinalizeSegmentation
are called together inFusionDefinition.segment
.FusionDefinition
has segments, call_execute_segments
in theFusionDefinition.execute
. The originalFusionDefinition
acts as argument manager, running the sub-fusions in topological order.Example:
Original Fusion: A reduction + broadcast + pointwise fusion.
After Segmentation:
First Segment:
Second Segment:
Changes in this PR
This PR implements
setupSegmentation
function for user-scheduler segmentation. It is the first PR in a stack, followed by https://github.com/NVIDIA/Fuser/pull/3335 and https://github.com/NVIDIA/Fuser/pull/3025.SegmentationState
class that contains all segmentation logic for python-frontend.csrc/python_frontend/segmentation.h
FusionDefinition
contains an instantiation ofSegmentationState
and exposes its logic in a public interface. This interface is added to the python bindings.test_segmentation_reduction_pointwise_epilogue
to test functionality.