⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
This PR adds a new feature in which the steps of a pipeline are divided into several load stages, delimited by the position of the GlobalSteps in the pipeline. A GlobalStep receives all the data at once (in a single batch), so it requires all of its previous steps to have finished before it can process the data. Consequently, there is no need to load a GlobalStep until its previous steps have finished executing, which saves resources in the meantime. Likewise, there is no need to load the successor steps of a GlobalStep until it has finished its own execution. The load stages are therefore determined by the position of the GlobalSteps in the pipeline:
The previous steps of a GlobalStep are grouped in one stage.
Each GlobalStep has its own stage.
The successors of a GlobalStep are grouped in one stage.
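
To make the grouping rule concrete, here is a minimal sketch of how load stages could be derived from a pipeline graph. It is not the implementation in this PR: the function name `load_stages`, the dict-of-predecessors representation, and the step names are all hypothetical, and it relies only on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def load_stages(predecessors, global_steps):
    """Group steps into load stages delimited by global steps (illustrative only).

    predecessors: dict mapping each step name to a list of its direct predecessors.
    global_steps: set of step names that need all the data at once.
    """
    # For every step, count the maximum number of global steps found on
    # any path leading to it (not counting the step itself).
    globals_before = {}
    for step in TopologicalSorter(predecessors).static_order():
        globals_before[step] = max(
            (
                globals_before[p] + (p in global_steps)
                for p in predecessors.get(step, [])
            ),
            default=0,
        )

    # Even levels hold normal steps, odd levels hold global steps.
    levels = {}
    for step, count in globals_before.items():
        level = 2 * count + (step in global_steps)
        levels.setdefault(level, []).append(step)

    stages = []
    for level in sorted(levels):
        if level % 2:
            # Each global step gets a stage of its own.
            stages.extend([step] for step in levels[level])
        else:
            # Normal steps between two global steps share one stage.
            stages.append(levels[level])
    return stages


# Hypothetical linear pipeline with a single global step ("dedup").
pipeline = {
    "load": [],
    "generate": ["load"],
    "dedup": ["generate"],
    "label": ["dedup"],
    "push": ["label"],
}
stages = load_stages(pipeline, global_steps={"dedup"})
print(stages)  # [['load', 'generate'], ['dedup'], ['label', 'push']]
```

With stages computed this way, a runner would only need to load the steps of the current stage and could defer loading `dedup`, `label` and `push` until the earlier stages have finished.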