argilla-io / distilabel

⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
https://distilabel.argilla.io
Apache License 2.0

[FEATURE] Sequential execution for local pipeline #579

Open gabrielmbmb opened 3 months ago

gabrielmbmb commented 3 months ago

Description

As mentioned by @alvarobartt and the Ellamind team, it would be nice to have a sequential mode for executing the pipeline, in which no multiprocessing or batching is used.

The idea would be to load each step, process all the data with it, unload the step, load the next step, ...
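
A rough sketch of that loop, to make the idea concrete. This is not distilabel's actual API: the `Step` class, its `load`/`process`/`unload` methods, the two example steps, and `run_sequentially` are all hypothetical names made up for illustration. It only assumes that each step can take the full list of rows and return a new list.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


class Step:
    """Hypothetical step interface (an assumption, not distilabel's real classes)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.loaded = False

    def load(self) -> None:
        # Acquire resources (e.g. a model) right before the step runs.
        self.loaded = True

    def process(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError

    def unload(self) -> None:
        # Release resources before the next step is loaded.
        self.loaded = False


class Uppercase(Step):
    def process(self, rows: List[Row]) -> List[Row]:
        return [{**row, "text": row["text"].upper()} for row in rows]


class AddLength(Step):
    def process(self, rows: List[Row]) -> List[Row]:
        return [{**row, "length": len(row["text"])} for row in rows]


def run_sequentially(steps: Iterable[Step], rows: Iterable[Row]) -> List[Row]:
    """Run every step over all the data, one step at a time, in this process."""
    data = list(rows)
    for step in steps:
        step.load()
        try:
            # No batching: the step sees the whole dataset in a single call.
            data = step.process(data)
        finally:
            # Unload even if the step raised, so resources are not leaked.
            step.unload()
    return data


result = run_sequentially(
    [Uppercase("uppercase"), AddLength("add_length")],
    [{"text": "hello"}, {"text": "hi"}],
)
print(result)
```

Since everything runs in the calling process, a `breakpoint()` placed inside a step's `process` method pauses execution right there, and only one step holds its resources at any time.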

bjoernpl commented 3 months ago

Just adding here that a goal of this would be to enable proper debugging within steps/tasks in the pipeline. Thanks for picking this up! :)