⚗️ distilabel is a framework for synthetic data and AI feedback, built for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
Description
As mentioned by @alvarobartt and the Ellamind team, it would be nice to have a sequential mode for executing the pipeline, one that uses no multiprocessing or batching.
The idea would be to load a step, process all the data with it, unload it, then load the next step and repeat until every step has run.
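
To make the idea concrete, here is a minimal sketch of what such a sequential run could look like. It is not distilabel's implementation: `ToyStep`, `Uppercase`, `AddLength` and `run_sequential` are hypothetical names used only for illustration, and the sketch assumes each step exposes `load`, `process` and `unload` methods and that the whole dataset fits in memory as a list of dicts.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class ToyStep:
    """Hypothetical stand-in for a pipeline step (not distilabel's Step class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.loaded = False

    def load(self) -> None:
        # A real step would acquire its resources here (e.g. model weights).
        self.loaded = True

    def unload(self) -> None:
        # Release resources so the next step has the machine to itself.
        self.loaded = False

    def process(self, rows: List[Row]) -> List[Row]:
        raise NotImplementedError


class Uppercase(ToyStep):
    def process(self, rows: List[Row]) -> List[Row]:
        return [{**row, "text": row["text"].upper()} for row in rows]


class AddLength(ToyStep):
    def process(self, rows: List[Row]) -> List[Row]:
        return [{**row, "length": len(row["text"])} for row in rows]


def run_sequential(steps: List[ToyStep], rows: List[Row]) -> List[Row]:
    """Run steps one at a time in the current process, without batching."""
    for step in steps:
        step.load()
        try:
            # The step sees the full output of the previous step at once.
            rows = step.process(rows)
        finally:
            # Unload even if processing fails, so resources are not leaked.
            step.unload()
    return rows


steps = [Uppercase("uppercase"), AddLength("add_length")]
result = run_sequential(steps, [{"text": "hello"}, {"text": "hi"}])
```

Since only one step is loaded at any time, peak resource usage is bounded by the heaviest single step. The trade-off is that steps cannot overlap, so the total runtime is the sum of the per-step runtimes.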