NVIDIA / NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs
Apache License 2.0
478 stars, 57 forks

[FEA] Add support for Multiple Model Quality Classification #70

Open sarahyurick opened 4 months ago

sarahyurick commented 4 months ago

In previous versions of NeMo Curator, we supported multiple model quality classification with a combination of Slurm and Python scripts. These scripts were designed to allow the user to pass in multiple model paths at once for running multi-node multi-GPU data classification.

We are now moving away from Slurm scripts in favor of a Python API. I think we should eventually add a Python API (ideally built on CrossFit) that supports multiple model classification.
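To make the request concrete, here is a rough sketch of the shape such an API could take: the caller passes several named models at once, and each one adds its own score column to the same records. Everything here is hypothetical. `classify_with_models`, the `Scorer` type, and the two stand-in scorers are illustrative names only, not existing NeMo Curator or CrossFit APIs, and the sketch runs on a single process rather than multi-node multi-GPU.

```python
from typing import Callable, Dict, List

# Hypothetical type: a "model" is anything that maps a batch of texts to one score per text.
# A real implementation would load a model from a path and run it on GPU instead.
Scorer = Callable[[List[str]], List[float]]


def classify_with_models(texts: List[str], models: Dict[str, Scorer]) -> List[dict]:
    """Run every model over the same texts and add one score column per model."""
    rows = [{"text": t} for t in texts]
    for name, scorer in models.items():
        scores = scorer(texts)
        if len(scores) != len(texts):
            raise ValueError(f"model {name!r} returned {len(scores)} scores for {len(texts)} texts")
        for row, score in zip(rows, scores):
            row[f"{name}_score"] = score
    return rows


# Stand-in scorers so the sketch runs without any model files (assumptions, not real classifiers).
def length_scorer(texts: List[str]) -> List[float]:
    return [min(len(t) / 100, 1.0) for t in texts]


def alpha_scorer(texts: List[str]) -> List[float]:
    return [sum(c.isalpha() for c in t) / max(len(t), 1) for t in texts]


if __name__ == "__main__":
    results = classify_with_models(
        ["abc", "12"],
        {"length": length_scorer, "alpha": alpha_scorer},
    )
    for row in results:
        print(row)
```

Keeping one column per model name would let downstream filtering combine or compare the scores without rerunning any model.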

sarahyurick commented 2 months ago

For reference: