argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
https://distilabel.argilla.io
Apache License 2.0
1.39k stars 95 forks

[FEATURE] synthetic data generation for predictive NLP tasks #797

Open Josephrp opened 1 month ago

Josephrp commented 1 month ago

Feature: Create Dataset Pipelines

Starting from raw "documents" / nodes / text (and other modalities?), synthetically create NER annotations, QnA pairs, etc.
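To make the request concrete, here is a minimal sketch of what such a pipeline could look like for QnA pairs. It is plain Python and does not use the distilabel API; every name in it (`chunk_text`, `build_qa_prompt`, `parse_qa`, `generate_qa_dataset`, `fake_generate`) is hypothetical, and `fake_generate` is only a stand-in for a real LLM call.

```python
from typing import Callable, Dict, List


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split a raw document into chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_qa_prompt(chunk: str) -> str:
    """Build a prompt asking a model for one question/answer pair about `chunk`."""
    return (
        "Write one question and its answer based only on the passage below.\n"
        "Format: Q: <question> | A: <answer>\n\n"
        f"Passage: {chunk}"
    )


def parse_qa(raw: str) -> Dict[str, str]:
    """Parse 'Q: ... | A: ...' into a dict; return {} if the output is malformed."""
    if "|" not in raw:
        return {}
    q_part, a_part = (part.strip() for part in raw.split("|", 1))
    if not q_part.startswith("Q:") or not a_part.startswith("A:"):
        return {}
    return {"question": q_part[2:].strip(), "answer": a_part[2:].strip()}


def generate_qa_dataset(
    documents: List[str],
    generate: Callable[[str], str],
    max_words: int = 50,
) -> List[Dict[str, str]]:
    """Turn raw documents into rows of {context, question, answer}."""
    rows = []
    for doc in documents:
        for chunk in chunk_text(doc, max_words):
            pair = parse_qa(generate(build_qa_prompt(chunk)))
            if pair:  # skip chunks where the model output could not be parsed
                rows.append({"context": chunk, **pair})
    return rows


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: asks about the first word of the passage."""
    passage = prompt.split("Passage: ", 1)[1]
    return f"Q: What does the passage begin with? | A: {passage.split()[0]}"


if __name__ == "__main__":
    docs = ["Distilabel builds synthetic datasets."]
    for row in generate_qa_dataset(docs, fake_generate):
        print(row)
```

An NER variant would presumably follow the same shape, swapping the prompt and the parser for ones that return entity spans and labels instead of a question and answer.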


davidberenstein1957 commented 1 month ago

Thanks a lot for the suggestions, @Josephrp. We have been discussing this too and would love to get some more feedback from you, as proposed on Slack :)