argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
https://distilabel.argilla.io
Apache License 2.0
1.6k stars, 126 forks

[DOCS] tutorial - generate data for training embeddings and reranking models #890

Closed: davidberenstein1957 closed this issue 2 months ago

davidberenstein1957 commented 2 months ago

Which page or section is this issue related to?

An adaptation of https://docs.zenml.io/user-guide/llmops-guide/finetuning-embeddings/finetuning-embeddings-with-sentence-transformers

What are you documenting, or what change are you making in the documentation?

NA

gabrielmbmb commented 2 months ago

Hi @davidberenstein1957, we already have https://github.com/argilla-io/argilla-sdk-chatbot and https://github.com/argilla-io/argilla-sdk-chatbot/blob/main/train_embedding.ipynb. Maybe we can add a reference to them.