danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, OpenAI, Assistants API, Azure, Groq, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/
MIT License
17.27k stars 2.87k forks

Enhancement: Adding embedding and fine-tuning for training #900

Open onigetoc opened 1 year ago

onigetoc commented 1 year ago

What features would you like to see added?

Implement embedding and fine-tuning for training.

This would also mean uploading files to OpenAI for training, configured from a backend setting. It could also include letting users upload files from the front end, via drag-and-drop and a conventional file input.

danny-avila commented 1 year ago

Thanks for the request. I agree, this would be a really welcome feature. I'll keep this in mind as I integrate file support (retrieval-augmented generation).

onigetoc commented 1 year ago

Maybe with a LangChain plugin, or without one. I think embedding support already exists: https://js.langchain.com/docs/modules/data_connection/text_embedding/ but I didn't find anything for fine-tuning.

Maybe as text, files, and JSONL (JSON Lines). I don't know whether OpenAI only accepts text and JSONL. I thought about creating something to convert any file to text and any JSON to JSONL, but I'm not really sure.
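
The JSON-to-JSONL conversion mentioned above could look roughly like the following. This is only a minimal sketch, assuming the input is a JSON array of records (or a single object); the function name `json_to_jsonl` is hypothetical and not part of LibreChat or any library.

```python
import json


def json_to_jsonl(json_text: str) -> str:
    """Convert a JSON document into JSON Lines: one compact JSON value per line.

    Assumption: a top-level array is treated as a list of records;
    any other top-level value becomes a single line.
    """
    data = json.loads(json_text)
    records = data if isinstance(data, list) else [data]
    # json.dumps never emits raw newlines, so each record stays on one line.
    lines = [json.dumps(record, ensure_ascii=False) for record in records]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = '[{"prompt": "Hi", "completion": "Hello!"}, {"prompt": "Bye", "completion": "See you."}]'
    print(json_to_jsonl(sample), end="")
```

Converting arbitrary files (PDF, DOCX, and so on) to text is a separate problem and is not covered by this sketch.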

rgresock commented 8 months ago

Could the backend use a pip package to prepare the embeddings? I would vote for a local embedding model to keep the documents private and reduce costs. It might be a reliable and consistent alternative to asking the current model for the conversation title.

INSTRUCTOR (Instruction-based Omnifarious Representations) 👨‍🏫: "Embeddings tailored to any task"
Paper: One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Also, this relates to the "File support: vector indexing & retrieval" project item.

UPD: I just finished listening to the publication. Looking at the available models, I found that the "-large" model had ~8x more downloads last month but is only 3x larger, at 1.34 GB (194,913).
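
To make the local-embedding idea above a bit more concrete, here is a rough sketch of how a backend could rank documents against a query without sending them to an external API. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in, not INSTRUCTOR or any real embedding model, and `rank_documents` is a hypothetical helper. A real setup would swap in a local model behind the same `embed` callable.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedder (NOT a real model): unit-length bag-of-words counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; both vectors are unit-length, so a dot product suffices."""
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def rank_documents(
    query: str,
    docs: List[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, most similar to the query first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    documents = [
        "fine-tuning uploads a JSONL file",
        "the weather is sunny today",
    ]
    for score, doc in rank_documents("how do I upload a JSONL file", documents):
        print(f"{score:.3f}  {doc}")
```

Because the embedder is passed in as a plain callable, the documents never have to leave the machine as long as the model behind it runs locally.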

hungvu946 commented 3 weeks ago

Is this feature available in version 0.7.4 yet?