Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
https://anythingllm.com
MIT License

[FEAT]: Multilingual Native Embedder #658

Open timothycarambat opened 6 months ago

timothycarambat commented 6 months ago

What would you like to see?

Currently, the built-in embedder uses the ONNX all-MiniLM-L6-v2 model, which does okay for most use cases and is much smaller to download.

There should be support for the larger multilingual-e5-large model (ONNX HERE) so that multilingual documents can be embedded properly.

This should not be the default, but it should be something the user can opt into. They may have to wait for the model to finish downloading before the embedder change is saved, as we cannot afford the latency of downloading the model at runtime.
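For context, a minimal sketch of what the swap could look like, assuming the native embedder loads ONNX models through transformers.js (the usual way all-MiniLM-L6-v2 is run in JS); the Xenova/multilingual-e5-large model ID and option values here are assumptions for illustration, not AnythingLLM's actual code:

```ts
import { pipeline } from "@xenova/transformers";

// Much larger download than all-MiniLM-L6-v2, which is why the model
// would need to be fetched before the embedder change is saved.
const embedder = await pipeline(
  "feature-extraction",
  "Xenova/multilingual-e5-large"
);

// e5 models expect "query: " / "passage: " prefixes on their inputs.
const output = await embedder("passage: Hallo wereld", {
  pooling: "mean",
  normalize: true,
});
console.log(output.dims); // [1, 1024] here vs. [1, 384] for all-MiniLM-L6-v2
```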

timothycarambat commented 6 months ago

Also, we don't want to pre-pack the Docker image with models people may not use, so we will not be doing that; the image needs to stay at a reasonable size to remain portable.
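A rough sketch of the "download when selected, not baked into the image" idea, assuming transformers.js manages the model cache; the cache path and the preloadEmbedder helper are illustrative, not the project's real configuration:

```ts
import { env, pipeline } from "@xenova/transformers";

// Cache models on a mounted volume instead of shipping them in the image.
env.cacheDir = "/storage/models";

async function preloadEmbedder(modelId: string): Promise<void> {
  // Constructing the pipeline forces the ONNX weights to download now,
  // so the settings save can await this before persisting the change.
  await pipeline("feature-extraction", modelId);
}

// e.g. called from the settings handler when the user opts in:
await preloadEmbedder("Xenova/multilingual-e5-large");
```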

vlbosch commented 5 months ago

I would also like the option to add another local embedding model, for example BGE-M3. I tried adding it to the models folder myself, but couldn't get it to work yet, unfortunately. Hopefully this feature can be added in the short term, so that we don't need to rely on OpenAI's models for multilingual documents. Thanks in advance! :-)
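If the native embedder resolves models through transformers.js, a locally added model generally has to follow its expected directory layout, which may be why dropping BGE-M3 into the models folder didn't work. A hedged sketch, with the paths, folder layout, and model ID all being assumptions:

```ts
import { env, pipeline } from "@xenova/transformers";

// Point the loader at the local models folder and disable hub fallback.
// The model directory is expected to contain config.json, tokenizer files,
// and the ONNX weights under an onnx/ subfolder.
env.localModelPath = "/storage/models";
env.allowRemoteModels = false;

const embedder = await pipeline("feature-extraction", "bge-m3");
const vectors = await embedder("voorbeeldtekst", {
  pooling: "mean",
  normalize: true,
});
```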

oscar-7000 commented 5 months ago

bge-m3 would be nice

sweco-nlmdek commented 3 months ago

This would be a very welcome feature. I see in this thread (https://github.com/Mintplex-Labs/anything-llm/issues/645) that someone tried multilingual-e5-large and it seems to help a lot.