FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License
7.04k stars, 514 forks

Multilingual Models #5

Open Siegi96 opened 1 year ago

Siegi96 commented 1 year ago

Do you plan to train and release multilingual embedding models in the near future?

staoxiao commented 1 year ago

The multilingual model is in progress, but we cannot confirm the release date yet. Also, which languages do you need? We can consider adding them in the future.

Siegi96 commented 1 year ago

Thanks for your fast answer, good to hear that you are working on it. For me personally, it's English, Spanish, German, and French.

Keep up the awesome work, your models are really impressive.

staoxiao commented 1 year ago

Thanks for your interest! We will keep improving this project.

nhaouari commented 1 year ago

> The multilingual model is in progress, but we cannot confirm the release date yet. Also, which languages do you need? We can consider adding them in the future.

Thank you for your ongoing efforts in expanding the multilingual capabilities. Adding Arabic to your list of languages would not only serve a significant user base but would also greatly assist individuals like myself who frequently interact in the language. Your consideration of this request would be deeply appreciated.

freckletonj commented 1 year ago

"Code" would be a useful language to add, especially common programming languages like Python and JavaScript.

The GTE project claims this ability: https://huggingface.co/thenlper/gte-large

sinia commented 1 year ago

Please add the Lithuanian language.

sinia commented 1 year ago

> Please add the Lithuanian language.

It would also make sense to add Latvian alongside Lithuanian. The two languages are closely related, so including both should improve the model's performance on each.

jingedawang commented 1 year ago

@staoxiao Would you support Japanese? Is there an expected release date?

staoxiao commented 1 year ago

> @staoxiao Would you support Japanese? Is there an expected release date?

Yes. If nothing unexpected happens, it will be released in about a month.

staoxiao commented 8 months ago

I apologize for the late release. We have released a new model, BGE-M3, which supports multiple languages, long text, and multiple retrieval modes. Feel free to use it and provide feedback.

x4080 commented 8 months ago

@staoxiao What languages does BGE-M3 support? Is there a list somewhere? Thanks.