IntelLabs / fastRAG

Efficient Retrieval Augmentation and Generation Framework
Apache License 2.0

Multilingual RAG #3

Closed: Matthieu-Tinycoaching closed this issue 9 months ago

Matthieu-Tinycoaching commented 1 year ago

Hi,

Thanks for this great repo!

Is there any way to use this pipeline in multilingual settings?

Are there multilingual versions of ColBERT, PLAID, and FiD? If not, how would you recommend proceeding?

peteriz commented 1 year ago

Hi @Matthieu-Tinycoaching, the models can be fine-tuned (or pre-trained) on any language you like, provided you have enough data and a suitable training process. ColBERT and FiD are built on pre-trained English language models (BERT in ColBERT's case; FiD is typically built on T5), so the backbone of either can be swapped for another LM, such as a multilingual one.
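To illustrate why the backbone is swappable, here is a minimal sketch of ColBERT-style late-interaction (MaxSim) scoring. It is not fastRAG's or ColBERT's actual code. The scoring step only sees per-token vectors, so any encoder that produces them could be plugged in. `toy_encoder` is a hypothetical stand-in for a real (for example multilingual) language model, and the names `maxsim_score` and `rank` are made up for this example.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# An encoder maps text to one embedding vector per token.
Encoder = Callable[[str], List[Vector]]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late interaction: for each query token, take the best cosine
    similarity against any document token, then sum over query tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    if not q or not d:
        return 0.0
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


def rank(query: str, docs: List[str], encode: Encoder) -> List[str]:
    """Order documents by MaxSim score, using whichever encoder is passed in."""
    q = encode(query)
    scored = [(maxsim_score(q, encode(doc)), doc) for doc in docs]
    return [doc for _, doc in sorted(scored, key=lambda p: p[0], reverse=True)]


def toy_encoder(text: str, dim: int = 8) -> List[Vector]:
    """Hypothetical placeholder for a real LM: one character-count vector
    per whitespace-separated token. Replace with a trained encoder."""
    vectors = []
    for token in text.lower().split():
        vec = [0.0] * dim
        for ch in token:
            vec[ord(ch) % dim] += 1.0
        vectors.append(vec)
    return vectors


docs = ["le chat noir dort", "un chien blanc"]
ranking = rank("chat noir", docs, toy_encoder)
print(ranking[0])
```

Because `rank` takes the encoder as an argument, moving to another language in this sketch means passing a different `encode` function. In practice the new encoder would still need fine-tuning on retrieval data in the target language, as noted above.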

Matthieu-Tinycoaching commented 1 year ago

Hi @peteriz, could you give me the names of the datasets used to fine-tune FiD and ColBERT?

peteriz commented 9 months ago

Natural Questions.