beekbin / bert-cosine-sim

Fine-tune BERT to generate sentence embeddings for cosine similarity

Multilingual embeddings based on similarity of translated pairs #4

Open MastafaF opened 5 years ago

MastafaF commented 5 years ago

Hey,

How would you go about generating embeddings in a language-agnostic way, based on the multilingual BERT model, using "bert-base-multilingual-cased" instead of "bert-base-uncased"?
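
To make the question concrete, here is a minimal sketch (not this repo's code) of mean-pooled sentence embeddings from multilingual BERT, using the Hugging Face transformers library:

```python
# Minimal sketch: mean-pooled sentence embeddings from multilingual BERT.
# Uses the Hugging Face transformers library, not this repo's API.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def embed(sentences):
    # Tokenize a batch of sentences, padding to the longest one.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # Mean-pool the token embeddings, ignoring padding positions.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    summed = (out.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

emb = embed(["Hello world", "Bonjour le monde"])
sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print(sim.item())
```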

A similar approach based on translated pairs, instead of pairs of similar sentences in English, could improve on the current multilingual embeddings. What is your view on that?
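
A hedged sketch of what that translated-pair training could look like: pull the embedding of a sentence and its translation together with a cosine loss. This reuses `model` and `tokenizer` from the sketch above; `pair_loader` is a hypothetical iterable of (source, translation) sentence batches, and the loss choice is just one plausible option:

```python
# Hedged sketch, not this repo's training loop: align translated pairs
# with a cosine embedding loss. `pair_loader` is a hypothetical iterable
# yielding (source_sentences, translated_sentences) batches.
import torch

def embed_train(sentences):
    # Same mean pooling as above, but with gradients enabled for fine-tuning.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

loss_fn = torch.nn.CosineEmbeddingLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()

for src_sentences, tgt_sentences in pair_loader:
    src_emb = embed_train(src_sentences)
    tgt_emb = embed_train(tgt_sentences)
    target = torch.ones(src_emb.size(0))  # +1 means "these should be similar"
    loss = loss_fn(src_emb, tgt_emb, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice one would also want negative pairs (target -1) so the embeddings do not collapse, but the positive-pair loop above is enough to show the idea.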

Nice library btw! :D

pommedeterresautee commented 4 years ago

To reword the question: do you freeze the BERT weights before fine-tuning? I imagine that if they are frozen / not updated, then language transfer is possible (fine-tune on English, infer in another language that BERT supports).
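
A minimal sketch of that freezing idea, assuming the same Hugging Face setup as above: keep the multilingual encoder fixed and train only a small projection head, so whatever the head learns from English pairs sits on top of the shared multilingual space. The head size here is an arbitrary choice for illustration:

```python
# Minimal sketch: freeze multilingual BERT and train only a projection head.
import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
for p in encoder.parameters():
    p.requires_grad = False  # freeze BERT: its weights are never updated

# Only this head is trained; 256 is an arbitrary illustrative output size.
head = torch.nn.Linear(encoder.config.hidden_size, 256)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
```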