UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Multilingual model continue Training #365

Closed JejuWayfarer closed 4 years ago

JejuWayfarer commented 4 years ago

For monolingual models, the documentation shows that continued training with STS-style data is possible. Likewise, is there a way to continue training the three published multilingual models with custom data? If so, should I convert my data to the NLI or STS format before proceeding? Is there anything I can refer to?

nreimers commented 4 years ago

Yes, it is possible. Just replace the model name in that example with a multilingual model.

JejuWayfarer commented 4 years ago

Thanks for the response. When the multilingual model is further trained on Korean, is that done through knowledge distillation (as in DistilBERT), or by updating the BERT weights directly, as in SBERT?

I have one more question. I am working on Korean sentence embeddings. Is there a performance difference between using 'xlm-r-base-en-ko-nli-ststb' and 'xlm-r-100langs-bert-base-nli-stsb-mean-tokens'? Or would it be best to create a Korean monolingual model?

nreimers commented 4 years ago

Both were trained with the same method (multilingual knowledge distillation). The XLM-R-en-ko model even works better for Korean than a monolingual model (according to KorSTSb).

The XLM-R-100langs model should produce similar embeddings, as both imitate bert-base-nli-stsb. But as it was trained for more languages, I expect it to be slightly worse than the en-ko model.

JejuWayfarer commented 4 years ago

Thanks, I understand the training principle now.

A small question remains. Is multilingual knowledge distillation used when continuing to train a multilingual model on Korean data? To use multilingual knowledge distillation, is Korean-only data insufficient, i.e. do I need parallel data containing Korean? Can I continue training the multilingual model when I only have Korean NLI or STS data?

Is there any advantage to using a multilingual model even when dealing with only a single language?

nreimers commented 4 years ago

Not sure if I understand your question.

There are different ways you can fine-tune models:

Option 1: Train on English data. Then use multilingual knowledge distillation to make the model compatible with Korean. Then you have an EN-KO model with aligned vector spaces.

Option 2: Train only on Korean data (like KorNLI and KorSTS). In that case the model only works for Korean.

The performance of Option 1 and Option 2 is about the same (in my experiments). The advantage of Option 1 is that your model also supports English :)

If you want to continue training on some other data, you can take a model from either Option 1 or Option 2 and tune it further on task-specific data.

JejuWayfarer commented 4 years ago

Thanks again for the answer.

One more question: is it possible to train on data such as KorNLI on top of 'xlm-r-base-en-ko'? In other words, is it possible to continue training a multilingual model in only one of its languages?

I think this would update the weights of the BERT layers, is that correct? In that case, would the resulting model only be usable for Korean?

nreimers commented 4 years ago

Yes, you can continue the training on only e.g. Korean data.

The alignment will then degrade over time, i.e., the Korean space will drift slightly away from the English space. The longer you train (and the more data you use), the greater the misalignment should get. But I think you would still get quite decent alignment, as the vector space moves rather slowly when fine-tuning on new data.

JejuWayfarer commented 4 years ago

Thanks very much for the answer. Even if the alignment worsens from additionally training on Korean, can I expect performance to improve as long as the model is applied only to Korean data, not English?

nreimers commented 4 years ago

If you have suitable training data, that matches your desired task, performance will usually increase, yes.

JejuWayfarer commented 4 years ago

Thank you very much for your detailed responses. They were a great help.