Yes, it is possible. Just replace the model name in that example with a multilingual model.
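For example, a minimal sketch using the 'xlm-r-100langs-bert-base-nli-stsb-mean-tokens' model mentioned later in this thread (the exact model identifier and how it is resolved may differ depending on your sentence-transformers version):

```python
from sentence_transformers import SentenceTransformer

# Load a multilingual model instead of the English-only model from the example
model = SentenceTransformer('xlm-r-100langs-bert-base-nli-stsb-mean-tokens')

# English and Korean sentences are mapped into the same vector space
sentences = ['This is an example sentence.', '이것은 예시 문장입니다.']
embeddings = model.encode(sentences)
print(embeddings.shape)
```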
Thanks for the response. When the multilingual model receives additional training in Korean, is it done through knowledge distillation (like DistilBERT)? Or is it done by updating the BERT weights, like SBERT?
I have one more question. I am working on making Korean sentence embeddings. Is there a difference in performance between using 'xlm-r-base-en-ko-nli-ststb' and 'xlm-r-100langs-bert-base-nli-stsb-mean-tokens'? Or would it be best to create a Korean monolingual model?
Both were trained with the same method (multilingual knowledge distillation). The XLM-R-en-ko model works even better for Korean than a monolingual model (according to the KorSTS benchmark).
The XLM-R-100langs model should produce similar embeddings, as both imitate bert-base-nli-stsb. But as it was trained for more languages, I expect it to be slightly worse than the en-ko model.
Thanks, I now understand the training principle.
One small question remains. Is multilingual knowledge distillation also used when continuing to train a multilingual model on Korean data? To use multilingual knowledge distillation, is it correct that Korean-only data is not enough and that parallel data containing Korean is required? Can I continue training the multilingual model when only Korean NLI or STS data is available?
Is there any advantage to using a multilingual model even when dealing with only a single language?
Not sure if I understand your question.
There are different ways you can fine-tune models: Option 1: Train on English data. Then use multilingual knowledge distillation to make the model compatible with Korean. Then you have an EN-KO model with aligned vector spaces.
Option 2: Train only on Korean data (like KorNLI and KorSTS). In that case the model only works for Korean.
The performance for Option 1 and Option 2 is about the same (in my experiments). Advantage of Option 1 is that your model also supports English :)
If you want to continue training on some other data, you can take a model from either Option 1 or Option 2 and tune it further on task-specific data.
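For Option 1, a rough sketch of the multilingual knowledge distillation setup, in the spirit of the make_multilingual example in this repository (the parallel data file 'parallel-en-ko.tsv' and the hyperparameters are placeholders; check the example script for the exact arguments in your library version):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.datasets import ParallelSentencesDataset

# Teacher: English SBERT model whose vector space the student should imitate
teacher_model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
# Student: multilingual model that gets aligned to the teacher
student_model = SentenceTransformer('xlm-r-100langs-bert-base-nli-stsb-mean-tokens')

# Parallel data: one pair per line, "english_sentence<TAB>korean_translation"
train_data = ParallelSentencesDataset(student_model=student_model, teacher_model=teacher_model)
train_data.load_data('parallel-en-ko.tsv')  # placeholder path to your parallel corpus

train_dataloader = DataLoader(train_data, shuffle=True, batch_size=32)
# The student is trained to reproduce the teacher embeddings (MSE between the vectors)
train_loss = losses.MSELoss(model=student_model)

student_model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=5,
    warmup_steps=1000,
)
```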
Thanks again for the answer.
One more question: is it possible to further train 'xlm-r-base-en-ko' on data such as KorNLI? In other words, is it possible to continue training a multilingual model in just one of its languages?
I think the BERT layer weights can be updated in this way, but is that correct? In that case, would the resulting model only be usable for Korean?
Yes, you can continue the training on only, e.g., Korean data.
The alignment will then get worse over time, i.e., the Korean space will drift slightly away from the English space. The longer you train (and the more data you use), the worse the misalignment will get. But I think you would still get quite decent alignment, as the vector space moves rather slowly when fine-tuning on new data.
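For instance, a minimal sketch of continuing the training on KorSTS-style pairs, starting from the EN-KO model discussed above (the two training pairs and the hyperparameters below are made-up placeholders, and import paths may vary with the library version):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from the aligned multilingual EN-KO model
model = SentenceTransformer('xlm-r-base-en-ko-nli-ststb')

# KorSTS-style pairs with similarity scores normalized to [0, 1]
train_examples = [
    # "A man is playing a guitar." / "A man is playing an instrument."
    InputExample(texts=['한 남자가 기타를 치고 있다.', '한 남자가 악기를 연주하고 있다.'], label=0.8),
    # "A woman is cooking." / "An airplane is taking off."
    InputExample(texts=['한 여자가 요리를 하고 있다.', '비행기가 이륙하고 있다.'], label=0.0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Regression objective on the cosine similarity of the two sentence embeddings
train_loss = losses.CosineSimilarityLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```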
Thanks very much for the answer. Even if the alignment worsens from the additional Korean training, if the model is applied only to Korean data rather than English, can I expect the performance to improve?
If you have suitable training data that matches your desired task, performance will usually increase, yes.
Thank you very much for your long response. It was a great help.
For the monolingual model, it is shown that continued training with STS-style data is possible. Likewise, is there a way to continue training the three published multilingual models with custom data? If so, should I convert my data to the NLI or STS format and proceed? Is there anything I can refer to?