Hi, thanks for your model. I have two questions about the training data of opus-mt-en-zh.

https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/data/README-v2021-08-07.md

English - Chinese (eng-zho): 10390 | 43075 | 129323178
Middle English (1100-1500) - Chinese (enm-zho)

This page lists two English-Chinese datasets. Which of them was opus-mt-en-zh trained on? And when fine-tuning the model, does ">>cmn_Hans<< " need to be added in front of each source sentence (train_src)?

Both are used, but I guess that enm-zho is very small and will not influence the model very much. For zho, the models cover various language variants and are trained with target-language tokens. So, yes, you need to add a prefix to use the model. But I can also imagine that fine-tuning without it would probably work, with the model then learning to translate without the prefix token.
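For illustration, here is a minimal sketch (not from the thread) of what the prefix looks like in practice, assuming the Hugging Face checkpoint Helsinki-NLP/opus-mt-en-zh, the transformers library, and a made-up example sentence — the target-language token is simply prepended to the raw source text:

```python
# Minimal sketch: prepend the target-language token before tokenization.
# Assumes the Helsinki-NLP/opus-mt-en-zh checkpoint on the Hugging Face Hub
# and the transformers + sentencepiece packages are installed.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# ">>cmn_Hans<<" tells the multilingual zho model which target variant
# (Simplified Mandarin) to produce.
src = ">>cmn_Hans<< How are you today?"

batch = tokenizer([src], return_tensors="pt", padding=True)
generated = model.generate(**batch)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

For fine-tuning, the same idea would apply: prepend the chosen target-language token to every line of train_src before tokenization (unless you deliberately fine-tune without it, as discussed above).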