raunaksinhacisco closed this issue 1 year ago.
Hi @raunaksinhacisco! Which variety of Chinese is covered depends on the training sets used, and unfortunately the .zh language code is indeed underspecified. However, the more recent LASER3 encoders do have explicit support for both Simplified and Traditional Chinese! To embed with these specific models, you can run the following:
bash ./download_models.sh zho_Hans zho_Hant
NOTE: "zho_Hans" and "zho_Hant" are the FLORES200 language codes for Simplified and Traditional Chinese, respectively. Regarding Yue and Wu Chinese training sets, there are bitexts available from sources such as https://opus.nlpl.eu/Tatoeba.php
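If you script the download step, one option is to map human-readable script names to the FLORES200 codes above, so that nobody falls back to the ambiguous .zh code by accident. The sketch below is only an illustration. The dictionary FLORES200_CHINESE and the helper build_download_command are hypothetical names, not part of LASER. It assumes download_models.sh accepts the codes as positional arguments, exactly as in the command above. The helper builds and prints the command but does not run it.

```python
import shlex

# Hypothetical mapping (not part of LASER) from a script name to the
# FLORES200 codes mentioned above: zho_Hans = Simplified, zho_Hant = Traditional.
FLORES200_CHINESE = {
    "simplified": "zho_Hans",
    "traditional": "zho_Hant",
}


def build_download_command(variants):
    """Return the argv list for download_models.sh for the given Chinese variants.

    Unknown variant names raise ValueError instead of silently falling back
    to the underspecified `zh` code.
    """
    codes = []
    for variant in variants:
        key = variant.strip().lower()
        if key not in FLORES200_CHINESE:
            raise ValueError(f"unknown Chinese variant: {variant!r}")
        codes.append(FLORES200_CHINESE[key])
    # Assumes the script takes language codes as positional arguments.
    return ["bash", "./download_models.sh", *codes]


cmd = build_download_command(["simplified", "traditional"])
print(shlex.join(cmd))  # prints: bash ./download_models.sh zho_Hans zho_Hant
```

To actually run the download, pass the returned list to subprocess.run from the directory that contains download_models.sh.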
Hope this helps!
There are three variants of Chinese listed as supported languages in LASER:
On platforms like Google, Chinese is usually available in three forms:
Can someone please help me with the following questions: