EticaAI / tico-19-hxltm

[working-draft] Public domain datasets from Translation Initiative for COVID-19 on the format HXLTM (Multilingual Terminology in Humanitarian Language Exchange)
https://tico-19-hxltm.etica.ai
Creative Commons Zero v1.0 Universal

Table with TICO-19 language codes _as they are_, and the ones to be used on HXLTM (and exported formats) #2

Open fititnt opened 2 years ago

fititnt commented 2 years ago

The current scripts/data-info directory already has some language codes:

ls -h scripts/data-info/
tico19_t_facebook_initial+hotfixes-languages.csv
tico19_t_facebook_initial+hotfixes-language-source.csv
tico19_t_facebook_initial+hotfixes-language-target.csv
tico19_t_google_initial-languages.csv
tico19_t_google_initial-language-source.csv
tico19_t_google_initial-language-target.csv
tico19_tm+t_twb+google+facebook_initial-languages.csv
tico19_tm_twb_initial-language-pairs_source-lang-en.csv
tico19_tm_twb_initial-languages.csv

But they are not properly organized, and some changes (like removing what seems to be the country code 'XX') were made using just scripts. The point here is to have one or more summary tables, which could be embedded in the documentation, about the changes we made.
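As a rough sketch of what such a summary table could look like, the snippet below takes a list of language codes, strips a trailing two-letter upper-case suffix (assumed here to be the country-code part), and prints a Markdown table of original versus changed codes. This is not the project's actual script: the function names, the sample codes, and the rule for detecting a country code are all assumptions made for illustration, and the real HXLTM target codes may follow a different convention.

```python
import re


def strip_region(code: str) -> str:
    """Drop a trailing two-letter upper-case suffix, e.g. 'pt-BR' -> 'pt'.

    Assumption: the suffix is separated by '-' or '_' and is a country code.
    """
    return re.sub(r"[-_][A-Z]{2}$", "", code)


def summary_table(codes) -> str:
    """Build a Markdown table listing each original code and its changed form."""
    lines = [
        "| original | normalized | changed |",
        "| --- | --- | --- |",
    ]
    for code in sorted(set(codes)):
        new = strip_region(code)
        changed = "yes" if new != code else "no"
        lines.append(f"| {code} | {new} | {changed} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample codes, not taken from the data-info CSV files.
    sample = ["en", "pt-BR", "es-LA", "fr"]
    print(summary_table(sample))
```

In practice the input list would come from the CSV files listed above, but their column layout is not described in this thread, so reading them is left out.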