Closed mcollardanuy closed 2 years ago
[Intro by Valeria] is missing.
In prepare_dataset, it says:
# Specify the language codes of the country, for this example: Arabic, Libyan Arabic, Berber, Domari,
# Tamasheq, Teda, Egyptian Spoken Arabic, Standard Arabic, Awjila, Italian, French, English, and Libyan
# Spoken Arabic:
toponym_languages = ["ar", "ar-LY", "ber", "rmt", "taq", "tuq", "arz", "arb", "auj", "it", "fr", "en", "ayl"]
Where did we get these abbreviations? (e.g., "ar", "ber", ...)
From GeoNames, mostly this file. Some codes (e.g., "ar-LY") are not listed there but do occur as fields in GeoNames, so this is a bit messy.
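For readers wondering how such a code list might be used, here is a minimal sketch, not the notebook's actual code. It assumes the codes are matched against the `isolanguage` column of GeoNames' tab-separated alternate-names table (alternateNameId, geonameid, isolanguage, alternate name, ...). The helper `filter_by_language` and the sample rows are hypothetical and contain placeholder values only.

```python
import csv
import io

# Language codes quoted from prepare_dataset in this thread.
toponym_languages = ["ar", "ar-LY", "ber", "rmt", "taq", "tuq", "arz",
                     "arb", "auj", "it", "fr", "en", "ayl"]

# Hypothetical placeholder rows in the GeoNames alternate-names layout:
# alternateNameId <TAB> geonameid <TAB> isolanguage <TAB> alternate name
SAMPLE_ROWS = (
    "1\t100\tar\tplaceholder_ar\n"
    "2\t100\ten\tplaceholder_en\n"
    "3\t100\tde\tplaceholder_de\n"
    "4\t101\tlink\thttps://example.org\n"  # pseudo-code, not a language
)


def filter_by_language(handle, languages):
    """Return (geonameid, isolanguage, name) for rows whose code is wanted."""
    wanted = set(languages)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (row[1], row[2], row[3])
        for row in reader
        if len(row) >= 4 and row[2] in wanted
    ]


kept = filter_by_language(io.StringIO(SAMPLE_ROWS), toponym_languages)
for geonameid, code, name in kept:
    print(geonameid, code, name)
```

Because the match is an exact string comparison, a code such as "ar-LY" only selects rows that carry exactly that value, which is one reason a mixed list of ISO codes and GeoNames-specific fields can get messy.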
I'm closing this PR as we have moved all the notebooks to a new repo: DeezyMatch_tutorials (private for now, soon to be public).
Hi @fedenanni @kasra-hosseini could you have a look at the Libyan gazetteer tutorial? The order is (1) prepare_dataset.ipynb and (2) tutorial_hgl_w2v.ipynb. Thanks!