Living-with-machines / DeezyMatch

A Flexible Deep Learning Approach to Fuzzy String Matching
https://living-with-machines.github.io/DeezyMatch/

HGL tutorial #128

Closed mcollardanuy closed 2 years ago

mcollardanuy commented 2 years ago

Hi @fedenanni @kasra-hosseini, could you have a look at the Libyan gazetteer tutorial? The notebooks should be run in this order: (1) prepare_dataset.ipynb, then (2) tutorial_hgl_w2v.ipynb. Thanks!


kasra-hosseini commented 2 years ago

The [Intro by Valeria] section is still missing.

kasra-hosseini commented 2 years ago

In prepare_dataset, it says:

# Specify the language codes of the country, for this example: Arabic, Libyan Arabic, Berber, Domari,
# Tamasheq, Teda, Egyptian Spoken Arabic, Standard Arabic, Awjila, Italian, French, English, and Libyan
# Spoken Arabic:
toponym_languages = ["ar", "ar-LY", "ber", "rmt", "taq", "tuq", "arz", "arb", "auj", "it", "fr", "en", "ayl"]

Where did we get these abbreviations? (e.g., "ar", "ber", ...)

mcollardanuy commented 2 years ago

> Where did we get these abbreviations? (e.g., "ar", "ber", ...)

Mostly from GeoNames, in particular this file. Some codes, such as "ar-LY", are not listed there but do appear as fields in GeoNames, so this is a bit messy.
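The check described above (which codes come from the GeoNames language table and which do not) can be sketched in Python. This is a minimal, hypothetical helper, not code from the tutorial. It assumes a GeoNames-style tab-separated table whose first three columns are ISO 639-3, ISO 639-2 and ISO 639-1 codes, followed by a language name (the layout of GeoNames' `iso-languagecodes.txt`, which may or may not be the file linked above). The rows below are an illustrative excerpt, not the real file, and the names `load_known_codes` and `split_codes` are made up for this example.

```python
import csv
import io

# Hypothetical excerpt of a GeoNames-style ISO 639 table (tab-separated).
# In practice the full table would be read from disk instead.
SAMPLE_ROWS = [
    ("ISO 639-3", "ISO 639-2", "ISO 639-1", "Language Name"),
    ("ara", "ara", "ar", "Arabic"),
    ("", "ber", "", "Berber"),
    ("rmt", "", "", "Domari"),
    ("taq", "", "", "Tamasheq"),
    ("tuq", "", "", "Tedaga"),
    ("arz", "", "", "Egyptian Arabic"),
    ("arb", "", "", "Standard Arabic"),
    ("auj", "", "", "Awjilah"),
    ("ita", "ita", "it", "Italian"),
    ("fra", "fre", "fr", "French"),
    ("eng", "eng", "en", "English"),
    ("ayl", "", "", "Libyan Arabic"),
]
SAMPLE_TABLE = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n"


def load_known_codes(table_text):
    """Collect every non-empty code from the three ISO 639 columns (hypothetical helper)."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    known = set()
    for row in reader:
        # Columns 0-2 hold the ISO 639-3, ISO 639-2 and ISO 639-1 codes; empty cells are skipped.
        known.update(code for code in row[:3] if code)
    return known


def split_codes(languages, known):
    """Partition language codes into those listed in the table and those that are not."""
    listed = [code for code in languages if code in known]
    unlisted = [code for code in languages if code not in known]
    return listed, unlisted


# The list from prepare_dataset, as quoted in the review comment.
toponym_languages = ["ar", "ar-LY", "ber", "rmt", "taq", "tuq", "arz", "arb", "auj", "it", "fr", "en", "ayl"]

known = load_known_codes(SAMPLE_TABLE)
listed, unlisted = split_codes(toponym_languages, known)
print(unlisted)  # ['ar-LY']
```

With this sample table, everything except "ar-LY" is found, which mirrors the messy case mentioned above: a code that GeoNames uses as a field but that does not appear in the language table.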

kasra-hosseini commented 2 years ago

I'm closing this PR because we have moved all the notebooks to a new repo: DeezyMatch_tutorials (private for now, soon to be public).