Living-with-machines / DeezyMatch

A Flexible Deep Learning Approach to Fuzzy String Matching
https://living-with-machines.github.io/DeezyMatch/

Add Heritage Gazetteer of Libya tutorial for DH2022 #125

Open mcollardanuy opened 2 years ago

mcollardanuy commented 2 years ago

Prepare tutorial on using DeezyMatch with the Heritage Gazetteer of Libya: https://dh2022.adho.org/workshops-and-tutorials/wt-13

We will show how to create DeezyMatch models trained on Arabic name variations and on Latin name variations of places in modern-day Libya, which will enable us to find the best-matching entry in a gazetteer for a given query. Such models could be used to consolidate data about the names of heritage locations in Arabic-speaking countries, as in the Heritage Gazetteer of Libya. Currently, the high level of spelling variation in Arabic place names (across time and transcriptions) makes it difficult to consolidate data held in different archives and collections, which at the moment rely on exact string matching to find connections. We will show how DeezyMatch can be used to more easily associate a heritage location with a number of variant names, thus improving the accuracy of data and metadata and facilitating alignment with other knowledge bases such as Wikidata or GeoNames.
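To make the contrast between exact matching and candidate ranking concrete, here is a minimal sketch. It does not use DeezyMatch: the standard-library `difflib.SequenceMatcher` stands in for a trained model, and the gazetteer entries, identifiers (`hgl-001`, etc.) and function names are hypothetical, made up for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical mini-gazetteer: entry id -> known name variants.
# Illustrative data only, not taken from the Heritage Gazetteer of Libya.
GAZETTEER = {
    "hgl-001": ["Tripoli", "Tarabulus", "Tarabulus al-Gharb"],
    "hgl-002": ["Benghazi", "Bengasi", "Banghazi"],
    "hgl-003": ["Ghadames", "Ghadamis", "Gadames"],
}


def exact_match(query):
    """Return the ids of entries that list the query verbatim as a variant."""
    return [entry_id for entry_id, names in GAZETTEER.items() if query in names]


def rank_candidates(query, top_n=3):
    """Rank entries by the similarity of their closest variant to the query.

    SequenceMatcher is only a stand-in here; a trained DeezyMatch model
    would supply the similarity scores in the tutorial itself.
    """
    scored = []
    for entry_id, names in GAZETTEER.items():
        best = max(
            SequenceMatcher(None, query.lower(), name.lower()).ratio()
            for name in names
        )
        scored.append((entry_id, best))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    query = "Benghasi"  # a spelling not listed among the variants
    print(exact_match(query))      # exact lookup finds nothing
    print(rank_candidates(query))  # ranking still surfaces a best candidate
```

With an unseen spelling such as "Benghasi", the exact lookup returns no entry, while the ranking still places the Benghazi entry first. This is the behaviour the tutorial aims for, with learned similarity in place of character overlap.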