cldf-datasets / uratyp

Creative Commons Attribution 4.0 International

Match dialect-level Glottolog languoids #2

Closed: xrotwang closed this issue 3 years ago

xrotwang commented 3 years ago

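I updated the Glottocodes to be as specific as possible (given the current Glottolog 4.4). Quite a few varieties can be linked to Glottolog dialects rather than languages. I think this specificity is desirable, because

  • we also often have dialect-level data in the Geo Database
  • comparability with UraLex might require it. (E.g. in UraLex, Selkup is linked to the northern dialects, tazz1244, while in UraTyp "South Selkup", i.e. the more specific name, was linked to selk1253, the more generic Glottocode.)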

xrotwang commented 3 years ago

Note that we might still add the language-level match (for a specific Glottolog version) when compiling the CLDF data. So, in terms of user-friendliness, nothing would be lost.
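The language-level match xrotwang mentions could be derived mechanically from the dialect-level codes at compile time. A minimal sketch, assuming a languoid table that records each Glottocode's level and parent: walk up the parents until a languoid at level "language" is reached. The entries below are illustrative assumptions, not data read from Glottolog; in particular the parent link from tazz1244 to selk1253 should be checked against the pinned Glottolog release, and `hypo1234` is a hypothetical sub-dialect added only to show a multi-step walk.

```python
from typing import Dict, Optional, Tuple

# Toy languoid table: Glottocode -> (level, parent Glottocode).
# ASSUMPTION: illustrative entries only; real data would come from one pinned
# Glottolog release (e.g. 4.4). "hypo1234" is a hypothetical sub-dialect.
Languoids = Dict[str, Tuple[str, Optional[str]]]

HIERARCHY: Languoids = {
    "selk1253": ("language", None),       # ancestors above language omitted
    "tazz1244": ("dialect", "selk1253"),
    "hypo1234": ("dialect", "tazz1244"),
}


def language_level(glottocode: str, hierarchy: Languoids) -> Optional[str]:
    """Return the nearest ancestor-or-self at level 'language', or None.

    Raises KeyError if a Glottocode on the path is missing from the table.
    """
    current: Optional[str] = glottocode
    while current is not None:
        level, parent = hierarchy[current]
        if level == "language":
            return current
        current = parent
    return None  # reached the root without passing a language-level languoid


for code in ("selk1253", "tazz1244", "hypo1234"):
    print(code, "->", language_level(code, HIERARCHY))
# selk1253 -> selk1253
# tazz1244 -> selk1253
# hypo1234 -> selk1253
```

Since the derived code depends on the hierarchy table, pinning the table to one Glottolog version (as suggested above) keeps the language-level column reproducible.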

JakeJing commented 3 years ago

> I updated the Glottocodes to be as specific as possible (given the current Glottolog 4.4). Quite a few varieties can be linked to Glottolog dialects rather than languages. I think this specificity is desirable, because
>
>   • we also often have dialect-level data in the Geo Database
>   • comparability with UraLex might require it. (E.g. in UraLex, Selkup is linked to the northern dialects, tazz1244, while in UraTyp "South Selkup", i.e. the more specific name, was linked to selk1253, the more generic Glottocode.)

I also find that the distinction between language and dialect is not very consistent across the different Uralic databases. The linguistic area maps from the Geo data provide several levels (dialects, languages and branches), but I still need to merge some polygons to make them suitable for the UraTyp data. UraLex may use much more specific names for certain dialects, reflecting where the data originally come from. For UraTyp, however, the differences between dialects are less prominent, so most Glottocodes are at the language level. Do you want some kind of mapping table between the datasets, or simply an additional column of Glottocodes that is consistent across the databases?
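The two options need not exclude each other: each dataset could keep its most specific Glottocode and add a derived language-level column, which then serves as the join key of a mapping table. The sketch below assumes a lineage list per Glottocode (toy data, not read from Glottolog; ancestors above the language level are omitted) and uses the two Selkup links described in this thread. The column names `glottocode` and `language_glottocode` are hypothetical, not an agreed schema.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# ASSUMED toy lineage data: Glottocode -> [(Glottocode, level), ...] from the
# code itself up towards the root. Real values would come from one pinned
# Glottolog release; ancestors above the language level are omitted here.
LINEAGE: Dict[str, List[Tuple[str, str]]] = {
    "selk1253": [("selk1253", "language")],
    "tazz1244": [("tazz1244", "dialect"), ("selk1253", "language")],
}

# The two variety links mentioned in the thread: (dataset, variety, Glottocode).
VARIETIES = [
    ("UraTyp", "South Selkup", "selk1253"),
    ("UraLex", "Selkup", "tazz1244"),
]


def to_language(glottocode: str) -> Optional[str]:
    """Return the first language-level entry on the lineage, or None."""
    for code, level in LINEAGE.get(glottocode, []):
        if level == "language":
            return code
    return None


def mapping_table(varieties) -> List[dict]:
    """Keep each specific code and add a shared language-level column."""
    return [
        {"dataset": ds, "name": name, "glottocode": gc,
         "language_glottocode": to_language(gc)}
        for ds, name, gc in varieties
    ]


def group_by_language(rows: List[dict]) -> Dict[Optional[str], list]:
    """Collect (dataset, name) pairs sharing a language-level Glottocode."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["language_glottocode"]].append((row["dataset"], row["name"]))
    return dict(groups)


print(group_by_language(mapping_table(VARIETIES)))
# {'selk1253': [('UraTyp', 'South Selkup'), ('UraLex', 'Selkup')]}
```

Because the language-level column is derived rather than curated by hand, it can be regenerated whenever the data are compiled against a newer Glottolog release, while the dialect-level codes keep their specificity.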