UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

Document mapping between language names and codes #984

Open dan-zeman opened 1 year ago

dan-zeman commented 1 year ago

@Stormur said in another issue:

As an aside, I report the fact that sometimes it is quite difficult to make out or find the correspondence between ISO codes and languages in UD, so maybe an indexing by ISO codes, a list of correspondences, and the specification of the code also on the various pages pertaining to that language (and on the main page) would be very welcome.

I agree that it would be useful to have this somewhere on the website, ideally autogenerated from the database that underlies the UD infrastructure and updated at release time. It could list all languages currently known to the system, regardless of whether they already have a treebank or a language-specific documentation page.

Note that if a treebank of the language appears on the home page (either as released in the past or as planned for the future), the language code used by UD can be found by clicking on the language name, then on the treebank name, and then inspecting the URL of the link "Treebank hub page"; for example, for the Cappadocian treebank (not yet released) the URL is https://universaldependencies.org/treebanks/cpg_amgic/index.html, meaning that the language code is cpg. If the language already has language-specific documentation, it can be accessed from the guidelines page, and again its URL reveals the language code. In fact, all languages known to the system can be seen on the pages where features, relations and auxiliaries are registered for the validator (e.g. here for auxiliaries); the language code is in each URL.
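To make the URL trick concrete, here is a minimal Python sketch that pulls the language code out of a treebank hub page URL. It assumes the URL has the shape `/treebanks/<langcode>_<treebank>/index.html`, as in the Cappadocian example; the function name `language_code_from_hub_url` is made up for illustration and is not part of any UD tool.

```python
from urllib.parse import urlparse


def language_code_from_hub_url(url: str) -> str:
    """Return the language code embedded in a treebank hub page URL.

    Assumes the path looks like /treebanks/<langcode>_<treebank>/index.html.
    """
    # Split the path into non-empty segments, e.g.
    # ["treebanks", "cpg_amgic", "index.html"].
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "treebanks":
        raise ValueError(f"not a treebank hub page URL: {url}")
    # The language code is everything before the first underscore.
    lang_code, sep, _ = parts[1].partition("_")
    if not sep or not lang_code:
        raise ValueError(f"no language code found in: {url}")
    return lang_code


print(language_code_from_hub_url(
    "https://universaldependencies.org/treebanks/cpg_amgic/index.html"
))
```

This only automates the last step (reading the code off the URL); it does not help with finding the URL in the first place.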

But this route is overcomplicated, and it does not provide a good solution for the opposite search, from a code to a language name.
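As a rough sketch of what an autogenerated listing could provide, the Python snippet below builds lookups in both directions (code to name and name to code) and renders a plain listing sorted by code. The three entries are a hand-written sample for illustration only, not the contents of the UD database, and the names `SAMPLE_LANGUAGES`, `build_indexes` and `render_table` are hypothetical.

```python
# Illustrative sample only; a real listing would be generated from the
# database behind the UD infrastructure.
SAMPLE_LANGUAGES = {
    "cpg": "Cappadocian",
    "cs": "Czech",
    "en": "English",
}


def build_indexes(name_by_code):
    """Return (code -> name, name -> code) dictionaries."""
    code_by_name = {}
    for code, name in name_by_code.items():
        if name in code_by_name:
            # Two codes for one name would make the reverse lookup ambiguous.
            raise ValueError(f"duplicate language name: {name}")
        code_by_name[name] = code
    return dict(name_by_code), code_by_name


def render_table(name_by_code):
    """Render one 'code<TAB>name' line per language, sorted by code."""
    return "\n".join(
        f"{code}\t{name}" for code, name in sorted(name_by_code.items())
    )


by_code, by_name = build_indexes(SAMPLE_LANGUAGES)
print(by_code["cpg"])   # code -> name
print(by_name["Czech"])  # name -> code
print(render_table(SAMPLE_LANGUAGES))
```

Sorting by code serves the "index by ISO code" request from the quoted comment; sorting the same data by name would give the other view.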