EticaAI / hxltm

HXLTM - Multilingual Terminology in Humanitarian Language Exchange. TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)
https://hxltm.etica.ai
The Unlicense

Map (at least) Europe IATE termType to BCP-47 language attributes *AND* have documentation on it #13

Open fititnt opened 2 years ago

fititnt commented 2 years ago

If we manage to make a minimal viable product of #11, content generated by third-party software, or by complex interfaces rather than edited directly in HXLTM spreadsheets, is likely to use the non-wide format (like the one used in the TICO-19 terminologies).

So, since #11 already makes it necessary to pivot between the formats, also documenting the language type as a specialized language attribute could make things easier for users later.
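
As a rough illustration of the pivot mentioned above, here is a minimal sketch that turns a wide table (one column per language) into a non-wide one (one row per term). The column names (`concept_id`, `en-Latn`, `pt-Latn`) and the function `wide_to_long` are hypothetical and chosen only for this example; they are not the actual HXLTM headers.

```python
def wide_to_long(rows, concept_key="concept_id"):
    """Pivot wide rows (one column per language) into one row per term.

    Assumption: every column other than `concept_key` is named after a
    language tag and holds the term for that language.
    """
    long_rows = []
    for row in rows:
        concept = row[concept_key]
        for column, value in row.items():
            # Skip the concept identifier itself and empty cells.
            if column == concept_key or not value:
                continue
            long_rows.append(
                {"concept_id": concept, "language": column, "term": value}
            )
    return long_rows


# Hypothetical wide table: one row per concept, one column per language.
wide = [
    {"concept_id": "C1", "en-Latn": "vaccine", "pt-Latn": "vacina"},
    {"concept_id": "C2", "en-Latn": "mask", "pt-Latn": ""},
]

for item in wide_to_long(wide):
    print(item)
```

In the non-wide form each term has its own row, so a per-term attribute such as the term type has a natural place to live.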

Also note that in some use cases (no fault of HXLTM, just how the world is) someone could actually edit the non-wide formats by hand and then import them into other systems (either ones optimized for HXLTM, or those of private companies who may use HXLTM documentation for closed-source terminology standards behind paywalls, like the new ones used by TBX 2019).

Note that people (even if years later) are likely to come to HXLTM not only for the file format, but also for a crash course on how to deal with multilingual terminology.


Trivia:

  • One potential advantage of this is that implementers (like private companies already trying to help the humanitarian sector) mostly have to adhere to full BCP47 (which is useful beyond HXLTM or the humanitarian sector) rather than actually implement new code conventions.
  • Most BCP47 extensions, including Unicode -u-, are poorly documented outside Unicode.
    • This means we somewhat need to cite that they exist (also to keep others from creating new extension namespaces).
  • Most codes, like the ones used on Europe IATE (fullForm, abbreviation, shortForm, phrase, formula, variant), are actually only documented as part of some ISO standards (which are behind paywalls). This means whatever we create in Latin (which is optimized as a public domain reference for translations) is relevant even for developers who could read the English/French versions of such terminological standards but don't have access to them.
    • Why people from the global south dedicate time to creating ISO standards that cannot be used by their own population (not just because of language issues, but because of license issues to read the specifications) is beyond my comprehension.
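
To make the idea in the title concrete, here is a minimal sketch of how the IATE termType values listed above could be attached to a BCP47 tag as a private-use (`-x-`) subtag. The subtag spellings in `TERM_TYPE_SUBTAG` and the function `tag_with_term_type` are hypothetical, invented for this example only; this issue does not define the actual mapping. The only BCP47 rules relied on are that private-use subtags are 1 to 8 alphanumeric characters and come last in the tag, which is why "abbreviation" (12 characters) cannot be used verbatim.

```python
import re

# Hypothetical mapping from IATE termType values to private-use subtags.
# These spellings are placeholders, not an agreed HXLTM convention.
TERM_TYPE_SUBTAG = {
    "fullForm": "full",
    "abbreviation": "abbrev",
    "shortForm": "short",
    "phrase": "phrase",
    "formula": "formula",
    "variant": "variant",
}

# BCP47 private-use subtags: 1 to 8 ASCII letters or digits.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def tag_with_term_type(language_tag, term_type):
    """Append the term type to a language tag as a private-use subtag."""
    subtag = TERM_TYPE_SUBTAG[term_type]
    if not _PRIVATE_SUBTAG.fullmatch(subtag):
        raise ValueError(f"not a valid private-use subtag: {subtag!r}")
    lowered = language_tag.lower()
    # If the tag already has a private-use section, extend it;
    # otherwise open one with the "x" singleton.
    if lowered.startswith("x-") or "-x-" in lowered:
        return f"{language_tag}-{subtag}"
    return f"{language_tag}-x-{subtag}"


print(tag_with_term_type("pt-Latn", "abbreviation"))
print(tag_with_term_type("en-x-foo", "phrase"))
```

Keeping the mapping in a single table means the documentation and the code can be generated from the same source, whatever subtags end up being chosen.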