EticaAI / hxltm

HXLTM - Multilingual Terminology in Humanitarian Language Exchange.TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)
https://hxltm.etica.ai
The Unlicense

`hxltmcli` (or equivalent CLI): MVP of custom output based on template + HXLTM data as a CLI option (without needing Ruby) #4

Open fititnt opened 2 years ago

fititnt commented 2 years ago

The current hxltmcli, as documented at https://hdp.etica.ai/hxltm/archivum/, provides several exporters for the complete dataset. But some of the "more basic" functionality of https://github.com/HXL-CPLP/Auxilium-Humanitarium-API / https://hapi.etica.ai/, where it is possible to use the HXLTM terms to create custom formats, still requires Ruby code.

Either the Hapi or the hxltmcli (or a new CLI) must be able to expose the extra fields (like the term definitions) for use in templates.
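A minimal sketch of the idea, in Python, of how HXLTM rows (including extra fields like definitions) could feed a custom template without Ruby. The hashtags, column names, and helper functions here are illustrative assumptions, not the actual hxltmcli implementation or exact HXLTM attributes:

```python
import csv
import io

# A tiny HXLTM-like dataset: first row is human headers, second row is
# HXL hashtags, remaining rows are data. Hashtags are illustrative.
HXLTM_CSV = """\
concept,term_por,definition_por
#item+conceptum+codicem,#item+rem+i_por+is_latn,#meta+definitionem+i_por+is_latn
vaccine,vacina,Preparado que induz imunidade ativa
"""

def load_hxltm(text):
    """Return a list of dicts keyed by HXL hashtag (skips the human header row)."""
    rows = list(csv.reader(io.StringIO(text)))
    hashtags = rows[1]
    return [dict(zip(hashtags, row)) for row in rows[2:]]

def render(template, record):
    """Fill a template whose placeholders are HXL hashtags in braces."""
    out = template
    for tag, value in record.items():
        out = out.replace("{" + tag + "}", value)
    return out

records = load_hxltm(HXLTM_CSV)
template = '{"term": "{#item+rem+i_por+is_latn}", "definition": "{#meta+definitionem+i_por+is_latn}"}'
print(render(template, records[0]))
# → {"term": "vacina", "definition": "Preparado que induz imunidade ativa"}
```

The point of the sketch: once extra fields like definitions are addressable by hashtag, any output format (JSON, XML, ...) becomes a template concern rather than exporter code.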

fititnt commented 2 years ago

I'm not sure how many features of the existing Ruby version at https://github.com/HXL-CPLP/Auxilium-Humanitarium-API will be implemented as a built-in --objectivum-formulam, as in

hxltmcli schemam-un-htcds.tm.hxl.csv --objectivum-formulam formulam/exemplum-linguam.🗣️.json --objectivum-linguam por-Latn@pt > resultatum/formulam/exemplum-linguam.por-Latn.json

Maybe it will be just a subset. It may not have all features, but at least it could be used sooner as part of GitHub Actions or other automations for simpler cases.

fititnt commented 2 years ago

The issue on HXL-CPLP/forum is related to a multilingual controlled vocabulary for data exchange related to vaccine deployment (with special focus on COVID-19). While https://github.com/HXL-CPLP/Auxilium-Humanitarium-API / https://hapi.etica.ai/ uses Ruby code to convert the current version of HXLTM to other outputs, that strategy still requires Ruby, and it is harder for others to consume shared HXLTM files (for example, as part of their data pipelines).

The idea here is to use at least this specific implementation as a test case for features needed on the hxltmcli (including potential documentation on how to integrate with GitHub Actions, so people don't need to add too many things to their projects if they are already using GitHub).

fititnt commented 2 years ago

The current draft of HXL-CPLP-Dictionarium_Vaccinum (see https://github.com/HXL-CPLP/forum/issues/59) will require not only an objective natural language, like --objectivum-linguam por-Latn, used to allow templating translations (so users don't need to mess with complex Liquid for-loops), but also some MVP (minimal viable product) of:

Reasoning

In addition to the term-level implementation, where terms (e.g. the implementation of a language) allow templating like {% _🗣️ medicinam_vaccinum 🗣️_ %} to vacina (via --objectivum-linguam por-Latn), there should be some way to expose to the template which command-line arguments define the source vs. objective convention (for example, telling the program that "the source data convention is the Brazilian Gov, and the objective standard is the ad hoc convention used by US-CA"). This would make it much simpler to convert what is in columns like these

[Screenshot from 2021-10-14 06:42:35]

to what the templated results would output.
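The term-level marker above could be resolved with a small substitution pass. Below is a hedged sketch in Python: the {% _🗣️ ... 🗣️_ %} marker syntax is taken from the issue, but the term table, language codes, and lookup logic are assumptions for illustration only:

```python
import re

# Hypothetical term table: concept code -> translations per language code.
TERMS = {
    "medicinam_vaccinum": {"por-Latn": "vacina", "eng-Latn": "vaccine"},
}

# Matches markers like {% _🗣️ medicinam_vaccinum 🗣️_ %}
MARKER = re.compile(r"\{%\s*_🗣️\s*(\w+)\s*🗣️_\s*%\}")

def render_terms(template, objectivum_linguam):
    """Replace each term marker with its translation in the objective language."""
    def lookup(match):
        concept = match.group(1)
        return TERMS[concept][objectivum_linguam]
    return MARKER.sub(lookup, template)

print(render_terms("Dose de {% _🗣️ medicinam_vaccinum 🗣️_ %}", "por-Latn"))
# → Dose de vacina
```

With this approach, changing --objectivum-linguam swaps every term in one pass, so template authors never write per-language for-loops.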

Data conventions associated with vocabularies at concept level

[Image: TBX termEntry structure]
Source: https://terminator.readthedocs.io/en/latest/_images/TBX_termEntry_structure.png

In my opinion, even lacking some global data standard for sharing vaccine data, we could consider that such metadata belongs at the concept level (not even the language level), because it is reusable at world level. This also means that, if we need some level of control over who can edit what, translators (unless also skilled at data standards) are less likely to touch this part of the code (and people who know about data and software could be freer to change/customize standards from several regions).
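The concept-level vs. language-level split argued for above can be pictured as a data structure. This is a sketch under my own assumptions (field names and convention codes are invented for illustration, not the HXLTM or TBX specification):

```python
# Hypothetical split of one dictionary entry:
# - concept level: data conventions, reusable worldwide, maintained by
#   data/software people rather than translators;
# - language level: the terms, the only part translators need to edit.
entry = {
    "conceptum": {
        "codicem": "vaccinum",
        "conventions": {
            "BR-gov": {"column": "vacina_nome"},
            "US-CA-adhoc": {"column": "vaccine_name"},
        },
    },
    "linguam": {
        "por-Latn": {"terminum": "vacina"},
        "eng-Latn": {"terminum": "vaccine"},
    },
}

def term_for(entry, lang):
    """Language-level lookup: never touches concept-level conventions."""
    return entry["linguam"][lang]["terminum"]

print(term_for(entry, "por-Latn"))  # → vacina
```

Because the conventions live under "conceptum", adding a new region's data convention changes nothing under "linguam", which is what makes the edit-permission split practical.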