fititnt opened this issue 2 years ago
#### HXLTM ad hoc templated export
```yaml
- name: ".github/hxltm/hxltmcli.py --objectivum-formulam data/README.🗣️.md --objectivum-linguam hin-Deva@hi data/exemplum/hxltm-exemplum-linguam.tm.hxl.csv data/README.hin-Deva.md"
  uses: fititnt/hxltm-action@main
  continue-on-error: true
  with:
    bin: ".github/hxltm/hxltmcli.py"
    args: |
      --objectivum-formulam data/README.🗣️.md
      --objectivum-linguam hin-Deva@hi
    infile: data/exemplum/hxltm-exemplum-linguam.tm.hxl.csv
    outfile: data/README.hin-Deva.md
```
(`data/exemplum/hxltm-exemplum-linguam.tm.hxl.csv` is the example dataset; it is omitted here.)

`data/README.🗣️.md`:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "Testum",
  "type": "object",
  "properties": {
    "L10N_ego_summarius": {
      "description": "{% _🗣️ L10N_ego_summarius 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "L10N_ego_codicem": {
      "description": "{% _🗣️ L10N_ego_codicem 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "L10N_ego_linguam_nomen": {
      "description": "{% _🗣️ L10N_ego_linguam_nomen 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "L10N_ego_scriptum_nomen": {
      "description": "{% _🗣️ L10N_ego_scriptum_nomen 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "L10N_ego_patriam_UN_M49_numerum": {
      "description": "{% _🗣️ L10N_ego_patriam_UN_M49_numerum 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "L10N_ego_patriam_UN_P_codicem": {
      "description": "{% _🗣️ L10N_ego_patriam_UN_P_codicem 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "I18N_testum_salve_mundi_testum_I18N": {
      "description": "{% _🗣️ I18N_testum_salve_mundi_testum_I18N 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "I18N_إختبار_טעסט_测试_테스트_испытание_I18N": {
      "description": "{% _🗣️ I18N_إختبار_טעסט_测试_테스트_испытание_I18N 🗣️_ %}",
      "type": "string",
      "example": ""
    },
    "I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_-1+2/3*4_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N": {
      "//description": " _🗣️ I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_-1+2/3*4_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N 🗣️_ ",
      "//comment": "jg-rp/liquid complaints about: + - * /",
      "description": "{% _🗣️ I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_1234_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N 🗣️_ %}",
      "type": "string",
      "example": "",
      "//test2": "{% _🗣️ 👁️lat-Latn👁️ 👂Dominium publicum👂 👁️lat-Latn👁️ 🗣️_ %}"
    }
  }
}
```
`data/README.hin-Deva.md`:
```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "Testum",
  "type": "object",
  "properties": {
    "L10N_ego_summarius": {
      "description": "हिन्दी भाषा (देवनागरी लिपि)",
      "type": "string",
      "example": ""
    },
    "L10N_ego_codicem": {
      "description": "hin-Deva",
      "type": "string",
      "example": ""
    },
    "L10N_ego_linguam_nomen": {
      "description": "हिन्दी भाषा",
      "type": "string",
      "example": ""
    },
    "L10N_ego_scriptum_nomen": {
      "description": "देवनागरी लिपि",
      "type": "string",
      "example": ""
    },
    "L10N_ego_patriam_UN_M49_numerum": {
      "description": "001",
      "type": "string",
      "example": ""
    },
    "L10N_ego_patriam_UN_P_codicem": {
      "description": "∅",
      "type": "string",
      "example": ""
    },
    "I18N_testum_salve_mundi_testum_I18N": {
      "description": "नमस्ते दुनिया",
      "type": "string",
      "example": ""
    },
    "I18N_إختبار_טעסט_测试_테스트_испытание_I18N": {
      "description": "परीक्षा, १, २, ३",
      "type": "string",
      "example": ""
    },
    "I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_-1+2/3*4_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N": {
      "//description": " _ I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_-1+2/3*4_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N ",
      "//comment": "jg-rp/liquid complaints about: + - * /",
      "description": "!!!I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_1234_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N!!!",
      "type": "string",
      "example": "",
      "//test2": "!!!👁️lat-Latn👁️ 👂Dominium publicum👂 👁️lat-Latn👁️!!!"
    }
  }
}
```
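The substitution from template to output can be sketched as a simple marker replacement. This is only an illustration under assumptions: the real hxltmcli uses a Liquid-style engine (jg-rp/liquid), and the `TRANSLATIONS` dict here is a hypothetical stand-in for values loaded from the HXLTM dataset:

```python
import re

# Hypothetical translation table for hin-Deva@hi; in practice these values
# would come from the HXLTM dataset (hxltm-exemplum-linguam.tm.hxl.csv).
TRANSLATIONS = {
    "L10N_ego_codicem": "hin-Deva",
    "I18N_testum_salve_mundi_testum_I18N": "नमस्ते दुनिया",
}

# Matches {% _🗣️ KEY 🗣️_ %} markers as used in data/README.🗣️.md
MARKER = re.compile(r"\{%\s*_🗣️\s*(\S+)\s*🗣️_\s*%\}")

def render(template: str, translations: dict) -> str:
    """Replace each marker with its translation; unknown keys become
    !!!KEY!!!, mimicking the fallback visible in the generated README."""
    def _sub(match: re.Match) -> str:
        key = match.group(1)
        return translations.get(key, f"!!!{key}!!!")
    return MARKER.sub(_sub, template)

print(render('"description": "{% _🗣️ L10N_ego_codicem 🗣️_ %}"', TRANSLATIONS))
# "description": "hin-Deva"
```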
One not-so-nice optimization issue (not very visible with a small number of languages, but the idea is to scale up) is that, without some syntactic sugar, generating each language requires a different hxltmcli command. This means that for 6 languages (sorry, I forgot Russian and French in the test dataset, so it would be at least 8), there are 6 code repetitions where the only thing that changes is the language code. Well, it is already good, just not as great as it could be, but it works.
Okay. I think one approach here would be to start preparing hxltmcli to support this type of syntax:

```shell
.github/hxltm/hxltmcli.py \
  --objectivum-formulam data/README.🗣️.md \
  --objectivum-linguam por-Latn@pt \
  data/exemplum/hxltm-exemplum-linguam.tm.hxl.csv \
  data/README.{[iso6393]}-{[iso115924]}.md
```
... in such a way that, when it detects that there is a specific objective language (in HXLTM slang, an HXLTMLinguam) like por-Latn@pt, the pattern data/README.{[iso6393]}-{[iso115924]}.md would be replaced by data/README.por-Latn.md. This step reduces some redundant code (and also simplifies the hxltm-action, which uses the underlying Python CLI tooling).
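The placeholder expansion could be sketched roughly like this; how hxltmcli would actually parse an HXLTMLinguam tag is an assumption here, and `expand_outfile` is a hypothetical helper name:

```python
def expand_outfile(pattern: str, linguam: str) -> str:
    """Expand {[iso6393]} / {[iso115924]} placeholders from an HXLTM
    language tag such as 'por-Latn@pt' (ISO 639-3 code, ISO 15924 script,
    optional @bcp47 suffix)."""
    base = linguam.split("@", 1)[0]          # 'por-Latn'
    iso6393, iso115924 = base.split("-", 1)  # 'por', 'Latn'
    return (pattern
            .replace("{[iso6393]}", iso6393)
            .replace("{[iso115924]}", iso115924))

print(expand_outfile("data/README.{[iso6393]}-{[iso115924]}.md", "por-Latn@pt"))
# data/README.por-Latn.md
```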
Some points:

- **`hxltmcli` / `hxltmdexml` individual step.** For the short term, the hxltm-action (at least the entrypoint.sh) could do some sort of looping. This would also help to detect errors for individual languages without breaking everything else (and it simplifies the Python CLI tooling a bit). To make `hxltmcli` / `hxltmdexml` powerful enough for multiple operations, it would be better to follow the idea of what the HXLStandard CLI tooling calls "JSON specs" (see https://github.com/HXLStandard/libhxl-python/wiki/JSON-specs).
- **`hxltm-action` may require implementing the concepts of "working languages" and "auxiliary language".** The underlying CLI tooling uses Latin for both concepts: `--agendum-linguam` for the working languages and `--auxilium-linguam` for the auxiliary language. The first is roughly the list of all languages that, if they exist in the HXLTM source reference, will be used. The second, while not fully implemented, tells which fallback language to use when there is no translation available.
```
hxltmcli --help
# hxltmcli v0.8.7
# (...)
  --agendum-linguam agendum_linguam, -AL agendum_linguam
                        (Planned, but not fully implemented yet) Restrict working
                        languages to a list. Useful for HXLTM to HXLTM or multilingual
                        formats like TBX and TMX. Requires: multilingual operation.
                        Accepts multiple values.
# (...)
  --auxilium-linguam auxilium_linguam, -AUXL auxilium_linguam
                        (Planned, but not implemented yet) Define auxiliary language.
                        Requires: bilingual operation (and file format allow
                        metadata). Default: Esperanto and Interlingua. Accepts multiple
                        values.
# (...)
  --objectivum-formulam OBJECTIVUM_FORMULAM
                        Template file to use as reference to generate an output. Less
                        powerful than custom file but can be used for simple cases.
```
If the next step is implemented, together with --auxilium-linguam (which I think already works for defining several fallback options), the HXLTM ontologia will eventually know how near languages are to each other. This means a user may use --auxilium-linguam to hardcode English as the fallback language even when a close language does exist (this case is very relevant for macrolanguages). There should be some way for the ontologia AND the dataset (which would likely be controlled by volunteers, maybe even language regulators) to make it harder (or even require an extra parameter) to break the operation. I mean, --auxilium-linguam can work in the short term, but the ideal is for the project to protect even a developer who is assembling the results from making mistakes in languages they don't know.
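The "prefer a close language over a hardcoded fallback" behavior could look something like this; the proximity table is purely invented for illustration (the real data would come from the HXLTM ontologia):

```python
# Hypothetical proximity data: which languages are close enough to
# substitute for each other (real data would come from the ontologia).
CLOSE_LANGUAGES = {
    "glg-Latn": ["por-Latn"],  # e.g. Galician is close to Portuguese
}

def pick_fallback(wanted, available, auxilium):
    """Pick a translation source for `wanted`:
    1. the language itself, if translated;
    2. a close language from the ontologia, if translated;
    3. the first available --auxilium-linguam fallback;
    4. None (caller should fail loudly rather than guess)."""
    if wanted in available:
        return wanted
    for close in CLOSE_LANGUAGES.get(wanted, []):
        if close in available:
            return close
    for aux in auxilium:
        if aux in available:
            return aux
    return None

print(pick_fallback("glg-Latn", {"por-Latn", "eng-Latn"}, ["eng-Latn"]))
# por-Latn  (the close language wins over the hardcoded English fallback)
```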
Links
https://hdp.etica.ai/hxltm/archivum/#HXLTM-ad-hoc
HXLTM has an experimental feature, already implemented (but not fully documented), now called HXLTM Ad Hoc Fōrmulam (HXLTM templated export); see the hxltmcli --help output above. The idea of this topic here (maybe also with an example for #2) is to use hxltm-action to showcase this feature. Maybe one perfect example would be to store the translations of README.md files in a separate place, then automatically generate the READMEs.
This, combined with fetching translations from remote sources (like Google Sheets), could allow creating translations for projects (even for something as simple as README files).