Closed fititnt closed 3 years ago
I think that, in addition to the final XLIFF format, we are also drafting an intermediate format that sits halfway between the HXL TM file convention and the XLIFF format.
This intermediate step mostly parses the input tm.hxl.csv (or anything else that HXL tools are able to parse, such as a Google spreadsheet or an Excel file), renames the columns it knows matter for the XLIFF format, and prefixes them with #x_xliff.
For some corner cases, such as XLIFF having no way to say that the source text itself is not yet ready for translation (something that may actually be very common in our use cases), we may prefix the columns with #meta+xliff.
cat _hxltm/schemam-un-htcds-5items.tm.hxl.tsv
#x_xliff+unit+id #meta+url #item+wikidata+code #meta+item+url+list #meta+lat_sortem #status #item+type+lat_dominium+list #item+type+lat_regnum #item+type+lat_divisionem #item+type+lat_classem #item+type+lat_ordinem #item+type+lat_familiam #item+type+lat_genus #item+type+lat_speciem #item+type+lat_segmentum #x_xliff+source+i_lat+is_latn #item+i_la+i_lat+is_latn+alt+list #meta+item+i_la+i_lat+is_latn #item+i_pt+i_por+is_latn #item+i_pt+i_por+is_latn+alt+list #meta+item+i_pt+i_por+is_latn #item+i_en+i_eng+is_latn #item+i_en+i_eng+is_latn+alt+list #meta+item+i_en+i_eng+is_latn #item+i_es+i_spa+is_latn #item+i_es+i_spa+is_latn+alt+list #meta+item+i_es+i_spa+is_latn #x_xliff+target+i_arb+is_arab #item+i_es+i_arb+is_arab+alt+list #meta+item+i_es+i_arb+is_arab #item+i_hi+i_hin+is_deva #item+i_hi+i_hin+is_deva+alt+list #meta+item+i_hi+i_hin+is_deva #item+i_sl+i_slv+is_latn #item+i_sl+i_slv+is_latn+alt+list #meta+item+i_sl+i_slv+is_latn
L10N_ego_summarius [(ℹ️)] Q1 https://github.com/HXL-CPLP/forum/issues/58|https://example.org 1 2 L10N L10N ego summarius Lingua Latina (Abecedarium Latinum) ∅ ∅ Língua portuguesa (alfabeto latino) ∅ ∅ English language (Latin script) ∅ ∅ Idioma español (Alfabeto latino) ∅∅ اللغة العربية ∅ يتطلب مراجعة بشرية. हिन्दी भाषा (देवनागरी लिपि) ∅ ∅ Slovenščina (Latinska abeceda) ∅ ∅
L10N_ego_codicem 2 2 L10N L10N ego codicem lat-Latn ∅ ∅ por-Latn ∅ ∅ eng-Latn ∅ ∅ spa-Latn ∅ ∅ arb-Arab ∅ ∅ hin-Deva ∅ ∅ slv-Latn ∅∅
L10N_ego_linguam_nomen 3 2 L10N L10N ego linguam nomen Lingua Latina ∅ ∅ Língua portuguesa ∅ ∅ English language ∅ ∅ Idioma español ∅ ∅ اللغة العربية ∅ يتطلب مراجعة بشرية. हिन्दी भाषा ∅ https://www.wikidata.org/wiki/Q1568 Slovenščina ∅ ∅
L10N_ego_scriptum_nomen [(ℹ️)] Q19845720 https://www.unicode.org/iso15924/ 4 2 L10N L10N ego scriptum nomen Abecedarium Latinum ∅ ∅ Alfabeto latino ∅ ∅ Latin script ∅ ∅ Alfabeto latino ∅ ∅ ∅ ∅ देवनागरी लिपि ∅ https://www.wikidata.org/wiki/Q38592 Latinska abeceda ∅ ∅
L10N_ego_patriam_UN_M49_numerum [(ℹ️)] Q7865431 https://en.wikipedia.org/wiki/UN_M49 5 2 L10N L10N ego patriam UN M49 numerum 001 ∅ ∅ 001 ∅ ∅ 001 ∅ ∅ 001 ∅ ∅ 001 ∅ ∅ 001 ∅ ∅ 001 ∅ ∅
./_systema/programma/hxltm2xliff.py _hxltm/schemam-un-htcds-5items.tm.hxl.csv --archivum-extensionem=.csv
#x_xliff+unit+id,#meta+url,#item+wikidata+code,#meta+item+url+list,#meta+lat_sortem,#status,#item+type+lat_dominium+list,#item+type+lat_regnum,#item+type+lat_divisionem,#item+type+lat_classem,#item+type+lat_ordinem,#item+type+lat_familiam,#item+type+lat_genus,#item+type+lat_speciem,#item+type+lat_segmentum,,,,,,,,,,,,#x_xliff+source+i_lat+is_latn,#item+i_la+i_lat+is_latn+alt+list,#meta+item+i_la+i_lat+is_latn,#item+i_pt+i_por+is_latn,#item+i_pt+i_por+is_latn+alt+list,#meta+item+i_pt+i_por+is_latn,#item+i_en+i_eng+is_latn,#item+i_en+i_eng+is_latn+alt+list,#meta+item+i_en+i_eng+is_latn,#item+i_es+i_spa+is_latn,#item+i_es+i_spa+is_latn+alt+list,#meta+item+i_es+i_spa+is_latn,#x_xliff+target+i_arb+is_arab,#item+i_es+i_arb+is_arab+alt+list,#meta+item+i_es+i_arb+is_arab,#item+i_hi+i_hin+is_deva,#item+i_hi+i_hin+is_deva+alt+list,#meta+item+i_hi+i_hin+is_deva,#item+i_sl+i_slv+is_latn,#item+i_sl+i_slv+is_latn+alt+list,#meta+item+i_sl+i_slv+is_latn
L10N_ego_summarius,[(ℹ️)],Q1,https://github.com/HXL-CPLP/forum/issues/58|https://example.org,1,2,L10N,L10N,ego,,,,,summarius,,,,,,,,,,,,,Lingua Latina (Abecedarium Latinum),∅,∅,Língua portuguesa (alfabeto latino),∅,∅,English language (Latin script),∅,∅,Idioma español (Alfabeto latino),∅,∅,اللغة العربية,∅,يتطلب مراجعة بشرية.,हिन्दी भाषा (देवनागरी लिपि),∅,∅,Slovenščina (Latinska abeceda),∅,∅
L10N_ego_codicem,,,,2,2,L10N,L10N,ego,,,,,codicem,,,,,,,,,,,,,lat-Latn,∅,∅,por-Latn,∅,∅,eng-Latn,∅,∅,spa-Latn,∅,∅,arb-Arab,∅,∅,hin-Deva,∅,∅,slv-Latn,∅,∅
L10N_ego_linguam_nomen,,,,3,2,L10N,L10N,ego,linguam,,,,nomen,,,,,,,,,,,,,Lingua Latina,∅,∅,Língua portuguesa,∅,∅,English language,∅,∅,Idioma español,∅,∅,اللغة العربية,∅,يتطلب مراجعة بشرية.,हिन्दी भाषा,∅,https://www.wikidata.org/wiki/Q1568,Slovenščina,∅,∅
L10N_ego_scriptum_nomen,[(ℹ️)],Q19845720,https://www.unicode.org/iso15924/,4,2,L10N,L10N,ego,scriptum,,,,nomen,,,,,,,,,,,,,Abecedarium Latinum,∅,∅,Alfabeto latino,∅,∅,Latin script,∅,∅,Alfabeto latino,∅,∅,,∅,∅,देवनागरी लिपि,∅,https://www.wikidata.org/wiki/Q38592,Latinska abeceda,∅,∅
L10N_ego_patriam_UN_M49_numerum,[(ℹ️)],Q7865431,https://en.wikipedia.org/wiki/UN_M49,5,2,L10N,L10N,ego,patriam,UN,M49,,numerum,,,,,,,,,,,,,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅
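The column renaming described above can be sketched in Python. This is only a minimal illustration, not the code of hxltm2xliff.py: the function name `mark_xliff_columns`, its parameters, the id hashtag `#item+conceptum+codicem`, and the input hashtags for the Latin and Arabic columns are assumptions made for the example.

```python
def parse_hashtag(hashtag):
    """Split '#item+i_la+i_lat' into ('#item', ['i_la', 'i_lat'])."""
    tag, *attrs = hashtag.split("+")
    return tag, attrs


def mark_xliff_columns(headers, id_column, source, target):
    """Rename the id, source and target columns with the #x_xliff prefix.

    `source` and `target` are (language, script) attribute pairs such as
    ('i_lat', 'is_latn'). Every other column is returned unchanged.
    """
    roles = {"source": source, "target": target}
    renamed = []
    for hashtag in headers:
        tag, attrs = parse_hashtag(hashtag)
        new = hashtag
        if hashtag == id_column:
            new = "#x_xliff+unit+id"
        elif tag == "#item" and "alt" not in attrs:
            # Only the main term column is renamed, not the +alt+list ones.
            for role, (lang, script) in roles.items():
                if lang in attrs and script in attrs:
                    new = f"#x_xliff+{role}+{lang}+{script}"
        renamed.append(new)
    return renamed


# Hypothetical input headers (assumed, not taken from a real file).
headers = [
    "#item+conceptum+codicem",
    "#item+i_la+i_lat+is_latn",
    "#item+i_la+i_lat+is_latn+alt+list",
    "#meta+item+i_la+i_lat+is_latn",
    "#item+i_ar+i_arb+is_arab",
    "#item+i_pt+i_por+is_latn",
]
renamed = mark_xliff_columns(
    headers, "#item+conceptum+codicem", ("i_lat", "is_latn"), ("i_arb", "is_arab")
)
print(renamed)
```

Columns that are neither the id nor the chosen source/target pair keep their original hashtags, so the multilingual data stays in the file.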
What started as hxltm2xliff.py some months ago is now a user-configurable generator in the hxltmcli program (https://hdp.etica.ai/hxltm), with options like hxltmcli --objectivum-XLIFF; see https://hdp.etica.ai/hxltm/archivum/.
The HXLTM ASA (EticaAI/HXL-Data-Science-file-formats#22) abstracts how to iterate over HXL with some conventional extra tags, so that it is possible both to import from XLIFF and to export from HXL to XLIFF just by configuring a custom plugin. So XLIFF, like TBX, TMX, XML, etc., uses a user-friendly syntax, Liquid (https://shopify.github.io/liquid/), for templating, plus extra attributes.
hxltmcli v0.8.7 (usable standalone or through the Python package hdp-toolchain, https://pypi.org/project/hdp-toolchain/) uses cor.hxltm.yml, as does hxltmdexml (which converts back from any XML file used for export), based on this:
#### XLIFF-obsoletum: XML Localization Interchange File Format (XLIFF) v2.1 __
# tag::normam_XLIFF[]
# @TODO: JLIFF (XLIFF on JSON) <https://github.com/oasis-tcs/xliff-omos-jliff>
XLIFF:
  __meta:
    archivum_extensionem: .xlf
    situs_interretialis:
      referens_officinale:
        - <https://www.oasis-open.org/committees/xliff/>
      vicipaedia:
        - <https://en.wikipedia.org/wiki/XLIFF>
      exemplum:
        - <https://github.com/oasis-tcs/xliff-xliff-22>
        - <https://github.com/oasis-tcs/xliff-xliff-22/blob/master/xliff-21/test-suite/core/valid/allExtensions.xlf>
        - <https://github.com/oasis-tcs/xliff-xliff-22/blob/master/xliff-21/test-suite/core/valid/everything-core.xlf>
      normam:
        - <https://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html>
        # - <https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/>
        # @see <https://github.com/redhat-developer/vscode-xml/wiki/XMLValidation#XML-catalog-with-XSD>
        # @see <https://github.com/redhat-developer/vscode-xml/issues/315>
        - <https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/catalog.xml>
    nomen:
      eng-Latn: 'XML Localization Interchange File Format (XLIFF) v2.1'
  asa:
    modus_operandi:
      # - multiplum_linguam
      - bilingue
  de_xml:
    # This is a working draft
    # @see https://terminator.readthedocs.io/en/latest/tbx_conformance.html
    # ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
    glossarium_radicem:
      signum: xliff
      # Exemplum I: <xliff version="1.2">
      # Exemplum II: <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    glossarium_titulum: False
    # II conceptum
    conceptum_codicem:
      signum: unit
      de_attributum: id
      trivium:
        # de <xliff> ad <trans-unit>
        - file
    # III linguam
    linguam_codicem: False # XLIFF-obsoletum est bilingue
    linguam_fontem_codicem:
      # Exemplum: 'pt' ad '<source xml:lang="pt">por-Latn</source>''
      signum: source
      de_attributum: lang
      trivium: []
    linguam_objectivum_codicem:
      # Exemplum: 'es' ad '<target xml:lang="es">spa-Latn</target>''
      signum: target
      de_attributum: lang
      trivium: []
    # IV terminum
    terminum_accuratum: False # XLIFF terminum habendum accuratum? Falsum
    terminum_multum: False # XLIFF-obsoletum est bilingue
    terminum_habendum_fontem: True
    terminum_habendum_objectivum: True
    terminum_fontem_valorem:
      # Exemplum: 'por-Latn ad <source xml:lang="pt">por-Latn</source>
      signum: source
      # de_attributum: False
      trivium: []
    terminum_objectivum_valorem:
      # Exemplum: 'spa-Latn' ad <target xml:lang="es">spa-Latn</target>
      signum: target
      # de_attributum: False
      trivium: []
  formatum:
    # @see https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/catalog.xml
    # @see https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/xliff_core_2.0.xsd
    initiale: |2
      <?xml version="1.0"?>
      <xliff version="2.0"
        xmlns="urn:oasis:names:tc:xliff:document:2.0"
        xmlns:fs="urn:oasis:names:tc:xliff:fs:2.0"
        xmlns:val="urn:oasis:names:tc:xliff:validation:2.0"
        srcLang="{{ globum.fontem_linguam.bcp47 | default: 'la' }}"
        trgLang="{{ globum.objectivum_linguam.bcp47 | default: 'ar' }}">
        <file id="f1">
    corporeum: |2
          {% if rem.de_fontem_linguam -%}
          <unit id="{{ conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem | default: 'errorem' | replace: '*', '' | replace: '+', '' | replace: '/', '' }}">
            {% if rem.de_auxilium_linguam or rem.de_nomen_breve.referens_situs_interretialis.size > 0 -%}
            <notes>
              {%- for item in rem.de_auxilium_linguam -%}
              <note appliesTo="source" priority="3"
                category="de_auxilium_linguam">
                _[{{- item.linguam -}}]
                {{- item.rem -}}
                [{{- item.linguam -}}]_
              </note>
              {%- endfor %}
              {% for item in rem.de_nomen_breve.referens_situs_interretialis -%}
              <note appliesTo="source" priority="1"
                category="referens_situs_interretialis">
                {{ item }}
              </note>
              {% endfor -%}
            </notes>
            {% else -%}
            <!--
            non rem.de_auxilium_linguam aut rem.de_nomen_breve.referens_situs_interretialis
            -->
            {% endif -%}
            <segment state="{{ rem.de_objectivum_linguam.codicem_XLIFF | default: 'initial' }}">
              <source>{{ rem.de_fontem_linguam.rem }}</source>
              {%- if rem.de_objectivum_linguam and rem.de_objectivum_linguam.rem != '' %}
              <target>{{ rem.de_objectivum_linguam.rem }}</target>
              {%- else %}
              <!-- non rem.de_objectivum_linguam -->
              {%- endif %}
            </segment>
          </unit>
          {%- else -%}
          <!-- non rem.de_fontem_linguam -->
          {%- endif %}
    # <!-- {{ rem }} -->
    finale: |2
        </file>
      </xliff>
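To make the shape of the generated file easier to see, here is a rough Python sketch that builds the same `<xliff>` / `<file>` / `<unit>` / `<segment>` structure as the template above. It uses `xml.etree.ElementTree` from the standard library instead of Liquid, so it is a stand-in for illustration and not how hxltmcli renders the file. The function name `build_xliff` and the `(id, source, target)` row layout are assumptions; notes and the segment state lookup are left out.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"


def build_xliff(rows, src_lang="la", trg_lang="ar"):
    """Build an XLIFF 2.0 string from (unit_id, source, target) rows.

    Rows without source text are skipped, and <target> is only written
    when there is a translation, as in the template conditions.
    """
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(
        f"{{{XLIFF_NS}}}xliff",
        {"version": "2.0", "srcLang": src_lang, "trgLang": trg_lang},
    )
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {"id": "f1"})
    strip_chars = str.maketrans("", "", "*+/")  # same characters the template removes
    for unit_id, source, target in rows:
        if not source:
            continue
        unit = ET.SubElement(
            file_el, f"{{{XLIFF_NS}}}unit", {"id": unit_id.translate(strip_chars)}
        )
        # Assumption: always 'initial'; the template can take the state from the data.
        segment = ET.SubElement(unit, f"{{{XLIFF_NS}}}segment", {"state": "initial"})
        ET.SubElement(segment, f"{{{XLIFF_NS}}}source").text = source
        if target:
            ET.SubElement(segment, f"{{{XLIFF_NS}}}target").text = target
    return ET.tostring(root, encoding="unicode")


# Values taken from the Latin and Arabic columns of the example table.
rows = [
    ("L10N_ego_codicem", "lat-Latn", "arb-Arab"),
    ("L10N_ego_scriptum_nomen", "Abecedarium Latinum", ""),
]
xml_text = build_xliff(rows)
print(xml_text)
```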
The instructions above are for XLIFF 2; XLIFF 1 is another option. Although how to create other exporters/importers is not documented, starting from the example closest to what you want works best. One of the biggest differences between formats is whether they are bilingual (like XLIFF and some common localization files) or multilingual (like TBX and TMX).
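For the import direction, the de_xml keys in the configuration above (signum for the element name, de_attributum for the attribute that holds the id) can be read as a small lookup table. The sketch below shows that idea with the Python standard library. It is not the hxltmdexml implementation: `DE_XML` copies only three keys from the configuration, and `read_units` and `SAMPLE` are hypothetical names for the example.

```python
import xml.etree.ElementTree as ET

# Subset of the de_xml section, copied by hand for the example.
DE_XML = {
    "conceptum_codicem": {"signum": "unit", "de_attributum": "id"},
    "terminum_fontem_valorem": {"signum": "source"},
    "terminum_objectivum_valorem": {"signum": "target"},
}

SAMPLE = """<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0"
  srcLang="la" trgLang="ar">
  <file id="f1">
    <unit id="L10N_ego_codicem">
      <segment state="initial">
        <source>lat-Latn</source>
        <target>arb-Arab</target>
      </segment>
    </unit>
    <unit id="L10N_ego_scriptum_nomen">
      <segment state="initial">
        <source>Abecedarium Latinum</source>
      </segment>
    </unit>
  </file>
</xliff>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_units(xml_text, de_xml=DE_XML):
    """Return a list of (id, source, target) tuples, one per concept element."""
    concept = de_xml["conceptum_codicem"]
    source_tag = de_xml["terminum_fontem_valorem"]["signum"]
    target_tag = de_xml["terminum_objectivum_valorem"]["signum"]
    rows = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != concept["signum"]:
            continue
        # Collect the text of every descendant, keyed by its local tag name.
        texts = {local_name(e.tag): (e.text or "") for e in element.iter()}
        rows.append(
            (
                element.get(concept["de_attributum"]),
                texts.get(source_tag, ""),
                texts.get(target_tag, ""),
            )
        )
    return rows


units = read_units(SAMPLE)
print(units)
```

Because the element and attribute names come from the lookup table, the same function could in principle read another bilingual XML format by swapping the table, which is roughly the appeal of keeping this in configuration.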
In future versions the syntax may change a bit, but HXL is already the best strategy for storing multilingual content for those who work with XLIFF. Most tools do not even allow working with more than one source language, so HXLTM (as a specialized tagging of HXL) now at least makes it possible to work with translations from/to an arbitrary number of source/target languages.
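As a small illustration of that last point, one multilingual row can feed any number of bilingual exports. The sketch below enumerates every ordered source/target pair for a single concept. The function name `bilingual_pairs` and the dictionary layout are assumptions made for the example, not part of hxltmcli.

```python
from itertools import permutations


def bilingual_pairs(row):
    """Return (source_lang, target_lang, source_text, target_text) tuples.

    `row` maps a language code to its term. Pairs whose source text is
    empty are skipped, since there would be nothing to translate from.
    """
    pairs = []
    for source_lang, target_lang in permutations(row, 2):
        if row[source_lang]:
            pairs.append(
                (source_lang, target_lang, row[source_lang], row[target_lang])
            )
    return pairs


# Terms from the L10N_ego_linguam_nomen row of the example table.
row = {
    "lat-Latn": "Lingua Latina",
    "por-Latn": "Língua portuguesa",
    "eng-Latn": "English language",
}
pairs = bilingual_pairs(row)
for pair in pairs:
    print(pair)
```

Three languages already give six possible bilingual files, which is why keeping the multilingual table as the single source is convenient.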
The test file started here: https://github.com/HXL-CPLP/Auxilium-Humanitarium-API/blob/main/_systema/programma/hxltm2xliff.py.