EticaAI / hxltm

HXLTM - Multilingual Terminology in Humanitarian Language Exchange. TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)
https://hxltm.etica.ai
The Unlicense

Change the way the HXLTM ontologia files are described/referenced to suggest the possibility of verbs in other writing systems #7

Open fititnt opened 2 years ago

fititnt commented 2 years ago

Currently, the HXLTM ontology file uses Latin, in Latin script, for the reference implementation, while part of the documentation is in English. This issue proposes at least renaming the ontology file so that versions with the verbs themselves in any other writing system could eventually exist. I understand that, as of 2021, most people tolerate writing in some Latin script when developing programs, but intentionally making the writing system part of the ontology file name would at least avoid locking in that way of thinking.

As a short illustration of this issue, here are some tags from the current file:


# @ARCHIVUM       cor.hxltm.yml
# (...)
__ontologia_cor_versionem__:  v0.8.6+EticaAI+voluntārium-commūne

fontem_archivum_extensionem:
  .tm.hxl.csv: HXLTM
  .xliff.hxl.csv: CSV-HXL-XLIFF
  # (...)

normam:
  Ad-Hoc:
    __meta:
 # (...)
  CSV-3:
    __meta:
      # archivum_extensionem: .csv
      archivum:
        extensionem: .csv
      descriptionem: |
        ...
 # (...)

ontologia_aliud:
  accuratum:
    "?":
      # The '?' express what to do when the entire column does not exist, so
      # is not a particular value that is missing
      _IATE_valorem_codicem: "★★"
      _IATE_valorem_descriptionem: |
        Automatically assigned to terms entered or updated by native speakers.
      _IATE_valorem_nomen: "Minimum reliability"
      _IATE_valorem_numerum: 6

# (...)

  genus_grammaticum:
    lat_commune:
      _aliud: 'TBX_other'
      # _codicem: lat_commune
      _codicem_TBX: TBX_other
      _descriptionem: 
      codicem_lat: commune
# (...)

  partem_orationis:
    lat_adverbium:
      _aliud: 'TBX_adverb|UTX_adverb'
      _codicem: lat_adverbium
      _codicem_TBX: TBX_adverb
      _codicem_UTX: UTX_adverb
      _codicem_wikidata: Q380057 # https://www.wikidata.org/wiki/Q380057
      _normam: https://la.wikipedia.org/wiki/Adverbium
      codicem_lat: adverbium
# (...)

Note that this is different from "documentation translation". Documentation, file paths, and even new data standards added by users to the current ontology file already allow full Unicode. The main point here is to at least make the writing system of the verbs part of the ontology file name.

How to make the ontology file itself tolerate different verbs per writing system

One requirement would be documenting what each ontology verb means across writing systems. Since the verbs are limited in number, someone could replace them from/to new languages even without hardcoded support in the reference implementations. If at some point interested people who use a non-Latin script do appear, such a mapping (possibly applied by an external tool) could be used when converting from one region to another.
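A minimal sketch of what such an external tool could do, assuming the ontology has already been loaded into nested Python dicts. The Greek-script strings are illustrative transliterations only, not proposed translations, and `VERBS_LATN_TO_GREK`, `invert`, and `rename_verbs` are hypothetical names that do not exist in the reference implementation.

```python
from typing import Any, Dict

# Hypothetical verb table: Latin-script verb -> illustrative Greek-script
# transliteration. The targets are placeholders, not proposed translations.
VERBS_LATN_TO_GREK: Dict[str, str] = {
    "normam": "νορμαμ",
    "ontologia_aliud": "οντολογια_αλιυδ",
    "partem_orationis": "παρτεμ_ορατιονις",
    "descriptionem": "δεσκριπτιονεμ",
}


def invert(table: Dict[str, str]) -> Dict[str, str]:
    """Build the reverse table, refusing tables that are not one-to-one."""
    inverse = {target: source for source, target in table.items()}
    if len(inverse) != len(table):
        raise ValueError("verb table is not one-to-one; cannot be inverted")
    return inverse


def rename_verbs(node: Any, table: Dict[str, str]) -> Any:
    """Recursively rename mapping keys that appear in `table`.

    Keys absent from the table (content such as 'Ad-Hoc' or 'CSV-3') and
    all values are left untouched.
    """
    if isinstance(node, dict):
        return {table.get(k, k): rename_verbs(v, table) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_verbs(item, table) for item in node]
    return node


# Tiny stand-in for a loaded ontology file.
ontologia = {
    "normam": {
        "Ad-Hoc": {"__meta": {}},
        "CSV-3": {"__meta": {"descriptionem": "..."}},
    },
}

converted = rename_verbs(ontologia, VERBS_LATN_TO_GREK)
restored = rename_verbs(converted, invert(VERBS_LATN_TO_GREK))
```

Requiring the table to be one-to-one is what makes the "from/to" conversion safe: a file converted to another script can be converted back without loss.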

Note that a good part of these verbs are also part of the command line arguments. So if such mappings are well documented, this makes it possible for at least our reference implementation to be used by other regions. The opposite could also be true.
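As a rough sketch of how a documented mapping could cover command line arguments too, a small wrapper could rewrite localized flags before handing them to the tool. Both the localized flag and the Latin flag below are invented for illustration; they are not real options of the reference implementation, and `normalize_argv` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical aliases: localized flag -> Latin-script flag.
# Both sides are made up for this example.
FLAG_ALIASES: Dict[str, str] = {
    "--νορμαμ": "--normam",
}


def normalize_argv(argv: List[str], aliases: Dict[str, str]) -> List[str]:
    """Rewrite known localized flags; leave every other argument as-is."""
    normalized = []
    for arg in argv:
        # Support both '--flag value' and '--flag=value' spellings.
        flag, sep, value = arg.partition("=")
        if flag in aliases:
            normalized.append(aliases[flag] + sep + value)
        else:
            normalized.append(arg)
    return normalized
```

With this, `normalize_argv(["--νορμαμ", "CSV-3"], FLAG_ALIASES)` gives `["--normam", "CSV-3"]`, while the value `CSV-3` stays the same because it is content, not a verb.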

Anyway, one potential advantage of allowing this is that if, for some reason, a baseline community exists in other regions (for example, speakers of Arabic dialects, Hindi, etc.), they would be free to have some differences without waiting for Etica.AI to merge them.

Example

In practice this means that terms like normam (https://en.wiktionary.org/wiki/norma#Latin), ontologia_aliud (https://la.wikipedia.org/wiki/Ontologia, https://en.wiktionary.org/wiki/alius#Latin), and partem_orationis (https://en.wiktionary.org/wiki/pars_orationis#Latin) would need their relations to other scripts explained (that is, "translated"), but terms like Ad-Hoc and CSV-3 would stay the same in any ontologia, since they are content.

If some mappings are important enough (for example, the specifications related to Ad-Hoc or HXLTM-ASA, which in other languages could be named something different), such aliases could be part of each ontologia, since there are far fewer writing systems than languages.
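One way such aliases could be resolved, sketched under the assumption that `_aliud` holds pipe-separated alternative codes (as the `'TBX_adverb|UTX_adverb'` value in the excerpt suggests). Reusing the same field for per-script aliases of names like Ad-Hoc is only an idea here, not current behavior, and `build_alias_index` is a hypothetical helper.

```python
from typing import Any, Dict


def build_alias_index(section: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Map every canonical key and each of its aliases to the canonical key."""
    index: Dict[str, str] = {}
    for canonical, entry in section.items():
        index[canonical] = canonical
        # Assumption: '_aliud' is a pipe-separated list of alternative codes.
        aliases = (entry or {}).get("_aliud") or ""
        for alias in aliases.split("|"):
            alias = alias.strip()
            if alias:
                # The first entry to claim an alias keeps it.
                index.setdefault(alias, canonical)
    return index


# Data copied from the partem_orationis excerpt above.
partem_orationis = {
    "lat_adverbium": {
        "_aliud": "TBX_adverb|UTX_adverb",
        "_codicem": "lat_adverbium",
    },
}

index = build_alias_index(partem_orationis)
```

Here `index["UTX_adverb"]` resolves to `lat_adverbium`, so an alias written in another script would only need one extra entry in the same field.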

But anyway, the point here is to show that, even though the verbs of the ontologia itself are in Latin and this topic may only be revisited far in the future, they are not hardcoded to the Latin script.