EticaAI / hxltm

HXLTM - Multilingual Terminology in Humanitarian Language Exchange. TBX, TMX, XLIFF, UTX, XML, CSV, Excel XLSX, Google Sheets, (...)
https://hxltm.etica.ai
The Unlicense

Reorganization of online documentation for v1.0.0 of HXLTM #8

Open fititnt opened 2 years ago

fititnt commented 2 years ago

Quick links (as of the creation of this issue):

Documentation is quite important. Most tools (especially those that deal with linguistic data) are poorly documented. On top of that, people sometimes use the wrong codes to express languages, so the amount of validation needed when people import from other tools into HXLTM would be huge.
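As a rough illustration of the kind of validation mentioned above, here is a minimal sketch that checks only the shape of a language code. It assumes codes follow the pattern seen in URLs such as eng-Latn (three lowercase letters, a hyphen, then a four-letter script code in title case). The helper name is hypothetical and is not part of HXLTM, and the check does not look anything up in the ISO 639-3 or ISO 15924 registries.

```python
import re

# Assumption: codes look like "eng-Latn" (language, hyphen, script).
# This only validates the shape, not whether the code actually exists.
LANG_SCRIPT_RE = re.compile(r"[a-z]{3}-[A-Z][a-z]{3}")


def looks_like_lang_script(code: str) -> bool:
    """Return True if `code` has the shape zzz-Zzzz (hypothetical helper)."""
    return LANG_SCRIPT_RE.fullmatch(code) is not None


if __name__ == "__main__":
    for sample in ["eng-Latn", "por-Latn", "en", "ENG-latn", "eng_Latn"]:
        print(sample, looks_like_lang_script(sample))
```

A shape check like this would catch common slips (two-letter codes, wrong case, underscores) before any registry lookup.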

While we did move some of the documentation to the ontology YAML, the average end user may still prefer an HTML web page to get started. So for v1.0.0, let's reorganize the documentation.

1. Potential strong changes

1.1. documentation-site.tld

The entry page of the documentation site becomes a simple list of links to the translations. At the moment we have only English, so the home page would link only to it.

1.1.1. Question: why not have "versioned" pages, such as documentation-site.tld/vN.Z?

Combining translations and versions in the URL alone would be hard to manage. Also, as new translations arrive, old versions would gain new strings. On top of this, having to plan for formal releases is stressful.

At this moment, I believe that taking the HTML Living Standard (https://html.spec.whatwg.org/) as inspiration, plus some discipline, could be good enough. Creating documentation in the first place takes a lot of time, so that cost alone already discourages changes made without a good reason. Also, we could intentionally make all the reference software cope with changes, and at least document how someone who relies on automated parsing with HXLTM (like the HXLTM GitHub Action) and wants it to keep working for years could freeze the versions.

1.1.2. What about having "real versioned releases"?

This is one of the reasons the PyPI package also has the suffix -eticaai (even though the name hxltm alone is free at the moment). Maybe we will reserve it just to keep a random person from taking it, but the idea is to intentionally have a namespace.

If necessary (which could happen if big organizations start using it), one ideal approach would be some sort of official release by them. Since this is 101% not like the ISO organization, any group (if not @HXLStandard itself) could at least keep frozen versions, if not also its own documentation page.

1.2 documentation-site.tld/zzz-Zzzz

The current draft already uses https://hxltm.etica.ai/eng-Latn/ as its home page. But while the "introduction" page could be the one with the shortest URL prefix, this would have problems:

fititnt commented 2 years ago

Hmm... Asciidoctor (the format we're using) also allows exporting to ebook-like formats, including PDF. This could be explored on the main end-user page.

One advantage of this approach is that it allows full archival of versions. But at the same time, we would need not only to fix the broken links but also to organize the titles as they would be in a book, readable without needing to access other URLs.

fititnt commented 2 years ago

The PDF generation with asciidoctor is so beautiful 😍.

Lots of small things still to organize, but it is something.

Screenshot from 2021-11-16 22-49-18

bug: default font does not include some tested characters

Since we intentionally used test cases with characters from several regions (it still does not cover the full range, but does have terms from several of them), we can already see that, for example, <termEntry id="I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_-1234_٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N"> has no font coverage, so the image shows errors.
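To get a quick idea of which characters a PDF font would need to cover for this test case, a small sketch using only Python's standard `unicodedata` module can list the non-ASCII characters in the id. The function name is hypothetical, and this does not inspect any font; it only produces the inventory that could then be compared against a font's coverage.

```python
import unicodedata

# The termEntry id from the test case quoted above.
TERM_ID = (
    "I18N_०१२३४५६७८९_〇一二三四五六七八九十百千万亿_-1234_"
    "٩٨٧٦٥٤٣٢١٠_零壹贰叁肆伍陆柒捌玖拾佰仟萬億_I18N"
)


def non_ascii_inventory(text: str) -> dict:
    """Map each non-ASCII character in `text` to its Unicode name.

    Hypothetical helper: it says nothing about fonts, it only lists
    what a font would have to support.
    """
    return {
        ch: unicodedata.name(ch, "UNKNOWN")
        for ch in text
        if ord(ch) > 127
    }


if __name__ == "__main__":
    for ch, name in non_ascii_inventory(TERM_ID).items():
        print(f"U+{ord(ch):04X} {ch} {name}")
```

The printed names make it easy to see which groups of characters (Devanagari digits, Arabic-Indic digits, CJK ideographs) are involved.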

One known limitation is that the final PDF would very likely be much larger if we include more fonts. This will need some extra optimization later, but I still think the extra size would be worth it even if we could not optimize to include only the used characters.