Open fititnt opened 2 years ago
Index | Initial reference | Unicode notes | Comments
---|---|---|---
1 | (space) | 1. Zs Separator, space | Default to generic space, tabs, line breaks... All variants are Zs too
2 | + | 1. Zs Separator, space | TODO: explain more
3 | - | 1. Pd Punctuation, dash 2. Sm Symbol, math 3. (...) | TODO: needs proof of concept
4 | * x | 1. P, Punctuation 2. Sm Symbol, math 3. (...) | TODO: needs proof of concept
5 | / ÷ | 1. P, Punctuation 2. Sm Symbol, math 3. (...) | TODO: needs proof of concept
6 | = | 1. P, Punctuation 2. Sm Symbol, math 3. (...) | TODO: needs proof of concept
10 | ( [ { | 1. Ps Punctuation, open | All alternatives are guaranteed to be Ps Punctuation, open
11 | ) ] } | 1. Pe Punctuation, close | All alternatives are guaranteed to be Pe Punctuation, close
12 | _ | 1. Pc Punctuation, connector | TODO: needs proof of concept
13 | \ | (special) | TODO: need at least one escaping character that can be reused without upgrading the mode
19 | � | Private use | Not assigned.
56 | � | Not assigned. | Not assigned.
57 | � | Not assigned. | Not assigned.
58 | � | Not assigned. | Not assigned.
59 | � | Not assigned. | Not assigned.
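The "Unicode notes" column of the table can be checked programmatically. A minimal sketch in Python using the standard `unicodedata` module (this is only an illustration of the claimed general categories, not part of the proposal itself):

```python
import unicodedata

# General categories claimed in the table above:
#   space -> Zs, '-' -> Pd, open brackets -> Ps,
#   close brackets -> Pe, '_' -> Pc
samples = {
    " ": "Zs",                          # Separator, space
    "-": "Pd",                          # Punctuation, dash
    "(": "Ps", "[": "Ps", "{": "Ps",    # Punctuation, open
    ")": "Pe", "]": "Pe", "}": "Pe",    # Punctuation, close
    "_": "Pc",                          # Punctuation, connector
}

for char, expected in samples.items():
    actual = unicodedata.category(char)
    print(f"U+{ord(char):04X} {char!r}: {actual} (expected {expected})")
```

This confirms, for example, that all the bracket alternatives really share the Ps / Pe categories, which is what makes them safe interchangeable variants.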
Other characters under consideration:

- `.`
- `'` (sometimes `` ` `` is used)
- `:`
- `|`
```
#item+conceptum+numerordinatio #item+rem+i_mul+is_zsym+ix_ndt60+ix_ndt60
0 �
1
2 +
3 -
4 /
5 =
6 �
7 �
8 �
9 �
10 (
11 )
12 _
13 \
14 �
15 �
16 �
17 �
18 �
19 �
20 0
21 1
22 2
23 3
24 4
25 5
26 6
27 7
28 8
29 9
30 a
31 b
32 c
33 d
34 e
35 f
36 g
37 h
38 i
39 j
40 k
41 l
42 m
43 n
44 o
45 p
46 q
47 r
48 s
49 t
50 u
51 v
52 w
53 x
54 y
55 z
56 �
57 �
58 �
59 �
```
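As a sketch of how the 60-position table above could be used in code (assumptions: only the draft assignments listed above are filled in, unassigned positions stay empty, and names such as `SYMBOLS` and `INDEX` are hypothetical):

```python
# Draft 60-position Numerordinatio symbol table (unassigned slots are None).
# Positions 0, 6-9, 14-19 and 56-59 are not yet assigned in the draft.
SYMBOLS = [None] * 60
SYMBOLS[1] = " "
SYMBOLS[2] = "+"
SYMBOLS[3] = "-"
SYMBOLS[4] = "/"
SYMBOLS[5] = "="
SYMBOLS[10] = "("
SYMBOLS[11] = ")"
SYMBOLS[12] = "_"
SYMBOLS[13] = "\\"

for i, digit in enumerate("0123456789"):                   # positions 20-29
    SYMBOLS[20 + i] = digit
for i, letter in enumerate("abcdefghijklmnopqrstuvwxyz"):  # positions 30-55
    SYMBOLS[30 + i] = letter

# Reverse lookup: character -> table position
INDEX = {ch: i for i, ch in enumerate(SYMBOLS) if ch is not None}

print(SYMBOLS[30], INDEX["z"], len(INDEX))
```

A table like this keeps the character set machine-checkable: any symbol outside `INDEX` is immediately detectable as invalid, regardless of the reader's script.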
Numerical codes are not only computationally efficient and easier to work with when defining large numbers of codes (such as the internal divisions of a country's own organizations, an approach also used by modern standards such as Terminologia Anatomica), but they are also much better suited to multilingual lexicography.
When people hear "neutral codes" they sometimes assume that different regions simply dislike other alphabets, but there are real usability issues. For example, with US-ASCII letters (which, by the way, do not even cover the full Latin alphabet), no matter how hard an average speaker of any Arabic dialect tries, they simply cannot pronounce all the letters, because several of the sounds are uncommon in their language. The same happens even between languages that do use the Latin alphabet, to the point of coping mechanisms such as using the ICAO spelling alphabet to pronounce each letter. By comparison, in most languages the sounds of the numbers are quite distinct from one another.
## Use cases: why even the coordination of lexicography should not have a single point of coordination
### TICO-19
One interesting fact we discovered empirically during the lexicography of the [working-draft] public-domain datasets from the Translation Initiative for COVID-19 in the HXLTM format (Multilingual Terminology in Humanitarian Language Exchange); current link here: https://github.com/EticaAI/tico-19-hxltm. I will focus on the wordlists (the TICO-19 "terminology" without concept descriptions).
The final result has more errors in non-Latin scripts. This does not mean errors did not occur in por-Latn, spa-Latn, ita-Latn, etc. (fun fact: several translations are better than the eng-Latn used as the initial reference), where the most common issue was literal translation. However, even though the terminology/wordlists were produced by professional translators, the issues in non-Latin writing systems were not caught by quality control; a more widely distributed last-step review would likely have improved this.
I could say more on this topic, but an equivalent quality control would not require that the lexicographers (the people who compile the results of others) actually know each language; they would only need to know at least one language and be familiar with the writing system. This is likely what the TICO-19 quality control already amounted to for languages in Latin script (some reviewers probably knew more than English, yet the work rested mostly on the translators and reviewers).
### License issues

#### Slow response for humanitarian usage
This topic alone would deserve a full discussion, but even for humanitarian usage, licenses are problematic. Emergency translation initiatives need authorization far faster than the lawyers of the average copyright-holding organization are able to respond.
#### Not practical to credit every collaborator in an aggregated result
There are also several issues when compiling together the work of different organizations. Even if it were possible to know everyone who helped, and they all donated their work for free, how do you handle attribution? I will leave one example from https://upload.wikimedia.org/wikipedia/commons/1/18/Arguments_on_CC0-licensing_for_data.pdf
### How numeric codes can help both with international review from different regions and with licensing
While there are other use cases, some way to procedurally generate such numbers can help at least with review (or even with splitting work across different regions) and with licensing.
In the worst-case scenario, the terms in the initial reference language can be removed immediately as soon as DMCA requests arrive. This also copes with the fact that, by default, minimally creative work done by volunteers (who, as lexicographers using Numerordĭnātĭo, would already have more context to explain the concepts) could not be claimed by any initial implementation.
In practice, this could allow translation initiatives focused on the humanitarian area to start quickly, while still welcoming validation and reuse of the end result by the organizations, the aim here being general public benefit. If, however, the lawyers of such organizations try to troll, it is up to the external lexicography coordinators to remove references to the initial standards wherever use goes beyond fair use. A typical consequence would be removing the "copyrighted" source terms (often English, sometimes French) and releasing well-curated versions of everything else in file formats that are friendly to use.
Note that in practice it is unlikely such lawyers would go this far against translations for humanitarian use; it is more likely this would come from inexperienced lawyers or near-automated responses.