eric-muller / udhr

Universal Declaration of Human Rights

Adlam text contains Latin letter eng? #50

Open simoncozens opened 1 year ago

simoncozens commented 1 year ago

The Adlam text udhr_fuf_adlm contains the Latin letter eng in quite a few articles (e.g. "𞤼𞤫𞥅ŋ𞤼𞤭" in article 11.2). Is this an OCR error or similar?

simoncozens commented 1 year ago

The corresponding word in Boubacar Diallo's translation looks like this:

[Image: Screenshot 2022-10-05 at 12 21 29]

I think it should be a U+1E91B?

r12a commented 1 year ago

U+1E91B is a capital letter. Perhaps you meant to say U+1E93D. (Which is btw pronounced /ŋ/, so it may not even be an OCR error. Clearly wrong though.)

https://r12a.github.io/scripts/adlam/block.html#char1E91B https://r12a.github.io/scripts/adlam/block.html#char1E93D
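
For anyone who wants to check the capital/small distinction locally, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration, and the output depends on the Unicode version bundled with your Python (Adlam needs Unicode 9.0 or later).

```python
import unicodedata


def describe(cp: int) -> str:
    """Return code point, character name and general category (hypothetical helper)."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<unnamed>")
    category = unicodedata.category(ch)  # "Lu" = uppercase letter, "Ll" = lowercase letter
    return f"U+{cp:04X} {name} [{category}]"


# The two Adlam code points discussed above, plus the Latin eng found in the text.
for cp in (0x1E91B, 0x1E93D, 0x014B):
    print(describe(cp))

# Adlam is a bicameral script, so the capital should lowercase to the small letter.
print(chr(0x1E91B).lower() == chr(0x1E93D))
```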

eric-muller commented 1 year ago

There is an occurrence of U+01AD LATIN SMALL LETTER T WITH HOOK that probably does not belong here.

There are 55 occurrences of U+060C ARABIC COMMA and 1 occurrence of U+061B ARABIC SEMICOLON. Yet, TUS states 'Adlam uses European punctuation and the U+061F ARABIC QUESTION MARK.' There are also 5 occurrences of U+2E41 REVERSED COMMA.

There are 10 occurrences of U+2019 RIGHT SINGLE QUOTATION MARK and 122 occurrences of U+0027 APOSTROPHE. There are no occurrences of U+1E94B ADLAM NASALIZATION MARK.
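
Counts like these can be reproduced with a short script. The following is only a sketch, not the tool actually used for the numbers above: it tallies every non-whitespace character that falls outside the Adlam block (U+1E900..U+1E95F). The names `count_non_adlam` and `report` and the sample string are assumptions for illustration; in practice you would pass in the contents of the udhr_fuf_adlm text.

```python
from collections import Counter
import unicodedata

# Adlam block: U+1E900..U+1E95F (range end is exclusive).
ADLAM_BLOCK = range(0x1E900, 0x1E960)


def count_non_adlam(text: str) -> Counter:
    """Count non-whitespace characters that are outside the Adlam block."""
    return Counter(
        ch for ch in text
        if ord(ch) not in ADLAM_BLOCK and not ch.isspace()
    )


def report(text: str) -> None:
    """Print one line per foreign character, most frequent first."""
    for ch, n in count_non_adlam(text).most_common():
        name = unicodedata.name(ch, "<unnamed>")
        print(f"{n:4d}  U+{ord(ch):04X} {name}")


# Illustrative sample only: a few Adlam letters with an eng, an Arabic comma
# and an ASCII apostrophe mixed in.
sample = "\U0001E93C\U0001E92B\U0001E945\u014B\U0001E93C\U0001E92D\u060C \U0001E93C'"
report(sample)
```

Note that this also reports digits and any punctuation that is legitimately used with Adlam, so the output still needs a human pass to decide what is actually wrong.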

r12a commented 1 year ago

These substitutions probably indicate the age of the text: they likely reflect the font configurations in use when it was produced, and act like some kind of digital tree rings.

My understanding is that Adlam uses the reversed comma, and the Arabic comma is a carryover from earlier Adlam fonts (https://r12a.github.io/scripts/adlam/fuf.html#phrase). The apostrophes, and probably the right single quotation marks (I count 10), should probably all be nasalisation marks (https://r12a.github.io/scripts/adlam/fuf.html#nasalisation).
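
If those guesses hold, the candidate substitutions could be sketched as a translation table like the one below. This is an assumption-laden illustration, not an agreed fix: the mapping simply encodes the suggestions made in this thread, including the eng to U+1E93D replacement proposed earlier, and the names `CANDIDATE_FIXES` and `apply_candidate_fixes` are hypothetical.

```python
# Candidate replacements taken from the suggestions in this thread.
# Every entry is an assumption that would need review against the source text.
CANDIDATE_FIXES = str.maketrans({
    "\u014B": "\U0001E93D",  # LATIN SMALL LETTER ENG -> Adlam small letter at U+1E93D
    "\u060C": "\u2E41",      # ARABIC COMMA -> REVERSED COMMA
    "\u0027": "\U0001E94B",  # APOSTROPHE -> ADLAM NASALIZATION MARK
    "\u2019": "\U0001E94B",  # RIGHT SINGLE QUOTATION MARK -> ADLAM NASALIZATION MARK
})


def apply_candidate_fixes(text: str) -> str:
    """Apply the candidate substitutions character by character."""
    return text.translate(CANDIDATE_FIXES)


# Illustrative input only, not taken from the UDHR text.
before = "\U0001E93C\u014B\u060C \U0001E92D'"
after = apply_candidate_fixes(before)
print([f"U+{ord(ch):04X}" for ch in after])
```

A blanket replacement like this would also rewrite any apostrophe that really is punctuation, so it is better suited to producing a diff for review than to editing the file directly.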