apertium / apertium-uzb

Apertium linguistic data for Uzbek
GNU General Public License v3.0

Some ʻokinas are wrong #13

Open · ftyers opened this issue 3 years ago

ftyers commented 3 years ago

Enclosed please find a list of entries I consider incorrect: in Uzbek orthography, only two letters can be followed by the "inverted comma" diacritic, namely "o" and "g". Will you please amend the lexicon?

aʻyon:aʻyon N1 ; ! 
aʻzam:aʻzam A2 ; ! 
aʻzo:aʻzo N1 ; ! 
aʻzo:aʻzo N1 ; ! "member"
aʻzolik:aʻzolik N1 ;  ! 
aʻlam:aʻlam N1 ; ! 
aʻlo:aʻlo A1 ; ! 
aʻlochi:aʻlochi N1 ; ! 
aʻrof:aʻrof N1 ; ! 
bʻol:bʻol VERB-IV ; ! "to be"
baʻdaz:baʻdaz ADV ; ! 
baʻd:baʻd ADV ; ! 
baʻzan:baʻzan ADV ; ! "sometimes"
baʻzida:baʻzida ADV ; ! "sometimes"
baʻzi:baʻzi PRON-IND ; !  "sometimes"
baʻzibir:baʻzibir PRON-IND ; ! 
baʻzida:baʻzida PRON-IND ; ! 
badfeʻl:badfeʻl A2 ; ! 
badfeʻllik:badfeʻllik N1 ;  ! 
bilʻaks:bilʻaks CA ; ! "to the contrary"
daʻvat:daʻvat N1 ; ! "invitation"
daʻvoy:daʻvoy N1 ; ! "case/trial/issue"
dafʻa:dafʻa N1 ; ! "time" as in 3 times
dafʻa:dafʻa N1 ; ! "times"
qʻavgʻo:qʻavgʻo N1 ; !
maʻlum:maʻlum A1 ; ! "certain"
maʻlumot:maʻlumot N1 ; ! "information"
maʻmuriy:maʻmuriy A1 ; ! "administrative"
maʻmuriyat:maʻmuriyat N1 ; ! "administration"
maʻnoli:maʻnoli A2 ; ! "meaningful"
maʻno:maʻno N1 ; ! "meaning"
maʻrifat:maʻrifat N1 ; ! "education"
masʻuliyat:masʻuliyat N1 ; ! "responsibility"
meʻmoriy:meʻmoriy A1 ; ! "architectural"
otliq:otliq A1 ; ! "mounted (on horseʻs back)"
taʻqib:taʻqib N1 ; ! "pursuit/following"
taʻlimot:taʻlimot N1 ; ! "doctrine, instructions"
taʻlim:taʻlim N1 ; ! "training"
taʻminlan VERB-IV ; ! "to be supplied"
taʻminla:taʻminla VERB-TV ; ! "to provide"
taʻminla:taʻminla VERB-TV ; ! "to provide with"
taʻminla:taʻminla VERB-TV ; ! "to supply"
taʻminlash:taʻminlash N1 ; ! "assurance"
taʻsir:taʻsir N1 ; ! "influence"
taʻsir:taʻsir N1 ; ! "effect"
shaʻbon:shaʻbon N1 ; ! "month"
shaʻn:shaʻn N1 ; ! "honor"
eʻlon:eʻlon N1 ; ! "announcement"
eʻtibor:eʻtibor N1 ; ! "effect, prestige"
eʻtimod:eʻtimod N1 ; ! "faith"
eʻtirof:eʻtirof N1 ; ! "confession"
yaʻni:yaʻni CA ; ! "in other words"
Boʻgʻoz:Bo%ʻgʻoz NP-TOP ; ! "strict"
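The rule stated in the report (a turned comma may only follow "o" or "g") can be checked mechanically before fixing entries by hand. Below is a minimal sketch, assuming the lexicon is read line by line as plain lexc text; `misplaced_okinas` and `check_lexicon` are hypothetical helpers, not part of apertium-uzb, and the lexc handling is deliberately simplified (everything after the first `!` is treated as a comment, and `%` escapes are just dropped).

```python
import re

# U+02BB MODIFIER LETTER TURNED COMMA: the character used in oʻ and gʻ.
TURNED_COMMA = "\u02bb"

# Flag any turned comma NOT directly preceded by o/O/g/G
# (this also catches one at the very start of the string).
MISPLACED = re.compile(r"(?<![oOgG])" + TURNED_COMMA)


def misplaced_okinas(text):
    """Return the indices of turned commas that do not follow o/O/g/G."""
    return [m.start() for m in MISPLACED.finditer(text)]


def check_lexicon(lines):
    """Yield (line_number, line) for lexc lines with a misplaced turned comma.

    Simplifications (assumptions, not full lexc parsing):
    - the gloss after the first "!" is ignored, so English text such as
      "horseʻs" in a comment is not reported;
    - "%" escape characters are removed before checking, so "Bo%ʻgʻoz"
      is read as "Boʻgʻoz".
    """
    for number, line in enumerate(lines, start=1):
        entry = line.split("!", 1)[0].replace("%", "")
        if misplaced_okinas(entry):
            yield number, line.rstrip("\n")
```

Run over the lexicon file (its name, e.g. apertium-uzb.uzb.lexc, is assumed here), a check like this would report entries such as aʻzo, bʻol and qʻavgʻo while leaving correctly spelled words like boʻl alone. It only locates problems: deciding whether aʻ should become aʼ, or whether bʻol is a transposed boʻl, still needs a human.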
ftyers commented 3 years ago

Should [ae]ʻ be [ae]’ ?

vladob54 commented 3 years ago

Here are the respective paragraphs from Wikipedia dealing with the subject:


When the Uzbek language is written using the Latin script, the letters Oʻ (Cyrillic Ў) and Gʻ (Cyrillic Ғ) are properly rendered using the character U+02BB ʻ MODIFIER LETTER TURNED COMMA[9], which is also known as the ʻokina. However, since this character is absent from most keyboard layouts (except for the Hawaiian keyboard in Windows 8, or above, computers) and many fonts, most Uzbek websites – including some operated by the Uzbek government[2] – use either U+2018 ‘ LEFT SINGLE QUOTATION MARK or straight (typewriter) single quotes to represent these letters.

The modifier letter apostrophe (ʼ) (tutuq belgisi) is used to mark the phonetic glottal stop when it is put immediately before a vowel in borrowed words, as in sanʼat (art). The modifier letter apostrophe is also used to mark a long vowel when placed immediately after a vowel, as in maʼno (meaning).[10] Since this character is also absent from most keyboard layouts, many Uzbek websites use U+2019 ’ RIGHT SINGLE QUOTATION MARK instead.

Currently most typists do not bother with the differentiation between the modifier letter turned comma and modifier letter apostrophe as their keyboard layouts likely accommodate only the straight apostrophe.
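Because these characters look almost identical, it can help to print their code points and names. The sketch below uses only Python's standard unicodedata module; `CANDIDATES` and `describe` are illustrative names. It also shows why the distinction matters for a lexicon: U+02BB and U+02BC have the Unicode general category Lm (modifier letter), so they count as letters inside a word, while the quotation marks and the ASCII apostrophe are punctuation.

```python
import unicodedata

# Apostrophe-like characters mentioned in the quoted Wikipedia text.
CANDIDATES = ["\u02bb", "\u02bc", "\u2018", "\u2019", "'"]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in CANDIDATES:
    # Print the glyph, its code point and name, and its general category.
    print(ch, describe(ch), unicodedata.category(ch))

# Modifier letters are letters: a word spelled with U+02BB is all-alphabetic,
# while the same word typed with an ASCII apostrophe is not.
print("bo\u02bbl".isalpha(), "bo'l".isalpha())
```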


Thus, U+02BB ʻ (MODIFIER LETTER TURNED COMMA) should be used after O/o/G/g, and U+02BC ʼ (MODIFIER LETTER APOSTROPHE) in all other cases.
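This rule can be applied as a normalization pass over text typed with whatever apostrophe the keyboard offers. A minimal sketch, assuming the stand-ins to replace are the ones named in the quoted text (U+2018, U+2019, the straight ASCII apostrophe) plus the two correct characters themselves; `normalize_apostrophes` is a hypothetical helper, not an Apertium tool.

```python
import re

TURNED_COMMA = "\u02bb"  # ʻ, only after o/O/g/G
APOSTROPHE = "\u02bc"    # ʼ (tutuq belgisi), in all other cases

# Assumed set of lookalikes typists use for either character.
LOOKALIKES = "[\u02bb\u02bc\u2018\u2019']"

# Capture the letter before the mark ([^\W\d_] = any Unicode letter),
# so the replacement can depend on it.
PATTERN = re.compile(r"([^\W\d_])" + LOOKALIKES)


def normalize_apostrophes(text):
    """Rewrite apostrophe-like marks after a letter per the oʻ/gʻ rule."""
    def fix(match):
        prev = match.group(1)
        mark = TURNED_COMMA if prev in "oOgG" else APOSTROPHE
        return prev + mark

    return PATTERN.sub(fix, text)
```

Capturing the preceding letter keeps the rule to a single regular-expression pass, and marks at the end of a word (as in togʻ) are still handled. Two limitations follow from the simplification: a quotation mark that directly follows a letter (a closing quote, for instance) is also rewritten, and a transposed entry such as bʻol becomes bʼol rather than boʻl, so the output still needs review.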

ftyers commented 3 years ago

Can you look at 3102b67? Does it solve the problem?

It seems some of the others were fixed in https://github.com/apertium/apertium-uzb/commit/4002f9a7b56532ab2382a98b3122046e16775666 and https://github.com/apertium/apertium-uzb/commit/08ca490c43f8bfdc3fd731be45bdeeb7977b0b0b.