giellalt / bugzilla-dummy


Letters are not recognized in izh and myv (Bugzilla Bug 1497) #1698

Closed albbas closed 11 years ago

albbas commented 11 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 1497

Date: 2012-11-02T10:29:18+01:00 From: Jack Rueter <> To: Sjur Nørstebø Moshagen <> CC: lene.antonsen, thomas.omma, tommi.pirinen, trond.trosterud

Depends on: #1502 Last updated: 2012-11-06T15:45:49+01:00

albbas commented 11 years ago

Comment 7278

Date: 2012-11-02 10:29:18 +0100 From: Jack Rueter <>

Created attachment 146 screen shot of (izh) and (myv) not recognizing letters with uizh and umyv

A change since last night has disabled umyv and uizh.

In izh, uppercase letters are no longer recognized as variants of lowercase letters.

In myv, letters are not recognized.

Attached file: letters-missing_2012-11-02.tiff (image/tiff, 104672 bytes) Description: screen shot of (izh) and (myv) not recognizing letters with uizh and umyv

albbas commented 11 years ago

Comment 7279

Date: 2012-11-02 10:39:06 +0100 From: Trond Trosterud <>

Cf. also bug #1456, inituppercase, which seems to be related to one of the two problems here.

I have given that bug a lot of attention, and simply was not able to see the difference between izh (initupper working) and fin (initupper not working). Now, at least, they behave the same way (neither works).

albbas commented 11 years ago

Comment 7288

Date: 2012-11-02 16:26:16 +0100 From: Sjur Nørstebø Moshagen <>

I will try to solve this next week. I really don't understand what is going on. Adding Tommi to the Cc list.

albbas commented 11 years ago

Comment 7295

Date: 2012-11-03 10:33:24 +0100 From: Jack Rueter <>

In izh the initial letter cannot be up-cased.

++
$ uizh
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
abstraktsia abstraktsia abstraktsia+N+Sg+Nom

Abstraktsia Abstraktsia Abstraktsia +?

~~
$ huizh abstraktsia Abstraktsia

$ hfst-lookup src/analyser-gt-norm.hfst abstraktsia Abstraktsia
++

If, however, I decide to generate random output with the following

:izh jackrueter$ hfst-fst2fst -f openfst-tropical src/analyser-gt-norm.hfst | hfst-compose -F -2 filterNominals.hfst | hfst-fst2strings -r50000 -c1 > nominal_strings04

where filterNominals.hfst is generated from: ? [ %+Nom | %+Part | %+Gen | %+Ine | %+Ill | %+Ela | %+All | %+Ade | %+Abl ] ?

the result includes both upper-case initial and lower-case initial words, whereas the uppercased words are not accepted in uizh.

++
Abstraktsia:abstraktsia+N+Sg+Nom
Abstraktsiakaa:abstraktsia+N+Sg+Nom+Clt/kAA
Abstraktsiakii:abstraktsia+N+Sg+Nom+Foc/kii
Abstraktsiatkaa:abstraktsia+N+Pl+Nom+Clt/kAA
Abstraktsiaskii:abstraktsia+N+Sg+Ine+Foc/kii
...

abstraktsia:abstraktsia+N+Sg+Nom
abstraktsiakaa:abstraktsia+N+Sg+Nom+Clt/kAA
abstraktsiakii:abstraktsia+N+Sg+Nom+Foc/kii
abstraktsiat:abstraktsia+N+Pl+Nom
abstraktsiatkii:abstraktsia+N+Pl+Nom+Foc/kii
abstraktsiankaa:abstraktsia+N+Sg+Gen+Clt/kAA
abstraktsiaskii:abstraktsia+N+Sg+Ine+Foc/kii
...
++

The upper-case forms are generated from the same file that fails to recognize them in huizh.
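The mismatch between generation and lookup can also be checked mechanically. The following is only a minimal Python sketch, assuming the random output keeps the `surface:analysis` form shown above. The function names `parse_pairs` and `missing_in_lookup` are hypothetical, and the `recognised` set is a hand-written stand-in for whatever the lookup tool actually accepts; it is not produced by HFST here.

```python
def parse_pairs(lines):
    """Split 'surface:analysis' lines into (surface, analysis) tuples."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and elided rows such as '...'
        surface, analysis = line.split(":", 1)
        pairs.append((surface, analysis))
    return pairs


def missing_in_lookup(pairs, recognised):
    """Return the generated surface forms that the lookup side does not accept."""
    return sorted({surface for surface, _ in pairs if surface not in recognised})


# Sample rows copied from the generated output in this comment.
sample = [
    "Abstraktsia:abstraktsia+N+Sg+Nom",
    "abstraktsia:abstraktsia+N+Sg+Nom",
    "Abstraktsiakii:abstraktsia+N+Sg+Nom+Foc/kii",
    "abstraktsiakii:abstraktsia+N+Sg+Nom+Foc/kii",
]

# Assumption: only the lower-case forms are accepted, as in the uizh session.
recognised = {"abstraktsia", "abstraktsiakii"}

print(missing_in_lookup(parse_pairs(sample), recognised))
# prints ['Abstraktsia', 'Abstraktsiakii']
```

In practice the `recognised` set would have to be collected from real lookup output, for example by keeping every input whose analysis is not `+?`.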

I am also able to generate random forms in Cyrillic in myv. There seem to be problems with @FLAG@ use, as well.

albbas commented 11 years ago

Comment 7297

Date: 2012-11-03 17:21:14 +0100 From: Trond Trosterud <>

Progress here: for izh, the problem is the same initial 0 under LEXICON Root as for fin.

I added (0) to inituppercase.regex, and now get:

Yksi Yksi yks+Num+Card+Sg+Nom

yksi yksi yks+Num+Card+Sg+Nom

But the error reported in the attachment is still with us:

yksiköös yksiköös yksikkö+N+Sg+Ine

Yksiköös Yksiköös Yksiköös +?

As the output above shows, though, this error is not linked to the initupper issue alone.
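For reference, the behaviour expected from initial-uppercase handling can be sketched in a few lines of Python. This is only an illustration of the intended result, not of how inituppercase.regex or the HFST transducers implement it: `TOY_LEXICON` and `analyse` are hypothetical names, and the dictionary is a toy stand-in holding three analyses quoted in this thread.

```python
# Toy stand-in for the analyser: lower-case surface form -> analysis.
TOY_LEXICON = {
    "yksi": "yks+Num+Card+Sg+Nom",
    "yksiköös": "yksikkö+N+Sg+Ine",
    "abstraktsia": "abstraktsia+N+Sg+Nom",
}


def analyse(word, lexicon=TOY_LEXICON):
    """Look the word up as typed; if unknown, retry with the first letter lower-cased."""
    if word in lexicon:
        return lexicon[word]
    if word and word[0].isupper():
        candidate = word[0].lower() + word[1:]
        if candidate in lexicon:
            return lexicon[candidate]
    return "+?"  # same marker the lookup tools print for unknown input


for w in ["Yksi", "yksi", "Yksiköös", "Yksikoos"]:
    print(w, analyse(w), sep="\t")
```

With working initial-uppercase handling, Yksiköös should receive the same analysis as yksiköös, while a form that is not in the lexicon at all (the made-up Yksikoos) should still come back as `+?`.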

albbas commented 11 years ago

Comment 7315

Date: 2012-11-06 09:36:58 +0100 From: Sjur Nørstebø Moshagen <>

Added dependency on bug #1502, as it is quite hard to test possible solutions to this bug without a working build infra.

albbas commented 11 years ago

Comment 7326

Date: 2012-11-06 15:45:49 +0100 From: Jack Rueter <>

Word-initial upper-casing is now working in both izh and myv.