giellalt / bugzilla-dummy


malformatted number ranges get weird analyses by the fsts (Bugzilla Bug 2716) #1710

Closed albbas closed 3 years ago

albbas commented 3 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 2716

Date: 2021-07-06T11:00:28+02:00
From: Linda Wiechetek
To: Sjur Nørstebø Moshagen
CC: lene.antonsen, thomas.omma, tommi.pirinen, trond.trosterud

Last updated: 2021-07-06T13:00:58+02:00

albbas commented 3 years ago

Comment 14214

Date: 2021-07-06 11:00:28 +0200
From: Linda Wiechetek

Here we have a date range that should be 16.-20. However, it is malformatted: there is a space after the hyphen, so 16.- is analysed only as 16,- (with a comma) and gets no other reading. As a result, CG cannot select the correct reading. This is the sentence:

Dálkkádatrievdamat ja álgoálbmogat lei okta máŋgga fáttás mat ságaškuššojuvvojedje go ONa Álgoálbmogiid Bistevaš Foruma (UNPFII) lahtut dolle ovdačoahkkima Kárášjogas áigodagas njukčamánu 16.- 20. b. Sámedikkiin ja Ruošša beale ovddasteaddjiin ledje sierra čoahkkimat UNPFII:in.

"<njukčamánu>"
	"njukčamánnu" N Sem/Time Sg Gen ADD:2210 @<ADVL MAP:22777:r263 #23->30 SETPARENT:5564 SETPARENT:5564 SETPARENT:5564 SETPARENT:5564
;	"njukčamánnu" N Sem/Time Sg Acc ADD:2210 REMOVE:17226:r2060
;	"mánnu" N Sem/Measr_Time Sg Acc ADD:2210
;		"njukča" N Sem/Ani Cmp/SgNom Cmp REMOVE:3099:longest-match
;	"mánnu" N Sem/Measr_Time Sg Gen ADD:2210
;		"njukča" N Sem/Ani Cmp/SgNom Cmp REMOVE:3099:longest-match
:
"<16.->"
	"16,-" Err/Orth Num Arab Sg Nom @SPRED @APP-N< @SUBJ SELECT:6776:r1089 MAP:22332:r136 &typo #24->24 ADD:9590:Err/Orth-any typo
	"16,-" Num Arab Sg Nom @SPRED @APP-N< @SUBJ SELECT:6776:r1089 MAP:22332:r136 &typo &SUGGEST #24->24 ADD:9590:Err/Orth-any COPY:9599:Err/Orth-any 16,-+Num+Arab+Sg+Nom 16,-
;	"16,-" Err/Orth Num Arab Sg Acc SELECT:6776:r1089
;	"16,-" Err/Orth Num Arab Sg Gen SELECT:6776:r1089
;	"16,-" Err/Orth Num Arab Sg Ill Attr SELECT:6776:r1089
;	"16,-" Err/Orth Num Arab Sg Loc Attr SELECT:6776:r1089
;	"16,-" Err/Orth Num Arab Sg Nom @HNOUN SELECT:6776:r1089 MAP:22332:r136 REMOVE:24140:r3422
:
"<20.>"
	"20" A Arab Ord Attr SELECT:3019:ord-before-noun @>N MAP:22187:r86 #25->25
;	"." CLB "<.>" SELECT:3019:ord-before-noun
;		"20" Num Sem/ID "<20>"
;	"." CLB "<.>" SELECT:3019:ord-before-noun
;		"20" Num Arab Sg Nom "<20>"
;	"." CLB "<.>" SELECT:3019:ord-before-noun
;		"20" Num Arab Sg Loc Attr "<20>"
;	"." CLB "<.>" SELECT:3019:ord-before-noun
;		"20" Num Arab Sg Ill Attr "<20>"
;	"." CLB "<.>" SELECT:3019:ord-before-noun
;		"20" Num Arab Sg Gen "<20>"
;	"." CLB "<.>" SELECT:3019:ord-before-noun
;		"20" Num Arab Sg Acc "<20>"
;	"." CLB "<.>" SELECT:3019:ord-before-noun
;		"20" A Arab Ord Attr CLBfinal "<20>"

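The malformation reported here (whitespace after the hyphen in an ordinal day range) can be stated as a simple pattern. Below is a minimal Python sketch of detecting and normalising it in running text. The regular expression and the function name `normalize_spaced_ranges` are assumptions made for illustration; they are not part of the giellalt FST or CG sources, and this is not how the analyser itself handles the input.

```python
import re

# Ordinal day range such as "16.-20." written with stray whitespace after
# the hyphen ("16.- 20."). Illustrative pattern only, not taken from the FST.
SPACED_RANGE = re.compile(r"\b(\d{1,2})\.-\s+(\d{1,2})\.")


def normalize_spaced_ranges(text: str) -> str:
    """Rewrite '16.- 20.' as '16.-20.' and leave all other text unchanged."""
    return SPACED_RANGE.sub(r"\1.-\2.", text)


sample = "áigodagas njukčamánu 16.- 20. b."
print(normalize_spaced_ranges(sample))  # áigodagas njukčamánu 16.-20. b.
```

A preprocessing step like this would hide the spelling error from the grammar checker, so it only illustrates the pattern. It is not a replacement for an analysis that still recognises the malformed form.
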
albbas commented 3 years ago

Comment 14215

Date: 2021-07-06 13:00:58 +0200
From: Lene Antonsen

16.- 20. has a space after the hyphen, and this variant was missing from the FST, but I have now added it:

svn ci -m 'la til enda en vvarinat for dato til dato, se bz. 2716' arabic_roman_digits.lexc
Sending arabic_roman_digits.lexc
Transmitting file data .done
Committing transaction...
Committed revision 1367.

(The commit message reads: "added yet another variant for date to date, see bz. 2716".)

The reason 16. became 16, is this entry:

,%-+Err/Orth:.∑- NUM-ARABICCASES ; ! 10.- It is wrong, but written.
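How the missing range variant and the Err/Orth entry interact can be sketched with a toy analyser in Python. Everything here is a simplified assumption: the two regular expressions stand in for lexicon paths, the function `analyse` and its `with_range_variant` flag are hypothetical, and only the tag strings are copied from the analyses quoted in this thread. It is not the real lexc or FST logic.

```python
import re

# Hypothetical stand-ins for two lexicon paths (assumptions, not real lexc).
RANGE = re.compile(r"(\d+)\.- ?(\d+)\.")  # date range, optional space after hyphen
TYPO = re.compile(r"(\d+)\.-")            # "16.-" read as a misspelt "16,-"
OTHER = re.compile(r"\S+")                # anything else, left unanalysed


def analyse(text: str, with_range_variant: bool) -> list[tuple[str, str]]:
    """Return (surface, analysis) pairs, trying the longest pattern first."""
    out, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = RANGE.match(text, pos) if with_range_variant else None
        if m:
            out.append((m.group(0), f"{m.group(1)}.-{m.group(2)} A Arab Ord Attr"))
        else:
            m = TYPO.match(text, pos)
            if m:
                out.append((m.group(0), f"{m.group(1)},- Err/Orth Num Arab"))
            else:
                m = OTHER.match(text, pos)
                out.append((m.group(0), "?"))
        pos = m.end()
    return out


# Without the range variant, only the Err/Orth path matches "16.-".
print(analyse("16.- 20.", with_range_variant=False))
# With the variant, the whole range is one token.
print(analyse("16.- 20.", with_range_variant=True))
```

In this sketch the first call yields `16.-` with the `16,-` Err/Orth reading plus a separate `20.`, while the second call yields the single token `16.- 20.` with the reading `16.-20 A Arab Ord Attr`.
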

"<njukčamánu>"
	"njukčamánnu" N Sem/Time Sg Gen
:
"<16.- 20.>"
	"16.-20" A Arab Ord Attr
:
"<b.>"
	"b" Adv Sem/Time ABBR Gram/TNumAbbr