giellalt / lang-sms

Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Skolt Sami language
https://giellalt.uit.no
GNU Lesser General Public License v3.0

Wrong analysis for "10" #4

Closed: carges closed this issue 3 years ago

carges commented 3 years ago

I get the following analysis for "10":

echo 10 | hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
"<10>"
     Use/Circ"1" Use/Circ"0" Num Sg Acc <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Gen <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Ill Attr <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Loc Attr <W:0.0>
     Use/Circ"1" Use/Circ"0" Num Sg Nom <W:0.0>
     Use/Circ"10" Num Sg Acc <W:0.0>
     Use/Circ"10" Num Sg Nom <W:0.0>
:\n

I compiled today:

ls -l tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
-rw-r--r--  1 car010  staff  112445830 May 21 11:58 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst

carges commented 3 years ago

It seems this affects not only "10" but many other numbers as well (possibly all?):

"<9.10.>"
     Use/Circ"9." Use/Circ"1" Use/Circ"0" A Ord Attr <W:0.0>
: 

and:

"<1826>"
     Use/Circ"1" Use/Circ"8" Use/Circ"2" Use/Circ"6" Num Sg Acc <W:0.0>
     Use/Circ"1" Use/Circ"8" Use/Circ"2" Use/Circ"6" Num Sg Gen <W:0.0>
     Use/Circ"1" Use/Circ"8" Use/Circ"2" Use/Circ"6" Num Sg Ill Attr <W:0.0>
     Use/Circ"1" Use/Circ"8" Use/Circ"2" Use/Circ"6" Num Sg Loc Attr <W:0.0>
     Use/Circ"1" Use/Circ"8" Use/Circ"2" Use/Circ"6" Num Sg Nom <W:0.0>
     Use/Circ"1" Use/Circ"8" Use/Circ"26" Num Sg Acc <W:0.0>
     Use/Circ"1" Use/Circ"8" Use/Circ"26" Num Sg Nom <W:0.0>
     Use/Circ"1" Use/Circ"826" Num Sg Acc <W:0.0>
     Use/Circ"1" Use/Circ"826" Num Sg Nom <W:0.0>
     Use/Circ"1826" Num Sg Acc <W:0.0>
     Use/Circ"1826" Num Sg Nom <W:0.0>
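
To check whether other numbers are affected in the same way, the tokeniser output could be scanned for readings in which a single wordform is split into several Use/Circ lemmas. The following is only a minimal sketch, not part of the repository: the helper name `split_readings` is hypothetical, and it assumes the output layout shown in the examples above (a `"<wordform>"` line followed by indented reading lines).

```python
import re

# Matches one sub-reading of the form Use/Circ"<lemma>", as seen in the output above.
CIRC_LEMMA = re.compile(r'Use/Circ"([^"]*)"')


def split_readings(cg_output):
    """Map each wordform to its readings that contain more than one Use/Circ lemma.

    Hypothetical helper: assumes each cohort starts with a line like "<10>"
    and that every following non-empty line is one reading.
    """
    suspicious = {}
    wordform = None
    for line in cg_output.splitlines():
        stripped = line.strip()
        if stripped.startswith('"<') and stripped.endswith('>"'):
            # New cohort: remember the surface form between "< and >".
            wordform = stripped[2:-2]
        elif wordform is not None and stripped:
            lemmas = CIRC_LEMMA.findall(stripped)
            if len(lemmas) > 1:
                # The number was analysed as a sequence of shorter numbers.
                suspicious.setdefault(wordform, []).append(lemmas)
    return suspicious


# Excerpt of the output reported for "10" in this issue.
sample = '''"<10>"
     Use/Circ"1" Use/Circ"0" Num Sg Acc <W:0.0>
     Use/Circ"10" Num Sg Acc <W:0.0>
     Use/Circ"10" Num Sg Nom <W:0.0>
'''

print(split_readings(sample))  # {'10': [['1', '0']]}
```

An empty result for a given input would suggest that no number in it was split into separate digits or digit groups.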

snomos commented 3 years ago

Proof of the fix:

echo 10 | hfst-lookup -q src/analyser-gt-desc.hfstol 
10  10+Num+Sem/ID   0,000000
10  10+Num+Arab+Sg+Acc  0,000000
10  10+Num+Arab+Sg+Gen  0,000000
10  10+Num+Arab+Sg+Ill+Attr 0,000000
10  10+Num+Arab+Sg+Loc+Attr 0,000000
10  10+Num+Arab+Sg+Nom  0,000000