Closed by albbas 2 months ago
Date: 2020-02-21 16:31:08 +0100
From: Jack Rueter
Created attachment 229 png of tokeniser output for vro text with lemma containing U+0301
cd main/langs/vro
head config.log
$ ./configure --with-hfst --without-xfst --enable-tokenisers --enable-reversed-intersect --enable-spellers --enable-alignment --enable-apertium --enable-dicts --enable-morpher --with-giella-shared=/Users/rueter/main/giella-shared --with-giella-core=/Users/rueter/main/giella-core GIELLA_CORE=/Users/rueter/main/giella-core/dir GTCORE=/Users/rueter/./main/giella-core GIELLA_SHARED=/Users/rueter/main/giella-shared/dir
echo 'mitte' | hfst-tokenise --giella-cg -W $GTHOME/langs/vro/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | less
"
In lemma-final position, the combination of t and U+0301 is left outside the lemma quotes; see "mit"t́
In non-final position, the lemma material that follows the combination is quoted separately; see "mi"t́"mä"
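Both symptoms share one visible pattern: a non-space character directly follows a closing lemma quote. A minimal sketch for spotting such lines in tokeniser output follows. `find_broken_lemmas` is a hypothetical helper, not part of hfst or the Giella infrastructure, and the sample lines are modelled on the two strings quoted above, with tags added only for illustration.

```shell
# Hypothetical helper: print (with line numbers) every line in which a
# non-space character directly follows the closing quote of the first
# quoted string, as in "mit"t́ or "mi"t́"mä".
find_broken_lemmas() {
    grep -nE '^[[:space:]]*"[^"]*"[^[:space:]]'
}

# Illustrative input: a wordform line, two broken readings, one intact reading.
printf '%s\n' \
    '"<mitte>"' \
    '"mit"t́ N Pl Gen' \
    '"mi"t́"mä" V Act Ind Prt Sg3' \
    '"mitt́" N Pl Par' \
    | find_broken_lemmas
```

On real data the function could be placed at the end of the pipeline instead of `less`; a wordform line such as `"<mitte>"` and correctly quoted readings are not reported.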
Attached file: vro-tokeniser-problem-2020-02-22.png (image/png, 149077 bytes) Description: png of tokeniser output for vro text with lemma containing U+0301
This works today:
$ echo 'mitte' | hfst-tokenise --giella-cg -W tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
"<mitte>"
"mitt́" N Pl Gen
"mitt́" N Pl Ill
"mitt́" N Pl Par
"mit́mä" V Act Ind Prt Sg3
:\n
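Because U+0301 is a combining character, it is hard to tell by eye whether it sits inside or outside the lemma quotes. One way to check, sketched here under the assumption that the output is UTF-8, is to look at the bytes: U+0301 is encoded as `cc 81`, and in a correct lemma those bytes come before the closing quote (`22`). `lemma_hex` is a hypothetical helper written for this illustration.

```shell
# Hypothetical helper: hex dump of the quoted lemma "mitt" + U+0301,
# written with octal escapes so the combining accent is explicit.
lemma_hex() {
    printf '"mitt\314\201"' | od -An -tx1 | tr -d ' \n'
}

lemma_hex; echo
# prints 226d697474cc8122  (22 = quote, cc 81 = U+0301, then the closing 22)
```

The same `od -An -tx1` filter can be applied to a single reading line from the tokeniser to see where `cc 81` falls relative to the quotes.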
This issue was created automatically with bugzilla2github
Bugzilla Bug 2647
Date: 2020-02-21T16:31:08+01:00
From: Jack Rueter
To: Sjur Nørstebø Moshagen
CC: trond.trosterud
Last updated: 2020-02-21T16:31:08+01:00