Closed by albbas 3 years ago
Date: 2020-09-03 11:05:09 +0200
From: Trond Trosterud <
Synopsis: The problem is that an empty character (actually, every SPACE character) is analysed as MODIFIER LETTER APOSTROPHE.
Input is: Йомак ¶ Туш то ¶
Command for analysis is: ccat -l mhr ~/rusbound/converted/mhr/ficti/|hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
Output is:
"<Йомак>"
"йомак" N Attr
Date: 2020-09-03 11:08:42 +0200
From: Trond Trosterud <
Correction: It does not happen for spaces between words. Here I get them before and after :\n, i.e. at the end of the sentence:
echo "тиде книга." | hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
"<тиде>"
"тидаш" V ConNeg
"<книга>"
"книга" A
It happens only for mhr.
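A quick way to check tokeniser output for this symptom could look like the sketch below. It is only an illustration under assumptions: the function name `suspicious_cohorts` is hypothetical, the parsing relies on the cohort/reading layout seen in the excerpts in this thread (`"<wordform>"` lines followed by `"lemma" TAGS` lines), and it assumes the faulty analysis shows up as a reading whose lemma is U+02BC.

```python
MODIFIER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE


def suspicious_cohorts(cg_text):
    """Return (wordform, reading) pairs whose reading lemma is U+02BC.

    Assumes CG-style text: a cohort line such as "<wordform>" followed
    by one or more reading lines of the form "lemma" TAG TAG ...
    """
    hits = []
    wordform = None
    for line in cg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('"<') and stripped.endswith('>"'):
            # Cohort line: remember the surface form between "< and >"
            wordform = stripped[2:-2]
        elif stripped.startswith('"') and wordform is not None:
            # Reading line: the lemma sits between the first two quotes
            end = stripped.find('"', 1)
            lemma = stripped[1:end] if end > 0 else ""
            if lemma == MODIFIER_APOSTROPHE:
                hits.append((wordform, stripped))
    return hits
```

The output of the `hfst-tokenise -cg` command above can be saved to a file and passed to the function as a string. An empty result would suggest that no cohort received the apostrophe reading.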
Date: 2021-10-27 22:32:40 +0200
From: Sjur Nørstebø Moshagen <
The problem seems to have been fixed:
echo "тиде книга." | hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst
"<тиде>"
"тидаш" V ConNeg
"<книга>"
"книга" A
This issue was created automatically with bugzilla2github
Bugzilla Bug 2674
Date: 2020-09-03T11:05:09+02:00 From: Trond Trosterud <>
To: Sjur Nørstebø Moshagen <>
CC: borre.gaup, chiara.argese, jeremy.bradley, rueter.jack, trond.trosterud, unhammer+apertium
Last updated: 2021-10-27T22:32:40+02:00