giellalt / giella-core

Build tools and build support files as well as developer support tools for the GiellaLT repositories.
https://giellalt.uit.no
GNU General Public License v3.0

speller Levenshtein manipulations are ignored by hfst-ospell #45

Closed: Trondtr closed this issue 5 months ago.

Trondtr commented 5 months ago

To reproduce: compile the spellers in, e.g., lang-mns. As shown below, the intended correction иӈ for ин should have a distance of 2, as defined in editdist.default.txt. Unfortunately, mns.zhfst does not pick up this definition and instead returns 10 (the weight of one default Levenshtein operation).
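To make the expected numbers concrete, here is a minimal sketch of a weighted Levenshtein distance in which every insertion, deletion, or substitution costs 10 by default, while a custom table can make specific substitutions cheaper. The names `weighted_distance`, `DEFAULT_COST`, and `SUBSTITUTION_WEIGHTS` are hypothetical, and the default of 10 is an assumption read off the 10.000000 weights in the output below. hfst-ospell itself works with a weighted error-model transducer inside the .zhfst archive rather than this table-driven algorithm, and real suggestion weights may also include lexicon weights, which this sketch ignores.

```python
from typing import Dict, Tuple

# ASSUMPTION: one default edit costs 10, matching the 10.000000 weights in the output.
DEFAULT_COST = 10.0

# Hypothetical weight table mirroring the "н ӈ 2" line in editdist.default.txt.
SUBSTITUTION_WEIGHTS: Dict[Tuple[str, str], float] = {("н", "ӈ"): 2.0}


def weighted_distance(source: str, target: str,
                      weights: Dict[Tuple[str, str], float],
                      default: float = DEFAULT_COST) -> float:
    """Minimum cost of turning source into target via insert/delete/substitute."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + default  # delete source[i-1]
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + default  # insert target[j-1]
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            # Matching characters are free; listed pairs use their custom weight.
            sub = 0.0 if a == b else weights.get((a, b), default)
            dist[i][j] = min(
                dist[i - 1][j] + default,   # deletion
                dist[i][j - 1] + default,   # insertion
                dist[i - 1][j - 1] + sub,   # substitution or match
            )
    return dist[-1][-1]


if __name__ == "__main__":
    print(weighted_distance("ин", "иӈ", SUBSTITUTION_WEIGHTS))  # custom rule applied
    print(weighted_distance("ин", "иӈ", {}))                    # rule ignored
```

With the custom table, ин → иӈ costs 2.0; with an empty table it costs 10.0, the same as every other single-edit suggestion. That is the symptom reported here: all candidates tie at 10.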

uit-mac-443 lang-mns (main)$ e ин|hfst-ospell -S -n 10 tools/spellcheckers/mns.zhfst 
"ин" is NOT in the lexicon:
Corrections for "ин":
и    10.000000
и-    10.000000
ис    10.000000
иӈ    10.000000
щин    10.000000
шин    10.000000
итн    10.000000
ит    10.000000
исн    10.000000
и.    10.000000

uit-mac-443 lang-mns (main)$ grep ӈ tools/spellcheckers/editdist.default.txt 
ӈ
н   ӈ   2
uit-mac-443 lang-mns (main)$ 

The other files (strings.default.txt, final.default.txt, etc.) are also ignored by mns.zhfst.

The same happens in some other languages, but not all: sme and sms work fine, as does the Cyrillic-based mhr.

flammie commented 5 months ago

I think this is resolved in https://github.com/giellalt/lang-mns/commit/cb92ee6e372c4f6e48715ff861aa379b70d038af by removing extra whitespace.
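This thread does not show the exact file format or the parser that reads editdist.default.txt. The sketch below is only an illustration of how stray whitespace could make a rule disappear silently. It assumes, hypothetically, that a rule line is `from<TAB>to<TAB>weight` with exactly one tab between fields. Under that assumption, a strict tab-splitting parser rejects a line padded with extra spaces or tabs, while a lenient whitespace-splitting parser accepts it. Both functions, `parse_rule_strict` and `parse_rule_lenient`, are hypothetical.

```python
from typing import Optional, Tuple

Rule = Tuple[str, str, float]


def parse_rule_strict(line: str) -> Optional[Rule]:
    """Hypothetical strict parser: exactly one tab between three fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None  # malformed line is skipped, so its weight never takes effect
    source, target, weight = fields
    try:
        return source, target, float(weight)
    except ValueError:
        return None


def parse_rule_lenient(line: str) -> Optional[Rule]:
    """Hypothetical lenient parser: any run of spaces or tabs separates fields."""
    fields = line.split()
    if len(fields) != 3:
        return None
    source, target, weight = fields
    try:
        return source, target, float(weight)
    except ValueError:
        return None


if __name__ == "__main__":
    padded = "н   ӈ   2"   # spaces instead of single tabs, as in the grep output
    print(parse_rule_strict(padded))             # rejected
    print(parse_rule_strict("н\tӈ\t2"))          # accepted
    print(parse_rule_lenient(padded))            # accepted
```

Removing the extra whitespace, as in the commit above, makes the file acceptable to a strict reader. A lenient reader would avoid the silent failure altogether, at the cost of not being able to use whitespace characters as edit symbols.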