giellalt / lang-sms

Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Skolt Sami language
https://giellalt.uit.no
GNU Lesser General Public License v3.0

Speller is incomplete #7

Status: Closed (trondtynnol closed 2 months ago)

trondtynnol commented 5 months ago

The speller (at least for sms) seems to lack some lemmas after the reorganizing.

$ husmsNorm
mieʹrreed
mieʹrreed   mieʹrreed+V+Imprt+Pl2   0.000000
mieʹrreed   mieʹrreed+V+Inf 0.000000
$ hfst-ospell sms.zhfst 
mieʹrreed
"mieʹrreed" is NOT in the lexicon:

trondtynnol commented 5 months ago

Other words which worked previously include eeǥǥas, jieʹllem, jieʹllemvueʹjji, puärrsumus, mieʹrree, Ruõššjânnam, tuâkka, kuâđđjam, Njeäʹllem, šiõttee, šiõttuum, tueʹjjee, veiddsõs, pieʹllkueiʹm and vuässõõttâd.
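A word list like this can be checked in one pass rather than one word at a time. The following is a minimal sketch, not part of lang-sms: `rejected_words` is a hypothetical helper, and it relies only on the `hfst-ospell` message format visible in this thread.

```shell
# Hypothetical helper: read hfst-ospell output on stdin and print
# only the words it reports as missing from the lexicon.
rejected_words() {
  # Matches lines of the form: "word" is NOT in the lexicon:
  sed -n 's/^"\(.*\)" is NOT in the lexicon:.*$/\1/p'
}

# Usage sketch (assumes sms.zhfst is in the current directory and
# words.txt, a hypothetical file, holds one word per line):
#   hfst-ospell sms.zhfst < words.txt | rejected_words
```

An empty result would mean every word in the list was accepted.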

rueter commented 5 months ago

The words mentioned here have one thing in common: they all require that either the MODIFIER LETTER VERTICAL LINE (ˈ, U+02C8) be removed or that LATIN SMALL LETTER E WITH DOT BELOW (ẹ, U+1EB9) be converted to LATIN SMALL LETTER E. This must be related to Issue 23 (after the move).
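To make the two changes described above concrete, here is a string-level sketch of the mapping from the dotted and vertical-line spelling to the plain one. It only illustrates the effect on a single word; the speller itself does this with transducers, not with `sed`. The function name `to_standard` is hypothetical, and treating a combining dot below (U+0323) like the precomposed ẹ is an added assumption.

```shell
# Hypothetical illustration only: approximate the two character changes on one word.
to_standard() {
  # Build the UTF-8 byte sequences with octal escapes so the patterns
  # do not depend on how this file itself is encoded or normalised.
  dot_e=$(printf '\341\272\271')   # U+1EB9 LATIN SMALL LETTER E WITH DOT BELOW
  comb_dot=$(printf '\314\243')    # U+0323 COMBINING DOT BELOW (assumption: treat decomposed input the same)
  vline=$(printf '\313\210')       # U+02C8 MODIFIER LETTER VERTICAL LINE
  # LC_ALL=C makes sed match plain bytes; UTF-8 is self-synchronising,
  # so these sequences cannot match across character boundaries.
  printf '%s\n' "$1" | LC_ALL=C sed -e "s/$dot_e/e/g" -e "s/$comb_dot//g" -e "s/$vline//g"
}
```

For example, `to_standard 'kuẹʹtt'` drops the dot below but keeps the ʹ (U+02B9), which this sketch deliberately leaves untouched.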

rueter commented 5 months ago

> The speller (at least for sms) seems to lack some lemmas after the reorganizing.
>
> $ husmsNorm
> mieʹrreed
> mieʹrreed   mieʹrreed+V+Imprt+Pl2   0.000000
> mieʹrreed   mieʹrreed+V+Inf 0.000000
> $ hfst-ospell sms.zhfst
> mieʹrreed
> "mieʹrreed" is NOT in the lexicon:

Not a question of lemmas!!!
The filters remove-letter-dot-below.hfst and remove-modifier-letter-vertical-line.hfst in src/fst/filters have not been applied, so what we now have is a pedagogical speller. I would like one of these, too. Or rather, Tiina Sanila-Aikio has asked about this kind of spell checker for building teaching materials.

$ echo 'Ruõšˈšjânnam' | hfst-ospell tools/spellcheckers/sms.zhfst
"Ruõšˈšjânnam" is in the lexicon...

$ echo 'kueʹtt' | hfst-ospell tools/spellcheckers/sms.zhfst
"kueʹtt" is NOT in the lexicon:
$ echo 'kuẹʹtt' | hfst-ospell tools/spellcheckers/sms.zhfst
"kuẹʹtt" is in the lexicon...

trondtynnol commented 5 months ago

I see! Perhaps it would even be possible to get such a "pedagogical variant" published through Divvun Manager alongside the standard speller, @flammie @snomos ? It certainly has its use cases for some users, at least.

flammie commented 5 months ago

So a variant with pedagogical orthography can be achieved by copying generator-speller-gt-norm.tmp.hfst to generator-speller-gt-norm.hfst instead of applying the filters. That was the fallback that was used because the filter paths in Makefile.mods did not point to the right directory, so make did not find them.
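Since the fallback described above is a plain copy, one way to notice it is to compare the two files byte for byte. This is a rough sketch under that assumption; `filters_applied` is a hypothetical helper, and where the two files sit in the build tree is not stated in this thread, so the paths are passed as arguments.

```shell
# Hypothetical check: succeed only if the final transducer differs from the .tmp one.
# $1: generator-speller-gt-norm.tmp.hfst   $2: generator-speller-gt-norm.hfst
filters_applied() {
  if cmp -s "$1" "$2"; then
    # Byte-identical files are what a plain copy would produce.
    echo "identical: the filters were probably not applied"
    return 1
  fi
  # A difference only shows that some step changed the file,
  # not that both filters ran correctly.
  echo "files differ: some filtering step has run"
}
```

A differing file is necessary but not sufficient evidence, so a test word such as those shown earlier in the thread is still the more direct check.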

trondtynnol commented 2 months ago

It seems this problem has reappeared:

$ husms
vieʹǩǩ
vieʹǩǩ  veäʹǩǩ+N+Pl+Nom 0.000000
vieʹǩǩ  veäʹǩǩ+N+Sg+Acc 0.000000
vieʹǩǩ  veäʹǩǩ+N+Sg+Gen 0.000000

$ hfst-ospell sms.zhfst 
vieʹǩǩ
"vieʹǩǩ" is NOT in the lexicon:
viẹʹǩǩ
"viẹʹǩǩ" is in the lexicon...
./configure --enable-spellers