TinoDidriksen / spellers

Front-ends and packaging scripts for spellers. Git read-only mirror.
GNU General Public License v3.0

Hyphens should always be part of the word #9

Closed: snomos closed this issue 8 years ago

snomos commented 8 years ago

At least in the Sámi languages. This holds for all positions (initial, medial, final): -davvi, el-rávdnje, dutki-

Screenshots demonstrating how it is now, and how this leads to irrelevant suggestions:

[Three screenshots, taken 2015-11-19 at 17:03:12, 17:04:09 and 17:04:50]

In all these cases the whole string, hyphen included (e.g. «el-rávdnje»), should be the input, and suggestions should be generated from that.

Whether a hyphen is part of a word or not is definitely language dependent. E.g. in English it is probably best to treat the hyphen as a non-word character (i.e. a word separator), whereas for Sámi, Norwegian and most other Nordic languages it should be a word character. The only exception is a hyphen used alone, i.e. with whitespace on each side.
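To make the rule concrete, here is a minimal Python sketch of a tokenizer that treats the hyphen as a word character for some languages and as a separator for others. It is not the code used in this repository: the `HYPHEN_IS_WORD_CHAR` table, the language codes in it, and the `tokenize` function are all hypothetical names chosen for illustration.

```python
import re
import unicodedata

# Hypothetical per-language switch (an assumption, not the repository's
# actual configuration): does a hyphen count as a word character?
HYPHEN_IS_WORD_CHAR = {
    "en": False,  # English: hyphen acts as a word separator
    "se": True,   # Northern Sámi: hyphen belongs to the word
    "nb": True,   # Norwegian Bokmål: hyphen belongs to the word
}

def tokenize(text, lang):
    """Split text into the strings that would be handed to the speller."""
    # Compose characters such as 'á' so that \w matches them as one letter.
    text = unicodedata.normalize("NFC", text)
    if HYPHEN_IS_WORD_CHAR.get(lang, False):
        # Word characters and hyphens together form one token, so
        # initial, medial and final hyphens all stay attached.
        tokens = re.findall(r"[\w-]+", text)
        # A token consisting only of hyphens is a hyphen used alone
        # (e.g. with whitespace on each side), so it is not a word.
        return [t for t in tokens if t.strip("-")]
    # Otherwise the hyphen separates words like any other non-word character.
    return re.findall(r"\w+", text)
```

With this sketch, `tokenize("-davvi el-rávdnje dutki-", "se")` keeps the three strings intact, hyphens included, while the same input with `"en"` splits «el-rávdnje» into two words. A real speller front-end would more likely express this as a list of usable characters per language than as a regular expression.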

Our Sámi FSTs are built with this in mind, and should be able to handle all correct uses of hyphens.

TinoDidriksen commented 8 years ago

Fixed in r85 (commit 931c5da5ab0a67558b597eb5510c8ad48fd614c3) by adding these to the usable character list: