TinoDidriksen / spellers

Front-ends and packaging scripts for spellers. Git read-only mirror.
GNU General Public License v3.0

EN DASH (U+2013) is not ignored by speller #1

Closed snomos closed 8 years ago

snomos commented 8 years ago

The following text will trigger a red underline in MS Word using the SME speller (version: Divvun-sme-2015.292.177.msi, 2015-10-19, 02:57):

– Fertejit čielga njuolggadusat

The words are accepted, but not the initial EN DASH.

snomos commented 8 years ago

Note that the set of characters considered part of legal words varies a bit from language to language. For example, the colon ":" is not part of words in English, Danish and Norwegian (and presumably Greenlandic), whereas it may or may not be part of a legal word in Swedish, Finnish and the Sámi languages, where it is used as a separator between a stem and inflectional endings for acronyms, digits etc.:

CD:s (from SME), TV:n (swe)

For these languages, the colon is not part of the word if it is the last character of the word; in that case it may indicate that direct speech follows, just as in e.g. Danish.
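To make the rule concrete, here is a minimal sketch in Python. It is not taken from any speller code. The function name, the language set and the codes "fin" and "dan" are assumptions made for illustration; only "sme" and "swe" and the two example words come from the comment above.

```python
# Languages where a colon may separate a stem from an inflectional ending.
# Assumption: this set is illustrative only, not an authoritative list.
COLON_INTERNAL_LANGS = {"sme", "swe", "fin"}


def colon_is_part_of_word(token: str, pos: int, lang: str) -> bool:
    """Return True if the colon at index `pos` of `token` belongs to the word."""
    if token[pos] != ":":
        raise ValueError("pos must point at a colon")
    if lang not in COLON_INTERNAL_LANGS:
        # E.g. Danish: a colon is never part of a word.
        return False
    # A colon that ends the token is punctuation (e.g. before direct speech).
    return pos != len(token) - 1


print(colon_is_part_of_word("CD:s", 2, "sme"))  # word-internal colon
print(colon_is_part_of_word("CD:", 2, "sme"))   # colon ends the token
```

The first call treats the colon as part of the word, the second does not, since there the colon ends the token.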

TinoDidriksen commented 8 years ago

Fixed in the latest versions: http://apertium.projectjj.com/spellers/

Btw, the 2015.292.177 part is an absolute timestamp, with minute precision.

The concern about which characters are legal in which language is already part of the algorithm. The verbatim input is always tested first, before any manipulation to find a valid form is attempted.
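The "verbatim first, then manipulate" order can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual speller code. The names `strip_edges` and `accepts`, the set-based lexicon, and the choice of stripping leading and trailing non-alphanumerics as the only manipulation are all assumptions.

```python
def strip_edges(token: str) -> str:
    """Drop leading and trailing non-alphanumeric characters (assumed manipulation)."""
    start, end = 0, len(token)
    while start < end and not token[start].isalnum():
        start += 1
    while end > start and not token[end - 1].isalnum():
        end -= 1
    return token[start:end]


def accepts(token: str, lexicon: set) -> bool:
    """Test the verbatim token first; only then try a manipulated form."""
    if token in lexicon:
        return True  # verbatim hit, e.g. a word with an internal colon
    core = strip_edges(token)
    if not core:
        # Assumption: tokens made only of punctuation are not flagged.
        return True
    return core in lexicon


# Toy lexicon, for illustration only.
lexicon = {"Fertejit", "CD:s"}
print(accepts("CD:s", lexicon))       # accepted verbatim
print(accepts("–Fertejit", lexicon))  # accepted after stripping the EN DASH
```

Because the verbatim form is checked before anything is stripped, a word such as CD:s is matched with its colon intact, and punctuation is only removed when the unmodified token fails.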

Whether MS Word cares about it is another matter. I have no control whatsoever over what MS Word decides to send to the speller as a token, nor can I inspect the context of a given token. I get what I get, and I better be happy with it.

snomos commented 8 years ago

It seems that MS Word is still confused; at least, the latest nightly build still gives red underlines under these characters. Tested with MS Office 2010, 2013 and 2016 on Windows 7, 8 and 10.

TinoDidriksen commented 8 years ago

There was an issue with trailing non-alphanumerics, fixed in latest builds.

My test text for sme:

–Finnmárkku– (báhppa) () [vákten láhkai] vákten ládjii vákten ládje Finnmárkkubáhppa –artistta guovttis– artisttaguovttis Innst. O. Nr CD:s

yielding:

[Screenshot: speller-sme]