Closed by snomos 8 years ago.
Note that the set of characters considered part of legal words varies a bit from language to language. For example, the colon ":" is not part of words in English, Danish and Norwegian (and presumably Greenlandic). In Swedish, Finnish and the Sámi languages, however, it may or may not be part of a legal word, because there it separates a stem from the inflectional endings of acronyms, digits, etc., as in:
CD:s (sme), TV:n (swe)
In these languages, the colon is not part of the word when it is the last character; in that case it may signal that direct speech follows, just as in Danish.
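A minimal sketch of the colon rule described above, simplified to "a colon belongs to the word only when letters follow it". `COLON_SUFFIX_LANGS` and `word_part` are hypothetical names, and the list of Sámi language codes is an assumption; the actual speller's tokenization may well differ.

```python
import re

# Hypothetical: languages where ":" may join a stem and an inflectional
# ending ("CD:s", "TV:n"). The Sámi codes listed here are an assumption.
COLON_SUFFIX_LANGS = {"swe", "fin", "sme", "smj", "sma", "smn", "sms"}

# [^\W_] matches a Unicode letter or digit; [^\W\d_] matches a letter only.
# The optional group takes ":" only when at least one letter follows it,
# so a word-final colon (e.g. before direct speech) is left out.
_WITH_COLON_SUFFIX = re.compile(r"[^\W_]+(?::[^\W\d_]+)?")
_PLAIN_WORD = re.compile(r"[^\W_]+")


def word_part(token: str, lang: str) -> str:
    """Return the leading part of `token` that counts as the word in `lang`."""
    pattern = _WITH_COLON_SUFFIX if lang in COLON_SUFFIX_LANGS else _PLAIN_WORD
    match = pattern.match(token)
    return match.group(0) if match else ""
```

With this sketch, `word_part("CD:s", "sme")` keeps the whole token, `word_part("CD:s", "eng")` stops at "CD", and a word-final colon as in `word_part("dadjá:", "sme")` is dropped.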
Fixed in the latest versions (http://apertium.projectjj.com/spellers/). By the way, the 2015.292.177 part of the version number is an absolute timestamp with minute precision.
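The three numeric fields appear to be year, day of year and minute of day. Day 292 of 2015 is 19 October, and minute 177 is 02:57, which matches the installer Divvun-sme-2015.292.177.msi dated 2015-10-19, 02:57 in this thread. Assuming that layout (and ignoring time zones, which the thread does not mention), a stamp could be decoded like this; `decode_version` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def decode_version(version: str) -> datetime:
    """Decode a 'YEAR.DAY.MINUTE' stamp, assuming DAY is the day of year
    (1-based) and MINUTE counts minutes since midnight."""
    year, day_of_year, minute_of_day = (int(part) for part in version.split("."))
    # Start at 1 January and add the offsets; day 1 is 1 January itself.
    return datetime(year, 1, 1) + timedelta(days=day_of_year - 1,
                                            minutes=minute_of_day)


print(decode_version("2015.292.177"))  # → 2015-10-19 02:57:00
```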
Which characters are legal where is already handled by the algorithm: the verbatim input is always tested first, before any manipulation to find a valid form is attempted.
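A sketch of verbatim-first lookup, assuming a plain set of accepted word forms as the lexicon and punctuation stripping as the only manipulation (the real speller presumably tries more than that). `candidates` and `is_correct` are hypothetical names. Trying the verbatim token first is what lets an abbreviation such as "Innst." keep its full stop, while "(báhppa)" can still fall back to "báhppa".

```python
import unicodedata
from typing import Iterator, Set


def is_punct(ch: str) -> bool:
    # Unicode general categories P* cover dashes, brackets, quotes,
    # colons, full stops and similar characters.
    return unicodedata.category(ch).startswith("P")


def strip_punct(s: str, leading: bool = True) -> str:
    """Strip trailing punctuation, and leading punctuation if `leading`."""
    start, end = 0, len(s)
    if leading:
        while start < end and is_punct(s[start]):
            start += 1
    while end > start and is_punct(s[end - 1]):
        end -= 1
    return s[start:end]


def candidates(token: str) -> Iterator[str]:
    """Yield forms to look up, always starting with the verbatim token."""
    seen = set()
    for form in (token, strip_punct(token, leading=False), strip_punct(token)):
        if form and form not in seen:
            seen.add(form)
            yield form


def is_correct(token: str, lexicon: Set[str]) -> bool:
    # Accept the token as soon as any candidate form is in the lexicon.
    return any(form in lexicon for form in candidates(token))
```

For example, `list(candidates("(báhppa)"))` gives `["(báhppa)", "(báhppa", "báhppa"]`, in that order.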
Whether MS Word cares about that is another matter. I have no control whatsoever over what MS Word decides to send to the speller as a token, nor can I inspect the context of a given token. I get what I get, and I'd better be happy with it.
MS Word still seems confused: the latest nightly build, at least, still puts red underlines under these characters (MS Office 2010, 2013 and 2016; Windows 7, 8 and 10).
There was an issue with trailing non-alphanumeric characters; it is fixed in the latest builds.
My test text for sme: –Finnmárkku– (báhppa) () [vákten láhkai] vákten ládjii vákten ládje Finnmárkkubáhppa –artistta guovttis– artisttaguovttis Innst. O. Nr CD:s
yielding:
The following text will trigger a red underline in MS Word using the SME speller (version: Divvun-sme-2015.292.177.msi, 2015-10-19, 02:57):
– Fertejit čielga njuolggadusat
The words are accepted, but not the initial EN DASH.
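If the host application sends a lone dash as a token of its own, one option is for the speller to accept any token made up entirely of punctuation. Below is a minimal sketch of that idea using Unicode general categories (EN DASH U+2013 is category "Pd"). `is_punctuation_only` and `should_flag` are hypothetical helpers, not the Divvun speller's API, and a small set stands in for the real lexicon.

```python
import unicodedata
from typing import Set


def is_punctuation_only(token: str) -> bool:
    """True if the token is non-empty and every character is Unicode
    punctuation (general category P*, e.g. EN DASH U+2013 is "Pd")."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def should_flag(token: str, lexicon: Set[str]) -> bool:
    # Punctuation-only tokens are not words, so they are never underlined.
    if is_punctuation_only(token):
        return False
    return token not in lexicon


lexicon = {"Fertejit", "čielga", "njuolggadusat"}
tokens = "– Fertejit čielga njuolggadusat".split()
flagged = [t for t in tokens if should_flag(t, lexicon)]
print(flagged)  # → []
```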