atom / node-spellchecker

SpellChecker Node Module
http://atom.github.io/node-spellchecker
MIT License

Problems with multiple languages on Windows 10 #109

Closed dmoonfire closed 5 years ago

dmoonfire commented 5 years ago

My work machine has a fresh install of Windows 10, and I tried something that I understood should work. It didn't, so I figured I'd ask. I can't get multiple-language spell-checking working with the built-in Windows 10 checker (using the environment variable to switch to Hunspell seems to work fine; only the built-in checker has the problem).

My machine is a 64-bit Windows 10 with English as the default language. My understanding is that if I add the German (de-DE) language, the checker will handle spell-checking in that language. However, that does not seem to be true.

const s = require('spellchecker');

const en = new s.Spellchecker();
en.setDictionary('en-US', '');

console.log('cheese', en.isMisspelled('cheese')); // false
console.log('ljkf', en.isMisspelled('ljkf')); // true

const de = new s.Spellchecker();
de.setDictionary('de-DE', '');

console.log('cheese', de.isMisspelled('cheese')); // true
console.log('Danke', de.isMisspelled('Danke')); // should be false but it is true

On my home Windows 10 machine, s.getAvailableDictionaries() returns []. On my work Windows 10, it returns 4 languages, all en-*. I'm not sure why I'm getting a different response there either.
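One way to make this kind of discrepancy visible before running any checks is to compare the locales you intend to use against what getAvailableDictionaries() reports. The following is only a sketch: missingLocales is a hypothetical helper, not part of node-spellchecker, and it takes the list of dictionaries as a plain array so it can be tried without the native module. Whether an empty list means "none installed" or "could not tell" is not settled in this thread.

```javascript
// Hypothetical helper (not part of node-spellchecker): report which of the
// wanted locales are absent from the list the spell checker says it has.
function missingLocales(wanted, available) {
  // Compare case-insensitively and treat '_' and '-' alike ("en_US" vs "en-US").
  const normalize = (tag) => tag.replace(/_/g, '-').toLowerCase();
  const have = new Set(available.map(normalize));
  return wanted.filter((tag) => !have.has(normalize(tag)));
}

// With the real module the second argument would come from
// require('spellchecker').getAvailableDictionaries(); here it is the list
// reported for the work machine in this thread.
console.log(missingLocales(['en-US', 'de-DE'], ['en-CA', 'en-LR', 'en-PH', 'en-US']));
```

For the work machine's list this reports de-DE as missing, which matches the symptom described above.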

While investigating it, I found the following that indicates that the Spell Checking API doesn't allow additional languages as I thought.

Am I doing something wrong? I think I'm getting stuck.

Thank you.

rafeca commented 5 years ago

Hi @dmoonfire! Thanks for reporting this!

I tried reproducing it on a newly installed Windows 10 machine using the code that you provided, and things work well once I install the German language from the Windows 10 control panel.

Did you add the German language to your Windows machine? The steps to do so are specified in the spell-checker package.

dmoonfire commented 5 years ago

@rafeca: I followed those instructions, and I was pretty sure they worked previously; it is only my current setup that has me confused. Do you think there is a chance this is specific to a Windows 10 Enterprise setup? From the screenshot below, you can see that I have German installed and PowerShell says that it is providing spell-checking, but the resulting JS still says the test is incorrect (I refactored a little to make it easier to see).

image

dmoonfire commented 5 years ago

Oh, the PowerShell command is Get-WinUserLanguageList.

dmoonfire commented 5 years ago

Hrm, I noticed that the screenshot has Danke being correctly identified as not misspelled. But cheese is now incorrectly marked as not misspelled. I have no clue why it works now when it didn't previously. I've uninstalled and installed languages a few times, maybe it started mostly working.

However, it says cheese is not misspelled in German, even though cheese isn't a valid German word.

FYI, the entire reason I'm doing this exercise is to see if I need to specify languages when using the built-in checker for Windows. It appears that the Mac performs better if you don't include a language specifier (that is the bug I'm trying to fix for atom/spell-check). However, I might need a better Windows 10 test machine to verify that if my current one won't work.

Trying a modified script:

const s = require('spellchecker');

const en = new s.Spellchecker();
en.setDictionary('en-US', '');

const de = new s.Spellchecker();
de.setDictionary('de-DE', '');

const es = new s.Spellchecker();
es.setDictionary('es-ES', '');

console.log("dictionaries", s.getAvailableDictionaries());

for (const word of ["cheese", "ljkf", "Danke", "pollo"]) {
    console.log(
        "s ",
        word,
        s.isMisspelled(word),
        s.getCorrectionsForMisspelling(word).join(","));
    console.log(
        "en",
        word,
        en.isMisspelled(word),
        en.getCorrectionsForMisspelling(word).join(","));
    console.log(
        "de",
        word,
        de.isMisspelled(word),
        de.getCorrectionsForMisspelling(word).join(","));
    console.log(
        "es",
        word,
        es.isMisspelled(word),
        es.getCorrectionsForMisspelling(word).join(","));
}

I get this on my machine:

dictionaries [ 'en-CA', 'en-LR', 'en-PH', 'en-US' ]
s  cheese false cheesy,chase,chasee,chaise,cheese's,chose
en cheese false cheesy,chase,chasee,chaise,cheese's,chose
de cheese false
es cheese false
s  ljkf true like,lake,lakh,laky
en ljkf true like,lake,lakh,laky
de ljkf false
es ljkf false
s  Danke true Dance,Danker,Danka,Dane,Dank,Danced,Dancer,Dances,Dante
en Danke true Dance,Danker,Danka,Dane,Dank,Danced,Dancer,Dances,Dante
de Danke false
es Danke false
s  pollo true polo,poll,polio,polls,pole
en pollo true polo,poll,polio,polls,pole
de pollo false
es pollo false

rafeca commented 5 years ago

> Hrm, I noticed that the screenshot has Danke being correctly identified as not misspelled. But cheese is now incorrectly marked as not misspelled. I have no clue why it works now when it didn't previously. I've uninstalled and installed languages a few times, maybe it started mostly working.

Ok, I'm actually seeing something similar to what you describe when I specify an invalid locale:

const invalid = new s.Spellchecker();
invalid.setDictionary('he-HE', ''); // some locale that I don't have installed

console.log('cheese', invalid.isMisspelled('cheese')); // false
console.log('Danke', invalid.isMisspelled('Danke')); // false

So there are two issues here:

  1. Somehow, even if you have the German language installed (and Get-WinUserLanguageList returns it), node-spellchecker is not able to find it.
  2. When node-spellchecker is not able to find a language on Windows, instead of failing it silently keeps "working" but returning every word as valid.
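Until the second issue is addressed in the module itself, a caller could detect the "every word is valid" state with a probe word. This is a rough sketch under stated assumptions: assertDictionaryWorks and the default gibberish string are hypothetical, and the heuristic relies on the behaviour seen in this thread, where a checker with no usable dictionary reports even ljkf as correctly spelled.

```javascript
// Hypothetical guard (not part of node-spellchecker). `checker` is any object
// with setDictionary(lang, path) and isMisspelled(word), such as the
// Spellchecker instances created in the snippets above.
function assertDictionaryWorks(checker, lang, gibberish = 'ljkfqzxv') {
  checker.setDictionary(lang, '');
  // A checker that accepts an obvious non-word is probably not checking
  // anything, which matches the silent "everything is valid" behaviour.
  if (!checker.isMisspelled(gibberish)) {
    throw new Error(`Dictionary "${lang}" does not appear to be loaded`);
  }
  return checker;
}
```

The probe is a heuristic rather than a guarantee: it only shows that the checker rejects something, not that the requested language is the one doing the rejecting.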

Regarding 1. as you mention it may be related to Windows Enterprise... I'm not familiar enough with Windows but maybe Windows does not give access to additional languages for non-admin users?

I'll try to create a non-admin user and check if I can reproduce

rafeca commented 5 years ago

Ok! I think I've been able to reproduce it 🎉

It seems that when run from a non-admin account on Windows, node-spellchecker uses the languages installed by the admin account... I'm not sure what the root cause is, but it definitely should be fixed if possible...

rafeca commented 5 years ago

I've investigated it further and I'm afraid that this is something related to the Windows Spell Checking API 😞

Somehow, after adding a language (via the "Region and Language" settings panel), the language spelling features are not exposed to the SDK automatically if the user who added the language is not an administrator 🤷‍♂️

In order to enable the spell check features, the user needs to install the "Basic Typing" language feature after the language has been added:

Screen Shot 2019-03-12 at 12 34 32 Screen Shot 2019-03-12 at 12 34 41

This, unfortunately, needs administration permissions...

A couple of things we can do here:

  1. Try to use Hunspell checker if the selected language is not supported. There's already some logic to default to Hunspell for old Windows, maybe we could have something similar where we check if a language is supported...
  2. Update the Readme on https://github.com/atom/spell-check to add the information regarding non-Administrator users.

What do you think?

dmoonfire commented 5 years ago

Switching Providers

I worry about automatically switching over to Hunspell based on availability, mainly because it would create inconsistent spelling based on a factor we couldn't easily unit test (spell-check has exceptions in the unit test just to handle Mac's different suggestions). One thing I was thinking of was adding a parameter to the class constructor that effectively duplicates the PREFER_HUNSPELL environment variable so we could pass it in. Maybe something like: (0 - Choose, 1 - Use System, 2 - Use Hunspell). That way, the calling program could give the user a chance to always use Hunspell if they know what they are doing.

(On Linux, this would just use Hunspell for everything.)
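To make the proposal concrete, here is a minimal sketch of how such a three-way setting could be resolved. Nothing here exists in node-spellchecker: SpellerMode and resolveBackend are hypothetical names, PREFER_HUNSPELL is the shorthand used in this thread (the module's actual variable name may differ), and the environment and platform are passed in as arguments so the decision can be unit tested.

```javascript
// Hypothetical constants mirroring the proposal:
// 0 - Choose, 1 - Use System, 2 - Use Hunspell.
const SpellerMode = Object.freeze({ CHOOSE: 0, USE_SYSTEM: 1, USE_HUNSPELL: 2 });

// Decide which backend to use. `env` would typically be process.env and
// `platform` process.platform; both are parameters to keep this testable.
function resolveBackend(mode, env, platform) {
  // On Linux there is no system checker, so everything goes to Hunspell.
  if (platform === 'linux') return 'hunspell';
  switch (mode) {
    case SpellerMode.USE_SYSTEM:
      return 'system';
    case SpellerMode.USE_HUNSPELL:
      return 'hunspell';
    case SpellerMode.CHOOSE:
      // Assumed "old" behaviour: defer to the environment variable.
      return env.PREFER_HUNSPELL ? 'hunspell' : 'system';
    default:
      throw new RangeError(`Unknown speller mode: ${mode}`);
  }
}
```

A caller such as spell-check could then map a user preference to USE_HUNSPELL without touching the process environment.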

Related to that, my fix for https://github.com/atom/atom/issues/15912 was to separate the idea of "use system checker" (the Choose or Use System from above) from the locale-specific checkers (Use Hunspell above). That appears to have significantly reduced the number of errors when switching constantly between multiple languages in OS X. If I had that, then I could say "use the Windows 10 checker" by checking "use system checker" and providing the IETF tags or, if that doesn't work, uncheck system checker and make sure I have a dictionary in the search path.

Being able to identify if a language loads (maybe by a flag) would be nice though for feedback.

Updating Read Me

So, I normally run without admin and only switch to admin when I have to (company policy). When I ran my above script in an elevated terminal, it didn't do any checking at all with the locale-specific instances, but the static instance (my s) checked English only. So I suspect that what does and doesn't work is a big mess, and being able to switch to Hunspell explicitly would be nice.

Basic Typing

EDIT: My environment won't let me install basic typing. My IT staff hates me. 🗡

EDIT 2: The Choose option would have done the old code of checking the environment variable.

rafeca commented 5 years ago

> One thing I was thinking of was adding a parameter to the class constructor that effectively duplicates the PREFER_HUNSPELL environment variable so we could pass it in. Maybe something like: (0 - Choose, 1 - Use System, 2 - Use Hunspell). That way, the calling program could give the user a chance to always use Hunspell if they know what they are doing.

I like this idea! Is this something you would be willing to implement?

> Being able to identify if a language loads (maybe by a flag) would be nice though for feedback.

Right now the setDictionary() method returns false if it could not load the language. If your proposed solution gets implemented, it could instead return the speller that was loaded.
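As a sketch of the feedback this return value could already provide, the helper below collects which locales reported success. It is hypothetical (loadDictionaries is not part of node-spellchecker), it assumes setDictionary() returns a truthy value on success as described above, and, given the invalid-locale behaviour seen earlier in this thread, that return value may not be reliable for the built-in Windows checker.

```javascript
// Hypothetical helper: try each locale on its own checker instance and record
// whether setDictionary() reported success. `createChecker` is a factory,
// for example () => new (require('spellchecker').Spellchecker)().
function loadDictionaries(createChecker, langs) {
  const loaded = new Map(); // locale -> checker that accepted it
  const failed = [];        // locales for which setDictionary() returned falsy
  for (const lang of langs) {
    const checker = createChecker();
    if (checker.setDictionary(lang, '')) {
      loaded.set(lang, checker);
    } else {
      failed.push(lang);
    }
  }
  return { loaded, failed };
}
```

The failed list is what a caller could surface to the user, for example as a notice that a language has no spell-checking available.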

> So, I normally run without admin and only switch to admin when I have to (company policy). When I ran my above script in an elevated terminal, it didn't do any checking at all with the locale-specific instances, but the static instance (my s) checked English only. So I suspect that what does and doesn't work is a big mess, and being able to switch to Hunspell explicitly would be nice.

AFAICT running the script as an Administrator won't help at all: the only way to be able to use the Windows Spellchecker is to install the basic typing for that language...

I'm going to close this issue for now. There are two follow-ups:

  1. Implement your suggestion around using a trinary for selecting the preferred speller. I'll let you handle this one.
  2. Update the Readme on https://github.com/atom/spell-check to add the information regarding non-Administrator users. I can take care of this.

dmoonfire commented 5 years ago

I'll do the implementation and push up a PR. Thank you.