Closed ventolinmx closed 4 years ago
More likely than not, it's a problem with your dictionary. Atom doesn't deal well with dictionaries that are not UTF-8 encoded.
Probably related to #212?
@lierdakil : How to obtain UTF-8 encoded dictionaries?
@dvictori: I think #212 is definitely going to cause you problems even with a UTF-8 dictionary. There is a defect filed against node-spellchecker that aims to fix that. Until that is resolved, I don't know if we can do much more.
@dvictori Just find some? e.g. https://github.com/wooorm/dictionaries
@dmoonfire, I don't have any issues described in #212. Gentoo Linux, Atom 1.28.0, LANG=ru_RU.UTF-8
I probably would if my de-DE dictionary was, say, cp1252-encoded.
@lierdakil: I stand corrected. Does it show it spelled correctly if you have Löwen?
Yes, it does:
Additionally, I've tried converting my dictionary from UTF-8 to ISO8859-1 (as is common with extended Latin hunspell dictionaries), and here's what I've got:
It looks suspiciously similar to #212, I believe.
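A minimal sketch of why an encoding mismatch could produce this kind of result, assuming the checker passes UTF-8 bytes while the dictionary stores ISO-8859-1 bytes. The internals of node-spellchecker are not shown in this thread, so this only illustrates the byte-level mismatch. The word Löwen comes from the comment above; the variable names are illustrative.

```python
word = "Löwen"

utf8_bytes = word.encode("utf-8")         # what a UTF-8 editor would hand over
latin1_bytes = word.encode("iso-8859-1")  # what an ISO-8859-1 dictionary stores

# The byte sequences differ, so a byte-wise lookup cannot match.
print(utf8_bytes == latin1_bytes)

# Reading the UTF-8 bytes as ISO-8859-1 yields mojibake instead of the word.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Pure-ASCII words encode identically in both, so only accented words break.
ascii_matches = "lion".encode("utf-8") == "lion".encode("iso-8859-1")
print(ascii_matches)
```

This matches the reported symptom that only words with accents are flagged.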
Oh, I know why yours is behaving. I found that the .UTF-8 suffix fixes the problem. However, most people don't have that in their language settings, so it isn't picked up correctly. So my LANG=en_US couldn't handle a UTF-8 dictionary either, because node-spellchecker didn't switch the locale() to UTF-8.
I suspect if you just had LANG=ru_RU it may misbehave.
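To check which case a given LANG value falls into, here is a minimal sketch that extracts the codeset from a POSIX-style locale string. The function names are hypothetical, and whether the codeset actually affects Atom's behaviour is only this comment's hypothesis, not something the sketch can confirm.

```python
def locale_codeset(lang_value):
    """Return the codeset part of a POSIX locale string, or None if absent."""
    # POSIX locale names look like language[_territory][.codeset][@modifier].
    without_modifier = lang_value.split("@", 1)[0]
    if "." not in without_modifier:
        return None
    return without_modifier.split(".", 1)[1]


def declares_utf8(lang_value):
    """True when the locale string explicitly names a UTF-8 codeset."""
    codeset = locale_codeset(lang_value)
    if codeset is None:
        return False
    # Accept both spellings, "UTF-8" and "utf8".
    return codeset.replace("-", "").lower() == "utf8"


for value in ("ru_RU.UTF-8", "en_US", "en_US.ISO-8859-1"):
    print(value, declares_utf8(value))
```

In a shell session the input would typically come from os.environ.get("LANG", "").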
If I just had LANG=ru_RU, IIRC my default system encoding would be KOI8-R, which is a chthonic abomination from the dawn of the computer era that must be killed with fire :) So thanks but no thanks, I quite like my UTF-8 terminals that can handle more than two languages.
I was under the impression that modern Linux distributions prefer UTF-8 locales. Pretty sure at least Arch and Gentoo do.
@lierdakil Bingo! I used the dictionaries from wooorm and now Atom spell check is working. I just hope it won't break any other program. So far, LibreOffice and Firefox spell check look fine.
It would be nice though, for users less technically inclined, to be able to use their native dictionary, that comes with the operating system, without having to change the file.
@dmoonfire, FWIW, running Atom with env LANG=en_US atom doesn't seem to change the behaviour any. That is, UTF-8 dictionaries are still working.
EDIT: LANG=en_US.ISO-8859-1 doesn't seem to have any effect either.
So I installed wooorm's Spanish UTF-8 dictionary with npm install dictionary-es and it behaves the same. Do I need to configure this in Atom somewhere to activate the UTF-8 dictionary? I have a special locale mix in Debian, using en_US for LANG, but changing this to Spanish gives the same problem.
@ventolinmono, you can point Atom to the directory where you installed the dictionary. Check spell-check settings.
I just copied the files from the wooorm repository to /usr/share/hunspell and renamed them to the correct locale. So dictionaries/pt-BR/index.dic from wooorm became /usr/share/hunspell/pt_BR.dic. A very ugly hack, I might say.
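The copy-and-rename step above could be sketched as follows. This is a minimal sketch that assumes each wooorm dictionary directory holds an index.dic and a matching index.aff (only index.dic is mentioned in the comment) and that the locale name is the language tag with the hyphen replaced by an underscore. The function name install_dictionary is hypothetical.

```python
import shutil
from pathlib import Path


def install_dictionary(source_root, language_tag, target_dir):
    """Copy <source_root>/<tag>/index.{dic,aff} to <target_dir>/<locale>.{dic,aff}."""
    source_root = Path(source_root)
    target_dir = Path(target_dir)
    # Assumption: hunspell file names use an underscore, e.g. pt-BR -> pt_BR.
    locale_name = language_tag.replace("-", "_")
    copied = []
    for extension in (".dic", ".aff"):
        source = source_root / language_tag / ("index" + extension)
        target = target_dir / (locale_name + extension)
        shutil.copyfile(source, target)
        copied.append(target)
    return copied


# Example call (would need write access to the system directory):
# install_dictionary("dictionaries", "pt-BR", "/usr/share/hunspell")
```

Copying into a separate directory and pointing the spell-check settings at it would avoid overwriting the system dictionaries that other programs use.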
I never knew about wooorm's dictionaries. They have an MIT license, so that is reasonable. If UTF-8 is the only thing needed, I'll try creating a couple of Atom packages to install specific language dictionaries and see if that behaves; the plugin system for spell-check is designed for that.
@dmoonfire They do not have an MIT license. Every dictionary comes with a different license!
Here's a problem that I have with this. $LANG = pt_BR.UTF-8, Ubuntu 16.04.
I just copied the files from wooorm repository to /usr/share/hunspell and renamed to the correct locale. So dictionaries/pt-BR/index.dic from wooorm became /usr/share/hunspell/pt_BR.dic. A very ugly hack, I might say.
@edusantana this worked for me!
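On Arch Linux, I solved it by running iconv -f ISO-8859-1 -t UTF-8 /usr/share/hunspell/YOURDIC.dic > /tmp/YOURDIC.dic and then moving /tmp/YOURDIC.dic over the original. The output goes to a temporary file first because redirecting straight onto the input file would truncate it before iconv reads it. It's simply an issue of encoding.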
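The same conversion can be sketched in Python. This is a minimal sketch, assuming the dictionary really is ISO-8859-1 and that an in-place replacement is wanted. The function name convert_file_encoding is hypothetical, and the new content is written next to the original and swapped in only once it is complete.

```python
import os
import shutil
import tempfile


def convert_file_encoding(path, source_encoding="iso-8859-1", target_encoding="utf-8"):
    """Re-encode a text file in place, going through a temporary file."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=source_encoding, newline="") as handle:
        text = handle.read()

    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=target_encoding, newline="") as handle:
            handle.write(text)
        # mkstemp creates a private file; keep the original permission bits.
        shutil.copymode(path, temp_path)
        os.replace(temp_path, path)
    except BaseException:
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise


# Example call (would need write access to the system directory):
# convert_file_encoding("/usr/share/hunspell/YOURDIC.dic")
```

The .aff file would presumably need the same treatment, along with a SET line that names the new encoding.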
I would really like to avoid converting my dictionaries into UTF-8 encoding. I'm using original dictionaries from LibreOffice, sharing them between multiple applications and I'm not sure they'll be still working after the conversion. Sure, I can try it but I would like to avoid the conversion every time I update the dictionaries anyway.
The .aff file contains the encoding the dictionary is using at the very first line (in my case it's SET ISO8859-2) so it should be easy to read it and use it without any user intervention.
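Reading that declaration could look like the sketch below. It is a minimal sketch that scans a .aff file for a SET line and returns the declared encoding name as written. The function name read_aff_encoding is hypothetical and this is not how node-spellchecker does it. The file is opened in binary mode because its encoding is not known until the SET line has been read.

```python
def read_aff_encoding(aff_path, default=None):
    """Return the encoding named on the SET line of a .aff file, or default."""
    with open(aff_path, "rb") as handle:
        for raw_line in handle:
            # Drop a possible UTF-8 byte order mark, then split on whitespace.
            fields = raw_line.lstrip(b"\xef\xbb\xbf").split()
            if len(fields) >= 2 and fields[0] == b"SET":
                # Encoding names are plain ASCII, e.g. ISO8859-2 or UTF-8.
                return fields[1].decode("ascii")
    return default


# Example call:
# read_aff_encoding("/usr/share/hunspell/YOURDIC.aff")
```

The returned name could then be used to decode the matching .dic file, although some names may need mapping to whatever the decoding library expects.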
@ferenczy Definitely. I found these issues: https://github.com/LibreOffice/dictionaries/issues/7 in the LibreOffice repo, and https://github.com/atom/node-spellchecker/issues/89 in Atom itself.
Ideally, a conversion shouldn't be needed because most dictionary files tell you their encoding. I'm trying to get back to this and look at it. I think the underlying problem is at the C++ layer, which is no longer my strength, and I have a few obligations that are getting in the way. I want to fix this, mainly because it is driving me nuts too. :)
@dmoonfire any luck with that? Any workaround?
I have converted those files to UTF-8, replaced the SET line with SET UTF-8, and added FLAG UTF-8, but I still have this problem.
@edusantana: Over the last week, I worked on a PR for node-spellchecker which should fix the encoding errors that were happening between Hunspell and JavaScript. If all goes well, I can get that verified and rolled into Atom. It should handle most of the accented word problems. It also doesn't require dictionaries to be in UTF-8 format, so dropping them in should hopefully Just Work™.
https://github.com/atom/node-spellchecker/pull/95
It just took me a while to figure out text encoding in C++ on four different platforms.
Converting latin1 files to UTF-8 and changing the format tag did not work for me: it somehow picks up only a subset of the dictionary, so it still shows correct words as misspelled.
Is there any way for me to configure a path for the dictionary so that this extension will pick it up? I don't want to risk breaking other spell-check tools, as they are working properly.
Atom 1.37 has a fix for passing accented characters for spell-checking. It handles dictionary files that aren't UTF-8 encoded. Could you please check with the beta and see if it solves the problem? Thank you.
@dmoonfire I will try it... Thanks!!! It works now!!! Look!
It sounds like this is resolved, so I'm going to close this issue. Feel free to open a new one.
Prerequisites
Description
In .md and .txt files, Spanish words with accents show as misspelled even though they are correct. Using aspell es-ES locales.
Steps to Reproduce
Expected behavior: Spell-check should recognize correct words with accents.
Actual behavior: Atom underlines all words with accents, although they are correct.
Reproduces how often: Always.
Versions
Atom : 1.23.3
Electron: 1.6.15
Chrome : 56.0.2924.87
Node : 7.4.0

apm 1.18.12
npm 3.10.10
node 6.9.5 x64
atom 1.23.3
python 2.7.13
git 2.11.0
Debian 9.
Additional Information
I tried checking the same file with aspell on the command line and it works fine: it recognizes words with accents as correct. I also tried different encodings.