en-wl / wordlist

SCOWL (and friends).
http://wordlist.aspell.net

hunspell-en 2018.04.16 upper case issue #234

Open R-a-l-f opened 6 years ago

R-a-l-f commented 6 years ago

Hi,

I can't upgrade from hunspell-en 7.1-3. I use the English and German hunspell dictionaries together, that is, at the same time, in Claws-Mail. When I write a mail in Claws-Mail and a sentence begins with a word that would be lower case in the middle of a sentence, the word is marked as a spelling mistake because its first letter is capitalized. This was already happening a long time ago, see Arch Linux FS#48839, and it still happens today.

In the excerpts below from the package manager's log file, hunspell-en_US 2018:06.29-1 is an empty dummy package that fulfils a dependency while the correctly working hunspell-en 7.1-3 is installed. If I replace hunspell-en 7.1-3 and the dummy package with hunspell-en_AU, hunspell-en_CA, hunspell-en_GB and hunspell-en_US 2018.04.16-5, spell checking is broken.

[rocketmouse@archlinux ~]$ pacman -Q hunspell-de
hunspell-de 20161207-1
[rocketmouse@archlinux ~]$ grep hunspell /var/log/pacman.log | tail -13
[2018-11-10 17:12] [PACMAN] Running 'pacman -Syu hunspell-en_AU hunspell-en_CA hunspell-en_GB hunspell-en_US'
[2018-11-10 17:12] [ALPM] removed hunspell-en (7.1-3)
[2018-11-10 17:12] [ALPM] installed hunspell-en_AU (2018.04.16-5)
[2018-11-10 17:12] [ALPM] installed hunspell-en_CA (2018.04.16-5)
[2018-11-10 17:12] [ALPM] installed hunspell-en_GB (2018.04.16-5)
[2018-11-10 17:12] [ALPM] downgraded hunspell-en_US (2018:06.29-1 -> 2018.04.16-5)
[2018-11-11 19:10] [PACMAN] Running 'pacman -U current/hunspell-en_US-2018:06.29-1-any.pkg.tar.xz /usr/src/hunspell-en-7.1-3-any.pkg.tar.xz'
[2018-11-11 19:10] [ALPM] removed hunspell-en_GB (2018.04.16-5)
[2018-11-11 19:10] [ALPM] removed hunspell-en_CA (2018.04.16-5)
[2018-11-11 19:10] [ALPM] removed hunspell-en_AU (2018.04.16-5)
[2018-11-11 19:10] [ALPM] upgraded hunspell-en_US (2018.04.16-5 -> 2018:06.29-1)
[2018-11-11 19:10] [ALPM] installed hunspell-en (7.1-3)
[2018-11-13 08:28] [PACMAN] Running 'pacman -Rss hunspell-en_US'

After downgrading to hunspell-en 7.1-3 and installing the dummy package for hunspell-en_US, spell checking works correctly again.

Regards, Ralf

kevina commented 5 years ago

@R-a-l-f This is most likely a problem with Hunspell. It could be that the dictionary was converted to UTF-8 in order to handle the Unicode quote (U+2019).

R-a-l-f commented 5 years ago

Should I report the bug against hunspell?

My apologies for the late reply. Although all settings are correct, I no longer receive notifications from GitHub, and there are not even any mails in the provider's web-interface spam folder, let alone in my MUAs' spam folders.

kevina commented 5 years ago

Possibly. Before you do, try fixing en_US.aff manually: change the line SET UTF-8 to SET ISO8859-1 and remove the two ICONV lines. The first few lines of en_US.aff should then change from:

SET UTF-8
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'
ICONV 1
ICONV ’ '
NOSUGGEST !

to

SET ISO8859-1
TRY esianrtolcdugmphbyfvkwzESIANRTOLCDUGMPHBYFVKWZ'
NOSUGGEST !
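
For anyone who prefers not to edit the file by hand, the same change can be sketched in a few lines of Python. This is only an illustration of the edit described above, not a tested fix: the function name patch_aff_header is made up for this example, and it assumes the SET and ICONV lines look exactly like the ones quoted here.

```python
def patch_aff_header(aff_text: str) -> str:
    """Switch SET UTF-8 to SET ISO8859-1 and drop every ICONV line.

    Hypothetical helper: assumes the header has the shape quoted in this thread.
    """
    out = []
    for line in aff_text.splitlines():
        fields = line.split()
        if fields[:2] == ["SET", "UTF-8"]:
            out.append("SET ISO8859-1")
        elif fields and fields[0] == "ICONV":
            # Drops both the count line ("ICONV 1") and the mapping line.
            continue
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Read the UTF-8 file, patch it, and write the result as ISO-8859-1.
    # Writing raises UnicodeEncodeError if non-Latin-1 characters remain.
    with open("en_US.aff", encoding="utf-8") as src:
        patched = patch_aff_header(src.read())
    with open("en_US.aff.new", "w", encoding="iso-8859-1") as dst:
        dst.write(patched)
```

The script writes to en_US.aff.new instead of overwriting the original, so the result can be compared with the expected header above before it is swapped in.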

en_US.dic should be ASCII, but to be safe you should also convert the dictionary back to ISO-8859-1 using:

  mv en_US.dic en_US.dic.orig
  iconv -f utf-8 -t iso-8859-1 < en_US.dic.orig > en_US.dic
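
iconv stops with an error at the first character it cannot represent in the target encoding, so it may help to check the dictionary first. The following is a rough sketch for that check; the function name unconvertible_lines is invented for this example, and it assumes the file is valid UTF-8.

```python
def unconvertible_lines(dic_text: str, encoding: str = "iso-8859-1"):
    """Return (line_number, line) pairs that cannot be encoded in `encoding`."""
    bad = []
    for number, line in enumerate(dic_text.splitlines(), start=1):
        try:
            line.encode(encoding)
        except UnicodeEncodeError:
            bad.append((number, line))
    return bad


if __name__ == "__main__":
    # Assumes en_US.dic.orig is the UTF-8 copy made with mv above.
    with open("en_US.dic.orig", encoding="utf-8") as src:
        for number, line in unconvertible_lines(src.read()):
            print(f"{number}: {line}")
```

If this prints nothing, every line fits into ISO-8859-1 and the conversion should go through. A line containing the Unicode quote (U+2019) would be listed, since that character does not exist in ISO-8859-1.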

Here is the commit that likely caused the problem: https://github.com/en-wl/wordlist/commit/751cd1574d70fb987cdc3b69ce2e2afa8184fdf6.