Open johnbumgarner opened 3 years ago
Excellent question.
I think Re: Updating dictionaries gives a few hints. I have started looking into https://github.com/GNUAspell/aspell-lang, it explains how to generate dictionaries that can eventually be uploaded to ftp.gnu.org:
**********************************************************************
Requirements in order to be uploaded to ftp.gnu.org
**********************************************************************
The number one requirement is that the dictionary package MUST be made
using "make dist" using the "proc" script as previously described.
This will check for a large number of things.
However, this technical documentation does not explain who or which team is currently in charge of running these tools to maintain the dictionaries for each language. You need to search the aspell mailing lists to find these well-hidden teams or individuals.
For English, these might be the web sites you're after:
The first one claims that “This word list is considered both complete and accurate” and points to SCOWL (and friends). The git repository for SCOWL (and friends) is:
The strange thing is that all of these words can actually be found in SCOWL (and friends). Make sure you have the most recent dictionaries installed, just in case. I would be interested in your findings, as I have similar issues myself, for example with donut:
>>> import enchant
>>>
>>> words = ["donut", "donuts"]
>>> dictionary = enchant.Dict("en_US")
>>>
>>> for word in words:
...     dictionary.check(word)
...
False
True
>>>
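To make results easier to compare across machines, the session above could be turned into a small script along these lines. This is only a sketch: find_rejected is a hypothetical helper name, and the script assumes pyenchant is installed with an en_US dictionary available (it prints nothing otherwise). It also reports which provider enchant picked, since that can differ between systems.

```python
from typing import Callable, Iterable, List


def find_rejected(words: Iterable[str], check: Callable[[str], bool]) -> List[str]:
    """Return the words for which the checker reports False, in input order."""
    return [word for word in words if not check(word)]


if __name__ == "__main__":
    try:
        import enchant  # pyenchant; assumes the enchant C library is present
    except ImportError:
        enchant = None

    if enchant is not None and enchant.dict_exists("en_US"):
        dictionary = enchant.Dict("en_US")
        # Show which backend (aspell, hunspell, ...) is actually answering.
        print("provider:", dictionary.provider.name)
        for word in find_rejected(["donut", "donuts"], dictionary.check):
            # suggest() lists the closest words the dictionary does know.
            print(word, "->", dictionary.suggest(word))
```

Passing dictionary.check as a plain callable keeps the helper independent of the backend, so the same word list can be run against another checker for comparison.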
And some trivia:
You may be using the default dictionary size, which is 60 on a scale from 10 to 90. From the aspell man page:
size
(string) The preferred size of the word list. This consists of a two char digit code describing the size of the list, with typical values of: 10=tiny, 20=really small, 30=small, 40=med-small, 50=med, 60=med-large, 70=large, 80=huge, 90=insane.
Have you tried a different size, such as 80 or even 90, for these less common words? Chances are you just need to choose the right aspell options rather than fix an actual bug.
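One way to try this outside of pyenchant is to call the aspell command line directly with different size values. The sketch below assumes the aspell binary is on the PATH and that the installed English dictionary package actually ships word lists for the requested size, which depends on how it was packaged. The function names are made up for illustration; --lang, --size and the list command are regular aspell options.

```python
import subprocess
from typing import List


def aspell_list_command(lang: str = "en_US", size: str = "60") -> List[str]:
    """Build an `aspell list` command line for a given language and size."""
    if not (len(size) == 2 and size.isdigit()):
        raise ValueError("size should be a two-digit code such as '60' or '80'")
    return ["aspell", f"--lang={lang}", f"--size={size}", "list"]


def rejected_words(words: List[str], lang: str = "en_US", size: str = "60") -> List[str]:
    """Pipe words to `aspell list`, which prints the ones it does not know."""
    result = subprocess.run(
        aspell_list_command(lang, size),
        input="\n".join(words) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    for size in ("60", "80"):
        try:
            print(size, rejected_words(["donut", "donuts"], size=size))
        except (OSError, subprocess.CalledProcessError) as exc:
            # aspell missing, or no dictionary installed for this size
            print(size, "aspell unavailable or failed:", exc)
```

If the same words are rejected at every size, the size setting is probably not the cause.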
It's not the size of the dictionary after all. The case of donut is interesting: issue https://github.com/en-wl/wordlist/issues/310 gives a glimpse of how words are handled:
I'm exploring using the Python package pyenchant in my open source project. Since I'm developing on a Mac, the backend of pyenchant is aspell. During testing I noted that some English words are not found, so I'm trying to understand the limitations of aspell.
The code below has 6 English words. It seems that 3 of these words don't exist in aspell dictionaries.
aspell version info:
Thanks in advance for any assistance.