aarondandy / WeCantSpell.Hunspell

A port of Hunspell v1 for .NET and .NET Standard
https://www.nuget.org/packages/WeCantSpell.Hunspell/

Seems to skip some English words #90

Open niksedk opened 1 month ago

niksedk commented 1 month ago

When I load the en_US.dic/en_US.aff files, wordList.Check("hours") returns false. The same happens for other words, e.g. "years".

I have loaded all encodings and there are no affix warnings.

Any ideas?

en_us.zip

aarondandy commented 7 hours ago

I'm not sure where you got this dictionary from, but I can reproduce your results. Reading the dictionary file, though, it looks like it only defines the 's suffix, along with the associated dictionary entry hour/1 1.

Looking at other, better dictionary files that do work for "hours", I can see many prefixes and suffixes defined on that root.
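To illustrate the reasoning above: in a Hunspell-style dictionary, a root word only matches the inflected forms that its affix flags generate. The toy model below is a minimal sketch and is not Hunspell's algorithm or this library's API. The flag names "1" and "S" and the rule table are hypothetical, and real features such as strip conditions, prefixes, and cross-products are ignored.

```python
# Simplified model of Hunspell-style suffix expansion (illustration only).
# Hypothetical rule table: flag -> list of (strip, append) pairs.
SUFFIX_RULES = {
    "1": [("", "'s")],  # assumed: flag "1" only adds the possessive 's
    "S": [("", "s")],   # assumed: flag "S" adds a plural s
}


def expand(stem, flags, rules):
    """Return the stem plus every form produced by its suffix flags."""
    forms = {stem}
    for flag in flags:
        for strip, append in rules.get(flag, []):
            if stem.endswith(strip):
                # Remove the stripped characters (none when strip is "").
                base = stem[: len(stem) - len(strip)]
                forms.add(base + append)
    return forms


# A root carrying only the possessive flag never produces "hours".
only_possessive = expand("hour", ["1"], SUFFIX_RULES)
print(sorted(only_possessive))
print("hours" in only_possessive)

# Adding a plural flag to the same root does generate "hours".
print("hours" in expand("hour", ["1", "S"], SUFFIX_RULES))
```

A root with richer flags, as in the better dictionaries mentioned above, is what makes forms like "hours" and "years" acceptable.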

Let me know if the behavior of this library deviates from the original Hunspell, but I think this is the correct behavior for the provided dictionary file. I recommend finding a more complete, more widely used dictionary file.

niksedk commented 6 hours ago

The above dictionary works fine in the original Hunspell, in NHunspell (by Thomas Meierhofer), and in HunspellSharp, so I really think this is a serious bug in WeCantSpell.

Do you have an English dictionary that you recommend?

aarondandy commented 3 hours ago

I was able to reproduce your results in NHunspell, but considering its age I don't place a lot of weight on that. More importantly, I was able to reproduce them against the original Hunspell, so this does appear to be a deviation in behavior.

Regarding dictionaries, the English (American) dictionary in this repository works for "hours" for me: https://github.com/titoBouzout/Dictionaries/

aarondandy commented 1 hour ago

The specific aspect of these files that causes the issue is the comments at the end of some of the affix file commands. I need to add some code to strip those comments off before processing each line.
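The fix described above could look roughly like the following. This is a minimal sketch in Python, not the library's actual C# change. The helper names strip_trailing_comment and iter_affix_tokens are hypothetical, and it assumes a comment starts at a '#' that begins the line or follows whitespace, so a '#' embedded inside a token is kept as data.

```python
def strip_trailing_comment(line):
    """Drop a trailing '#' comment from one affix-file line (hypothetical helper).

    Assumption: a comment starts at a '#' at the start of the line or one
    that follows whitespace; a '#' inside a token is left untouched.
    """
    for i, ch in enumerate(line):
        if ch == "#" and (i == 0 or line[i - 1].isspace()):
            return line[:i].rstrip()
    return line.rstrip()


def iter_affix_tokens(lines):
    """Yield the whitespace-separated fields of each non-empty, comment-free line."""
    for raw in lines:
        cleaned = strip_trailing_comment(raw)
        if cleaned:
            yield cleaned.split()


sample = [
    "# whole-line comment",
    "SFX S Y 1 # plural suffix",
    "SFX S 0 s . # adds a plain s",
    "",
]
for fields in iter_affix_tokens(sample):
    print(fields)
```

This heuristic could misfire if a dictionary uses '#' as a flag or affix character after whitespace. An alternative is to read only the number of fields each command expects and ignore anything after them.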