aarondandy / WeCantSpell.Hunspell

A port of Hunspell v1 for .NET and .NET Standard
https://www.nuget.org/packages/WeCantSpell.Hunspell/

Inconsistent Suggestions in WeCantSpell.Hunspell #91

Open hetavi-chaudhary opened 1 month ago

hetavi-chaudhary commented 1 month ago

We are utilizing the Suggest method from the WeCantSpell.Hunspell library to generate spelling suggestions based on specific input strings we provide. We have integrated this library into two distinct applications, ensuring that both are supplied with identical input. However, we have observed inconsistencies in the suggestions returned by each application. For instance, when we provide the input "100", one application returns "P100" as a suggestion, while the other returns "A1000". Is it normal for the Suggest method to yield different results when called with the same input in different environments or applications?

Here is the code snippet demonstrating our implementation:

using System.Collections.Generic;
using WeCantSpell.Hunspell;

var inputstringList = new List<string> { "A100", "P100", "A100 Truck", "D100 Series" };
var dictionary = WordList.CreateFromWords(inputstringList);
var suggestions = dictionary.Suggest("100");

aarondandy commented 1 month ago

That is definitely interesting. The results seem stable across multiple calls to Suggest on the same dictionary, so the randomness is probably introduced during word list construction. I think construction should definitely produce stable results.

aarondandy commented 1 month ago

It's probably worth investigating #88 along with this one when I get around to it.

aarondandy commented 1 month ago

I don't think the inconsistent results here are related to #93. Further testing has convinced me that the apparent randomness comes from the way the word list is constructed and is not related to timing. I think the only way to get consistently predictable results is to move away from using a hashmap as the core of the dictionary and use a different data structure instead, which would be a ton of work.
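
To illustrate the data-structure point above, here is a minimal sketch of why an ordered container gives predictable results where a hashmap may not. It is not the library's implementation: `OrderedWordStore`, `Build`, and `FirstMatches` are hypothetical names invented for this example. It relies on two documented .NET behaviors: the enumeration order of `Dictionary<TKey, TValue>` is unspecified, and on .NET Core and later string hash codes are randomized per process by default. Whether either is the actual cause of this issue is an assumption. A `SortedDictionary` with an ordinal comparer enumerates in an order that depends only on the keys, so a search that stops after a limited number of candidates sees the same candidates every time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; this is NOT how WeCantSpell.Hunspell stores words.
public static class OrderedWordStore
{
    // Builds a store whose enumeration order depends only on the set of words,
    // not on insertion order or on per-process string hash codes.
    public static SortedDictionary<string, int> Build(IEnumerable<string> words)
    {
        var store = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (var word in words)
        {
            // The value is a placeholder occurrence count.
            store[word] = store.TryGetValue(word, out var count) ? count + 1 : 1;
        }
        return store;
    }

    // Returns up to `limit` stored words containing `fragment`, in ordinal key order.
    // Because the scan stops early, the enumeration order decides which words win.
    public static List<string> FirstMatches(
        SortedDictionary<string, int> store, string fragment, int limit)
    {
        return store.Keys
            .Where(key => key.IndexOf(fragment, StringComparison.Ordinal) >= 0)
            .Take(limit)
            .ToList();
    }
}
```

Building the store from the four words in the original report, in any insertion order, and calling `FirstMatches(store, "100", 2)` yields the same two words each time. The trade-off is that lookups in a sorted tree are O(log n) rather than the expected O(1) of a hashmap.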