Closed · jove4015 closed this issue 3 weeks ago
@farzher Is it possible that accents could be added again but as an option?
I've been using this for a while just off of my PR branch and when your comment came in, I realized that it was a little outdated, so I just rebased this off of v3.0.2. I'm not sure anyone will ever acknowledge the PR but just in case it's helpful to you.
it is helpful, thanks a lot! hoping @farzher will take a look and maybe merge it.
@farzher Is there any chance you could merge it? It would be very helpful for all non-English users whose languages have accented or otherwise unusual letters.
@farzher Having read your comments on other people's issues, I'm pretty sure you're not merging my PR because you don't like it. If you could just tell me what you don't like about it, what you would like to see changed, any feedback at all, I would be happy to work on it. The fact that there are so many people commenting on this seems to indicate you are indeed missing a feature that people want.
@jove4015 i haven't had a usecase for this but you're right there's enough comments about it that i'm finally looking into merging this now
this causes an infinite loop
fuzzysort.go('w', ['Rich JavaScript Applications – the Seven Frameworks (Throne of JS, 2012) - Steve Sanderson’s blog - As seen on YouTube™'], {normalizeDiacritics:true})
also, this highlights incorrectly
fuzzysort.go('ジ', ['ファイナルファンタジーXIV スターターパック'])[0].highlight()
// 'ファイナルファンタ<b>ジー</b>XIV スターターパック'
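The misplaced highlight is most likely an index-shift problem: NFD decomposition changes the length of kana that carry a voiced-sound mark, so match indices computed against the normalized string drift relative to the original. A quick check in plain Node (no fuzzysort required):

```javascript
// 'ジ' (U+30B8) decomposes under NFD into 'シ' (U+30B7) + U+3099
// (combining voiced sound mark), so the normalized string is longer
// than the original and highlight offsets no longer line up.
const original = 'ジ';
const decomposed = original.normalize('NFD');

console.log(original.length);   // 1
console.log(decomposed.length); // 2
```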
i'm thinking of using this to remove accents so it doesn't screw with japanese characters, but i'm not sure if this causes other problems
var remove_accents = (str) => str.replace(/\p{Script=Latin}+/gu, match => match.normalize('NFD')).replace(/[\u0300-\u036f]/g, '')
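As a sanity check on that idea: because the one-liner only NFD-normalizes Latin-script runs before stripping combining marks, Japanese text passes through untouched. A quick demonstration (my own test snippet, not part of the PR):

```javascript
// Decompose only Latin-script runs, then strip combining diacritical
// marks (U+0300–U+036F). Non-Latin text is never decomposed, so
// katakana voiced-sound marks are unaffected.
var remove_accents = (str) =>
  str.replace(/\p{Script=Latin}+/gu, match => match.normalize('NFD'))
     .replace(/[\u0300-\u036f]/g, '');

console.log(remove_accents('Über blog'));              // 'Uber blog'
console.log(remove_accents('ファイナルファンタジーXIV')); // unchanged
```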
We just integrated your fuzzy sort into the search function in our grid component and it is working great: our search accuracy and performance have both improved substantially.
Along the way, we were asked to include functionality to normalize accents and diacritics:
- ü should match u (i.e., "uber" should match the string "Über")
- é should match e
- Å should match a
- ﬁ should match fi
And so on. We found that the best way to introduce this was to modify this library, and so I'm submitting it now to be included.
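For reference, all four of the examples above fold correctly under NFKD normalization (plain NFD will not expand the ﬁ ligature, since that is a compatibility mapping). Which form the PR actually uses is its own decision; this is just the general idea:

```javascript
// NFKD decomposes accented letters into base letter + combining mark
// and also expands compatibility characters like the 'ﬁ' ligature;
// stripping the combining marks (U+0300–U+036F) then leaves plain ASCII.
const fold = (s) =>
  s.normalize('NFKD').replace(/[\u0300-\u036f]/g, '').toLowerCase();

console.log(fold('Über')); // 'uber'
console.log(fold('é'));    // 'e'
console.log(fold('Å'));    // 'a'
console.log(fold('ﬁ'));    // 'fi'
```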
Normalization is triggered by a configuration option "normalizeDiacritics" which defaults to false - so any existing implementation should continue to work exactly as before. If this new option is specified, additional parsing occurs in the prepareLowerInfo function. Every time this option is changed, the built-in cache is cleared so that unnormalized results don't contaminate normalized searches and vice versa.
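The cache-invalidation behavior described above could be sketched roughly like this. This is an illustration of the idea only; names like `preparedCache` and `prepare` are placeholders, not fuzzysort's actual internals:

```javascript
// Sketch: prepared strings are cached, and flipping normalizeDiacritics
// clears the cache so normalized and unnormalized entries never mix.
const preparedCache = new Map(); // stand-in for the library's built-in cache
let currentNormalize = false;

function prepare(str, normalizeDiacritics) {
  if (normalizeDiacritics !== currentNormalize) {
    preparedCache.clear(); // option changed: drop stale entries
    currentNormalize = normalizeDiacritics;
  }
  let prepared = preparedCache.get(str);
  if (prepared === undefined) {
    const lower = str.toLowerCase();
    prepared = normalizeDiacritics
      ? lower.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
      : lower;
    preparedCache.set(str, prepared);
  }
  return prepared;
}

console.log(prepare('Über', true));  // 'uber'
console.log(prepare('Über', false)); // 'über' — cache was cleared first
```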
I didn't spend a lot of time on the performance impacts of this - obviously normalizing all the strings comes with a cost, but for our use case this is pretty negligible. I hope that by making this optional and not default, any performance impact is minimized to just those who want to use the functionality.
Thanks for your consideration! Please let me know if there's anything you'd like for me to update.