-
It would be useful to add a usage example to the documentation (README.md):
- an example index using the analyzer
- an example index using the filter
- an example index …
-
Helping users spell information correctly would let them search faster (no typos) and would keep typos out of the database.
-
OOV is our second big enemy. In the best case, it makes the context scorer harmlessly useless. Is being useless good?
The Soundex and Double Metaphone matching methods work against OOV, but they provide quite low …
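
For reference, here is a minimal sketch of why a phonetic key helps with OOV tokens: a spelling that is missing from the vocabulary can still land on the same key as a known word. It is a plain American Soundex written from the textbook rules, not the code this project uses; the function name `soundex` is only illustrative.

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex key of `word` ("" if no letters)."""
    # Consonant groups that share a digit; vowels, y, h and w have no digit.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""

    key = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same digit
        code = codes.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so the same digit may repeat after it
    return (key + "000")[:4]  # pad with zeros, then truncate to 4 characters


# Two different spellings collapse onto one key.
print(soundex("Robert"), soundex("Rupert"))
```

The key is coarse by design, so many unrelated words share it as well; the sketch only shows the mechanism.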
-
Hi. First off, thanks for creating this script. It's fantastic.
How would you feel about augmenting the string matching to perform a fuzzy search? Would you consider merging such a feature upstream?
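
To make the request concrete, here is a rough sketch of what I mean, using only `difflib` from the Python standard library. The function name `fuzzy_search` and its parameters are hypothetical and are not part of this script; the cutoff of 0.6 is simply `difflib`'s default.

```python
from difflib import get_close_matches


def fuzzy_search(query, candidates, limit=3, cutoff=0.6):
    """Return up to `limit` candidates similar to `query`, best match first.

    Matching is case-insensitive; `cutoff` is the minimum similarity
    ratio (between 0 and 1) a candidate needs to be returned.
    """
    # Map lowercased forms back to the original spelling.
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[h] for h in hits]


print(fuzzy_search("appel", ["ape", "apple", "peach", "puppy"]))
```

An exact match would still score 1.0, so this could sit behind an opt-in flag without changing the current behaviour.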
-
Often there are very specific names or brands that should be mentioned correctly in a transcript. Right now, there is no nice way to do this. Maybe we could add this in the future? I'd like to know id…
-
I am playing with mismo to deduplicate postal addresses in a set of about 10k entries.
After the expectation-maximization step, the odds of half of the record pairs are equal to `10_000_000_000`, hen…
-
Hello!
How do you build your interaction files? Could this be automated in some way, so that it is always up to date with the most recent MCU characters?
Regards,
Caio
-
Great to see a library like this. I would love to see the Big-O performance of each fuzzy algorithm displayed, so I know what size of data I can use it for, and maybe some advice about the pros and cons of each.
I'm…
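
As an example of the kind of note I have in mind, here is a textbook Levenshtein distance with its cost spelled out. It is only an illustration and not this library's implementation: for strings of length m and n it takes O(m·n) time and, by keeping just two rows, O(min(m, n)) extra space.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep b as the shorter string so each row has min(m, n) + 1 cells.
    if len(a) < len(b):
        a, b = b, a

    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Comparing every pair in a dataset of N strings multiplies this by roughly N²/2 comparisons, which is the number I would like the docs to help me estimate.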
-
# Summary
As part of the process-records:process-match-and-merge process, there should be no notifications (based on a lower score threshold than the one defined for merging) for records that are mer…
-
Currently, any given vendor (corporation) could be listed in the vendor data under a series of similar, but not identical names. This may be due to inconsistent naming, or data entry, or due to corpor…