MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License

Getting argmax of empty sequence #53

Closed ashczq closed 1 year ago

ashczq commented 1 year ago

[Screenshot of the error]

I'm currently using the Levenshtein distance through EditDistance and getting this error. I have checked my list for null values and empty strings. The same list works with fuzz_matcher and TF-IDF. Any idea why this is happening?

MaartenGr commented 1 year ago

Apologies for the late response. Could you give me the full code and perhaps a reproducible example? Without it, it is difficult to see what exactly is happening.

ashczq commented 1 year ago
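# Imports assumed; they were not shown in the original post.
from jellyfish import levenshtein_distance
from polyfuzz import PolyFuzz
from polyfuzz.models import EditDistance
jellyfish_matcher = EditDistance(scorer=levenshtein_distance)
company_names = ['shift capital',' blackstone real estate private equity','shift capital llc','brokertec','axience']
model = PolyFuzz(jellyfish_matcher)
model.match(company_names)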

[Screenshot of the initial error]

Sure, I've included an example and a screenshot of the initial error that leads to the argmax error.

MaartenGr commented 1 year ago

It took a while, my apologies, but it seems that the issue stems from the implementation of EditDistance in PolyFuzz. I just added a fix to the main branch that you can use. I might release an official quickfix in the coming weeks, but that will also depend on whether new features are going to be implemented.
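
For readers on a release without that fix, here is a minimal sketch of one thing that may be worth trying. It is not the fix referenced above, and it is untested against the affected versions. A raw Levenshtein distance is an unbounded integer where lower means more similar, while fuzzy-matching scorers are commonly expected to return a similarity where higher is better. The sketch assumes that is what the matcher expects and wraps the distance into a value between 0 and 1. The names `levenshtein` and `levenshtein_similarity` are hypothetical, and the distance is implemented in pure Python only to keep the example self-contained.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Hypothetical wrapper: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Two empty strings are treated as identical.
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Under the same assumption, the wrapper could be passed in place of the raw distance, as in `EditDistance(scorer=levenshtein_similarity)`.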