Closed: berndnoll closed this issue 3 years ago
Hi @berndnoll
It doesn't look like you are doing anything wrong. Were you expecting a different result?
Ha! I was expecting a different result, but I got "tricked" by min_similarity. I was expecting a higher match score for elements 1 and 2 ("Jim Beam" and "Jim Boom"). Once I lowered the threshold to 0.3, it turned out their score is only about 0.31, so the pair had simply been filtered out before.
matches = match_strings(accounts['name'],min_similarity=0.3)
   left_index       left_name  similarity      right_name  right_index
0           0        Jim Beam    1.000000        Jim Beam            0
1           0        Jim Beam    0.309527        Jim Boom            1
2           1        Jim Boom    0.309527        Jim Beam            0
3           1        Jim Boom    1.000000        Jim Boom            1
4           2    Jack Daniels    1.000000    Jack Daniels            2
5           3     John Dummel    1.000000     John Dummel            3
6           4      Bob Bubble    1.000000      Bob Bubble            4
7           5  Seth Suckerman    1.000000  Seth Suckerman            5
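For anyone wondering why "Jim Beam" and "Jim Boom" score so low, here is a minimal pure-Python sketch of the kind of calculation involved. It assumes character 3-grams taken after stripping whitespace and a few punctuation characters, scikit-learn-style smoothed TF-IDF weights, and cosine similarity. The helper names (`trigrams`, `tfidf_vectors`, `cosine`) are made up for illustration and are not part of string_grouper's API; the library's actual internals may differ.

```python
import math
import re
from collections import Counter

names = ['Jim Beam', 'Jim Boom', 'Jack Daniels',
         'John Dummel', 'Bob Bubble', 'Seth Suckerman']

def trigrams(s):
    # Assumption: whitespace and the characters , - . / are removed first.
    s = re.sub(r'[,\-./]|\s', '', s)
    return [s[i:i + 3] for i in range(len(s) - 2)]

def tfidf_vectors(docs):
    # One trigram count per name, then smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    counts = [Counter(trigrams(d)) for d in docs]
    n = len(docs)
    df = Counter(g for c in counts for g in c)  # document frequency per trigram
    vectors = []
    for c in counts:
        w = {g: tf * (math.log((1 + n) / (1 + df[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in w.values()))
        vectors.append({g: v / norm for g, v in w.items()})  # L2-normalised
    return vectors

def cosine(u, v):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * v[g] for g, w in u.items() if g in v)

vectors = tfidf_vectors(names)
sim = cosine(vectors[0], vectors[1])  # 'Jim Beam' vs 'Jim Boom'
print(round(sim, 6))
```

Under these assumptions the two names share only two of their five trigrams ("Jim" and "imB"), and those shared trigrams get a lower IDF weight than the unshared ones, which is why the similarity ends up near 0.31 rather than something closer to 1.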
Sorry for bothering you with this and thanks again for your awesome support.
Hi, I was just curious about what happens when I run this piece of code. I came across this when I split my data into smaller chunks.
Code:
import pandas as pd
from string_grouper import match_strings

accounts = pd.DataFrame()
accounts['name'] = ['Jim Beam', 'Jim Boom', 'Jack Daniels', 'John Dummel', 'Bob Bubble', 'Seth Suckerman']

matches = match_strings(accounts['name'])
print(matches)
Output:
   left_index       left_name  similarity      right_name  right_index
0           0        Jim Beam         1.0        Jim Beam            0
1           1        Jim Boom         1.0        Jim Boom            1
2           2    Jack Daniels         1.0    Jack Daniels            2
3           3     John Dummel         1.0     John Dummel            3
4           4      Bob Bubble         1.0      Bob Bubble            4
5           5  Seth Suckerman         1.0  Seth Suckerman            5
Am I doing something wrong here? I hope this is not too dumb a question; I am new to Python and pandas.
Thank you for looking into this.