DeNederlandscheBank / name_matching

Problems when dataset to match < match database #18

Closed ChristianRLynn closed 10 months ago

ChristianRLynn commented 10 months ago

Hey! Thanks for all your hard work on this project. I have been running into an issue: when I attempt to match a small dataset (24 items) against a large master dataset (1078760 items), I get: IndexError: index 1078760 is out of bounds for axis 0 with size 24

Not quite sure what code would help here. Please advise whether this is an accidental issue and whether it is reproducible. (I have gotten this error with a few different datasets while testing.)

mnijhuis-dnb commented 10 months ago

I have managed to reproduce it as well; unfortunately, it is a bug in the code. It goes wrong when updating the index after the matching. I should be able to fix this sometime next week.

In the meantime, can you try running the code with the additional argument

row_numbers=True

when constructing the NameMatcher? This should fix the issue for now.
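
For reference, a minimal sketch of what the workaround might look like in practice. It assumes the README-style calls load_and_process_master_data and match_names; the helper function, the MATCHER_OPTIONS name, and the company_name column are hypothetical and only there for illustration. The large dataset is loaded as the master data, and the small dataset is passed in as the to-be-matched data.

```python
# Keyword arguments passed to NameMatcher. row_numbers=True is the
# workaround suggested in this thread; everything else is left at defaults.
MATCHER_OPTIONS = {"row_numbers": True}


def match_small_to_master(master_names, names_to_match, column="company_name"):
    """Match a short list of names against a large master list.

    Hypothetical helper: the function name and the default column name
    are assumptions, not part of the name_matching package.
    """
    # Imported inside the function so this file can be loaded even where
    # pandas or name_matching is not installed.
    import pandas as pd
    from name_matching.name_matcher import NameMatcher

    # Master data: the large reference set (1078760 items in this issue).
    master = pd.DataFrame({column: list(master_names)})
    # To-be-matched data: the small set (24 items in this issue).
    to_be_matched = pd.DataFrame({column: list(names_to_match)})

    matcher = NameMatcher(**MATCHER_OPTIONS)
    # Fit the matcher on the master data first ...
    matcher.load_and_process_master_data(
        column=column, df_matching_data=master, transform=True
    )
    # ... then look up each to-be-matched name in the master data.
    return matcher.match_names(to_be_matched=to_be_matched, column_matching=column)
```

Other constructor arguments (number of matches, distance metrics, and so on) can be added to MATCHER_OPTIONS as needed; they are omitted here to keep the focus on row_numbers.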

ChristianRLynn commented 10 months ago

Great, thank you for the workaround. I really appreciate this little bit of code; it has been very useful in several ways. It took a little while to understand the documentation (especially what the master file is and what the to-be-matched file is), but other than that, wonderful.