J535D165 / recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python
http://recordlinkage.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

missing values #162

Open yishaistreamline opened 2 years ago

yishaistreamline commented 2 years ago

It is a little bit frustrating, because I cannot find any explicit way to solve this in the recordlinkage documentation, though it seems like a very commonplace problem: missing values (empty string, NaN, etc.) are marked as if they were positive matches when in reality they should be given a score of zero.

yishaistreamline commented 2 years ago

thank you @perryvais and @imad3v for your great work!

yishaistreamline commented 2 years ago

and @tknuth @mayerantoine also!!

nc82-nc commented 2 years ago

Hello, I have the same issue: NaN comparisons generate scores equal to one. Thanks for this toolkit!

devmcp commented 2 years ago

Isn't this done with the missing_value argument? E.g.

compare_cl.numeric("a_name", "b_name", label="name", missing_value=1)
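To make the semantics concrete: as far as I understand the docs, missing_value is the score assigned to a pair when either side is missing, so missing_value=1 as written above would score missing pairs as matches, and missing_value=0 is what the original question asks for. Below is a minimal plain-Python sketch of that behaviour. It is not the library's implementation; is_missing and exact_score are hypothetical helpers made up for illustration, and treating empty strings as missing is an assumption (you may need to convert them to NaN yourself before comparing).

```python
import math


def is_missing(value):
    """Hypothetical helper: True for None, NaN, or an empty/blank string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip() == "":
        return True
    return False


def exact_score(left, right, missing_value=0.0):
    """Hypothetical helper mimicking a missing_value-style parameter.

    Returns missing_value if either side is missing, otherwise
    1.0 for an exact match and 0.0 for a mismatch.
    """
    if is_missing(left) or is_missing(right):
        return missing_value
    return 1.0 if left == right else 0.0


pairs = [
    ("smith", "smith"),            # real match
    ("smith", "smyth"),            # real mismatch
    ("", ""),                      # both empty: missing, not a match
    (float("nan"), float("nan")),  # both NaN: missing, not a match
    (None, "smith"),               # one side missing
]

scores = [exact_score(a, b) for a, b in pairs]
print(scores)
```

With the default missing_value=0.0, only the first pair scores 1.0. Passing missing_value=1 instead would turn the three missing pairs into apparent matches, which is the behaviour described in the original question.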