J535D165 / recordlinkage

A powerful and modular toolkit for record linkage and duplicate detection in Python
http://recordlinkage.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

missing_value is not working: it defaults to 0 even when set to another value #180

Open selva221724 opened 2 years ago

selva221724 commented 2 years ago
import recordlinkage as rl

compareModule = rl.Compare()
# Fuzzy string comparison on names and addresses
compareModule.string('FullName', 'FullName', method='jarowinkler', threshold=0.95,
                     label='FullName')
compareModule.string('FullAddress', 'FullAddress', method='jarowinkler', threshold=0.85,
                     label='FullAddress')

# Exact comparison; a missing value on either side is expected to score 0.5
compareModule.exact('DOB', 'DOB', label='DOB', missing_value=0.5)
compareModule.exact('Phone', 'Phone', label='Phone', missing_value=0.5)
compareModule.exact('Email', 'Email', label='Email', missing_value=0.5)

I am trying to use missing_value, but the comparison still returns 0 when one of the compared values is missing. I am using the latest version, 0.15, and the problem persists. Any help would be appreciated!
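
To make the expected behaviour concrete, here is a minimal pure-Python sketch of how an exact comparison with missing_value=0.5 should score a pair. It is not the recordlinkage implementation. The helper names is_missing and exact_score are hypothetical, and treating None and NaN as "missing" is an assumption made for this illustration.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def exact_score(left, right, agree_value=1, disagree_value=0, missing_value=0):
    """Score one pair the way this issue expects an exact comparison to behave."""
    if is_missing(left) or is_missing(right):
        # Either side is missing: return the caller's missing_value, not 0.
        return missing_value
    return agree_value if left == right else disagree_value


# Hypothetical DOB pairs: a match, a mismatch, and a pair with a missing value.
pairs = [
    ("1990-01-01", "1990-01-01"),
    ("1990-01-01", "1985-05-05"),
    ("1990-01-01", None),
]
scores = [exact_score(left, right, missing_value=0.5) for left, right in pairs]
print(scores)
```

With missing_value=0.5 the third pair should score 0.5, whereas the behaviour reported here gives 0 for it.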

selva221724 commented 2 years ago

Found a solution!

Find compare.py in your Python environment (\Lib\site-packages\recordlinkage\compare.py) and change the following to your desired value. This is a temporary fix, not a long-term one. I needed 0.5.

Replace missing_value=0.0 with missing_value=0.5.

Again, this is a temporary fix, not the correct one.

For a proper fix, remove the default value of missing_value and pass it explicitly every time you call string, exact, etc.
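
The "pass it explicitly every time" idea can be approximated in user code without touching site-packages. The sketch below wraps the call in a helper whose missing_value argument is keyword-only and has no default, so forgetting it raises a TypeError. The names exact_explicit and RecordingComparer are hypothetical. The only assumption about the library is that exact accepts missing_value and label as keyword arguments, as in the calls earlier in this thread. The wrapper does not change how the library treats missing values internally; it only prevents silent reliance on the default.

```python
def exact_explicit(comparer, left_on, right_on, *, missing_value, label=None):
    """Forward to comparer.exact with a mandatory, keyword-only missing_value."""
    return comparer.exact(left_on, right_on, missing_value=missing_value, label=label)


class RecordingComparer:
    """Hypothetical stand-in for rl.Compare, used only to show the call shape."""

    def __init__(self):
        self.calls = []

    def exact(self, left_on, right_on, missing_value=0, label=None):
        # Record the arguments instead of building a real comparison.
        self.calls.append((left_on, right_on, missing_value, label))
        return self


demo = RecordingComparer()
exact_explicit(demo, "DOB", "DOB", missing_value=0.5, label="DOB")

# Omitting missing_value is rejected instead of silently falling back to a default.
try:
    exact_explicit(demo, "Phone", "Phone", label="Phone")
    forgot_raises = False
except TypeError:
    forgot_raises = True
```

In real use, the first argument would be the rl.Compare() object from the original snippet rather than the stand-in class.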