data61 / blocklib

Python implementations of record linkage blocking techniques.
Apache License 2.0

No signatures for NULL values #259

Closed: wilko77 closed this 1 year ago

wilko77 commented 1 year ago

We shouldn't block on missing values: every record with a missing value would get the same signature and land in one very large block. We should only block on known data.
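Here is a small illustration of the large-block problem. This minimal sketch assumes a simplified scheme in which a record's signature is just the raw value of one field. The `naive_blocks` helper and the sample records are hypothetical and are not part of blocklib's API. Every record with an empty field falls into the same block, so that block grows with the amount of missing data rather than with the number of true matches.

```python
from collections import defaultdict


def naive_blocks(records, field):
    """Group record ids by the raw value of `field` (hypothetical helper)."""
    blocks = defaultdict(list)
    for rec_id, rec in enumerate(records):
        # The empty string is treated like any other value, so all
        # records with a missing surname share a single block.
        blocks[rec[field]].append(rec_id)
    return dict(blocks)


records = [
    {"surname": "smith"},
    {"surname": ""},
    {"surname": "jones"},
    {"surname": ""},
    {"surname": ""},
]
blocks = naive_blocks(records, "surname")
print(blocks)  # {'smith': [0], '': [1, 3, 4], 'jones': [2]}
```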

Here we introduce a 'null-sentinel', which represents the string that stands for the NULL value. It defaults to the empty string, as that is the most common representation in CSV files.
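The sketch below shows the sentinel idea under the same simplified assumption that each signature is a field name plus its value. Any value equal to the sentinel produces no signature, so a record is left out of the blocks for the fields it is missing. The names `signatures_for_record`, `build_blocks` and the `null_sentinel` keyword are hypothetical illustrations of the behaviour described in this PR, not blocklib's actual interface.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def signatures_for_record(record: Dict[str, str], fields: Sequence[str],
                          null_sentinel: str = "") -> List[str]:
    """Return one signature per known field, skipping NULLs (hypothetical)."""
    sigs = []
    for field in fields:
        # An absent key is treated the same as an explicit NULL value.
        value = record.get(field, null_sentinel)
        if value == null_sentinel:
            # Missing value: emit no signature instead of a shared "empty" one.
            continue
        sigs.append(f"{field}:{value}")
    return sigs


def build_blocks(records, fields, null_sentinel=""):
    """Map each signature to the ids of the records that produce it."""
    blocks = defaultdict(list)
    for rec_id, rec in enumerate(records):
        for sig in signatures_for_record(rec, fields, null_sentinel):
            blocks[sig].append(rec_id)
    return dict(blocks)


records = [
    {"surname": "smith", "postcode": "2000"},
    {"surname": "", "postcode": "2000"},
    {"surname": "jones", "postcode": ""},
    {"surname": "", "postcode": ""},
]
blocks = build_blocks(records, ["surname", "postcode"])
print(blocks)
# {'surname:smith': [0], 'postcode:2000': [0, 1], 'surname:jones': [2]}
```

Record 3 is missing both fields, so it ends up in no block at all. This is the intended trade-off: such a record cannot be paired through blocking, but it also cannot inflate a block. If a data source encodes NULL differently, the sentinel can be overridden, e.g. `build_blocks(records, fields, null_sentinel="NA")`.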

codecov[bot] commented 1 year ago

Codecov Report

Merging #259 (d7aa61b) into main (58b7dbe) will decrease coverage by 0.14%. The diff coverage is 92.30%.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #259      +/-   ##
==========================================
- Coverage   97.31%   97.18%   -0.14%
==========================================
  Files          18       18
  Lines         633      639       +6
==========================================
+ Hits          616      621       +5
- Misses         17       18       +1
```