Open hardbyte opened 7 years ago
We could have a bit mask that shows which fields were present (relative to the input schema), so that subsequent processing knows the match probabilities need to be interpreted differently. Alternatively, we could return, along with the probability, how many parts of the schema were not matched.
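A rough sketch of what such a mask could look like, assuming a row is a sequence of string values in schema order and that `None` or a blank string counts as "missing". The names `presence_mask` and `missing_count` are hypothetical and not part of any existing API:

```python
from typing import Optional, Sequence


def presence_mask(row: Sequence[Optional[str]]) -> int:
    """Return an int whose bit i is set when field i of the row has data."""
    mask = 0
    for i, value in enumerate(row):
        # Assumption: None and whitespace-only strings both mean "missing".
        if value is not None and value.strip() != "":
            mask |= 1 << i
    return mask


def missing_count(mask: int, n_fields: int) -> int:
    """Count how many of the first n_fields schema fields are absent."""
    present = bin(mask & ((1 << n_fields) - 1)).count("1")
    return n_fields - present


row = ["Alice", "", "1990-01-01"]  # second field is empty
mask = presence_mask(row)
print(bin(mask), missing_count(mask, len(row)))
```

The mask could travel alongside the CLK so that later stages can tell which fields contributed bits, though whether it is acceptable to reveal that information is a separate question.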
Say a row doesn't have data for one field:
What should we do?

1. The current approach still creates a CLK for the record: it will either hash an empty string or skip that feature, meaning fewer bits get set, so the record might not be considered a match.
2. We could drop the row and locally output a list of entities that were dropped.
3. We could throw an error and leave it up to the user.
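To make the three options concrete, here is a minimal sketch of how they might be exposed as a user-selectable policy. Everything here (`MissingPolicy`, `is_missing`, `apply_policy`) is a hypothetical illustration rather than existing code, and it assumes rows are sequences of optional strings:

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[str]]


class MissingPolicy(Enum):
    HASH_ANYWAY = "hash_anyway"  # option 1: keep the row as it is
    DROP = "drop"                # option 2: drop the row and report it
    ERROR = "error"              # option 3: fail and let the user decide


def is_missing(value: Optional[str]) -> bool:
    # Assumption: None and whitespace-only strings both mean "missing".
    return value is None or value.strip() == ""


def apply_policy(rows: Sequence[Row], policy: MissingPolicy) -> Tuple[List[Row], List[int]]:
    """Return (rows to hash, indices of dropped rows) under the given policy."""
    kept: List[Row] = []
    dropped: List[int] = []
    for index, row in enumerate(rows):
        if any(is_missing(v) for v in row):
            if policy is MissingPolicy.ERROR:
                raise ValueError(f"row {index} has missing field(s)")
            if policy is MissingPolicy.DROP:
                dropped.append(index)
                continue
        kept.append(row)
    return kept, dropped


rows = [["Alice", "1990-01-01"], ["Bob", ""]]
kept, dropped = apply_policy(rows, MissingPolicy.DROP)
print(len(kept), dropped)
```

With `DROP`, the returned index list is what would be written out locally as the dropped entities; with `HASH_ANYWAY`, every row passes through unchanged and the hashing step decides how to treat the empty field.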
In any case, I think we should decide which approach is best and document that decision in the docs.
Aha! Link: https://csiro.aha.io/features/ANONLINK-55