Perform cluster threshold testing for the record linkage algorithm. When a new record enters the system and is a possible match to a cluster of existing records, what criteria determine whether the new record is added to the existing cluster or added as a record with a new MPI?
Our initial plan is to investigate criteria for adding a new record to an existing cluster such that false positives and false negatives are minimized and run time remains acceptable. We propose testing the following configurations to start:
New record must exactly match 100% of existing records
New record must exactly match at least 75% of existing records
New record must exactly match at least 50% of existing records
New record must exactly match at least 25% of existing records
New record must fuzzy match 100% of existing records
New record must fuzzy match at least 75% of existing records
New record must fuzzy match at least 50% of existing records
New record must fuzzy match at least 25% of existing records
New record must exactly match XX% of records on YY columns and must fuzzy match XX% of records on ZZ columns
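The configurations above all reduce to the same check: count how many records in the cluster the new record matches, and compare that fraction against a threshold. A minimal sketch of that check follows. Every name in it (`exact_match`, `fuzzy_match`, `belongs_to_cluster`, the column names, and the `min_similarity` cutoff) is hypothetical and chosen for illustration, and `difflib.SequenceMatcher` is only a stand-in for whatever fuzzy comparison the linkage algorithm actually uses.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Sequence

Record = Dict[str, str]


def exact_match(a: Record, b: Record, columns: Sequence[str]) -> bool:
    """True when every listed column is identical in both records."""
    return all(a.get(c) == b.get(c) for c in columns)


def fuzzy_match(a: Record, b: Record, columns: Sequence[str],
                min_similarity: float = 0.8) -> bool:
    """True when every listed column is at least min_similarity alike.

    SequenceMatcher is an assumed stand-in similarity measure, and the
    0.8 default is an arbitrary illustrative cutoff.
    """
    return all(
        SequenceMatcher(None, a.get(c, ""), b.get(c, "")).ratio() >= min_similarity
        for c in columns
    )


def belongs_to_cluster(new_record: Record,
                       cluster: Sequence[Record],
                       matcher: Callable[[Record, Record], bool],
                       threshold: float) -> bool:
    """True when new_record matches at least `threshold` (0.0 to 1.0)
    of the records in the cluster. An empty cluster never matches."""
    if not cluster:
        return False
    matched = sum(1 for existing in cluster if matcher(new_record, existing))
    return matched / len(cluster) >= threshold


# Illustrative data only: three identical records and one spelling variant.
cluster = [
    {"first": "John", "last": "Smith"},
    {"first": "John", "last": "Smith"},
    {"first": "John", "last": "Smith"},
    {"first": "Jon", "last": "Smith"},
]
new_record = {"first": "John", "last": "Smith"}
columns = ["first", "last"]

exact = lambda a, b: exact_match(a, b, columns)
fuzzy = lambda a, b: fuzzy_match(a, b, columns)

# The new record exactly matches 3 of the 4 records in the cluster.
print(belongs_to_cluster(new_record, cluster, exact, 1.00))
print(belongs_to_cluster(new_record, cluster, exact, 0.75))
print(belongs_to_cluster(new_record, cluster, fuzzy, 1.00))
```

Under these assumptions, the mixed configuration could be expressed as two calls combined with `and`: one using an exact matcher restricted to the YY columns and one using a fuzzy matcher restricted to the ZZ columns, each with its own threshold.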
Acceptance Criteria
An appropriate threshold for deciding whether a new record should be added to an existing record cluster is determined for record linkage.