CDCgov / phdi

https://cdcgov.github.io/dibbs-site/
Creative Commons Zero v1.0 Universal

Cluster threshold testing #203

Open emmastephenson opened 1 year ago

emmastephenson commented 1 year ago

Why are we doing this?

Action Requested

Perform cluster threshold testing for the record linkage algorithm. When a new record enters the system and is a possible match to a cluster of existing records, what criteria determine whether the new record is added to the existing cluster or added as a record with a new MPI?
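To make the question concrete, here is a minimal sketch of one possible criterion: add the new record to a cluster only if it matches at least a given fraction of the cluster's existing records. This is an illustration, not the project's implementation. The names `exact_field_match` and `belongs_to_cluster`, the field names, and the sample records are all hypothetical, and the exact-equality pairwise rule is a stand-in for whatever matching logic the linkage algorithm actually uses.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def exact_field_match(a: Record, b: Record, fields: List[str]) -> bool:
    """Hypothetical pairwise rule: records match if every listed field is present and equal."""
    return all(a.get(f) is not None and a.get(f) == b.get(f) for f in fields)


def belongs_to_cluster(
    new_record: Record,
    cluster: List[Record],
    is_match: Callable[[Record, Record], bool],
    threshold: float,
) -> bool:
    """Return True if the share of cluster records matching new_record is >= threshold."""
    if not cluster:
        return False  # nothing to match against, so the record would get a new MPI
    matched = sum(1 for existing in cluster if is_match(new_record, existing))
    return matched / len(cluster) >= threshold


# Made-up example data: the new record matches two of the three cluster members.
fields = ["first_name", "last_name", "birthdate"]
cluster = [
    {"first_name": "Ana", "last_name": "Diaz", "birthdate": "1990-01-01"},
    {"first_name": "Ana", "last_name": "Diaz", "birthdate": "1990-01-01"},
    {"first_name": "Anna", "last_name": "Diaz", "birthdate": "1990-01-01"},
]
new_record = {"first_name": "Ana", "last_name": "Diaz", "birthdate": "1990-01-01"}


def is_match(a: Record, b: Record) -> bool:
    return exact_field_match(a, b, fields)


# A lenient threshold admits the record; requiring every member to match rejects it.
joins_at_half = belongs_to_cluster(new_record, cluster, is_match, threshold=0.5)
joins_at_all = belongs_to_cluster(new_record, cluster, is_match, threshold=1.0)
```

With this sample data the record joins the cluster at a threshold of 0.5 but not at 1.0, which is the kind of trade-off between false positives and false negatives that the threshold testing would need to measure.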

Our initial plan is to investigate criteria for adding new records to existing record clusters such that false positives and false negatives are minimized while run time remains performant. We propose testing the following configurations to start:

Acceptance Criteria

An appropriate threshold for deciding whether a new record should be added to an existing record cluster is determined for record linkage.

emmastephenson commented 1 year ago

@m-goggins and @bamader, if you could add details to this ticket, that would be appreciated!