CDCgov / RecordLinker

The RecordLinker is a service that links records from two datasets based on a set of common attributes. The service is designed to be used in a variety of public health contexts, such as linking patient records from different sources or linking records from different public health surveillance systems.
https://cdcgov.github.io/RecordLinker/
Apache License 2.0

Universal set of blocking keys #26

Open ericbuckley opened 1 month ago

ericbuckley commented 1 month ago

Summary

As we move towards only having a fixed number (N) of blocking keys available to customers, we need to determine exactly what those keys should be.

Acceptance Criteria

Details / Tasks

We have 7 keys currently in use by the DIBBS_* algorithms, so at the very least we'll need these 7 going forward. There are likely a few more that would be valuable for customers to optionally "block" on when retrieving from the database.

Background / Context

Having a universal set of blocking keys, rather than an unlimited one, has several advantages: a) it's easier to drive a UI for users to configure their algorithm, b) we can optimize the database up front with the appropriate blocking key indexes, and c) we can limit customers to keys we know are useful and keep them from making too many inappropriate choices.
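
To make the idea of a closed key set concrete, here is a minimal sketch, assuming the keys are modeled as a Python `Enum`. The member names (`BIRTHDATE`, `MRN_LAST_FOUR`, `ZIP`) and the `parse_blocking_keys` helper are hypothetical placeholders, not the actual 7 keys or any existing RecordLinker API; the real list is what this issue needs to decide.

```python
from enum import Enum


class BlockingKey(Enum):
    # Hypothetical members for illustration only; the real universal
    # set is the open question in this issue.
    BIRTHDATE = "birthdate"
    MRN_LAST_FOUR = "mrn_last_four"
    ZIP = "zip"


def parse_blocking_keys(names):
    """Map configured key names to BlockingKey members.

    Raises ValueError for any name outside the universal set, so an
    algorithm configuration cannot reference an unsupported key.
    """
    keys = []
    for name in names:
        try:
            keys.append(BlockingKey(name))
        except ValueError:
            allowed = ", ".join(k.value for k in BlockingKey)
            raise ValueError(
                f"unknown blocking key {name!r}; allowed: {allowed}"
            ) from None
    return keys
```

With a closed set like this, a UI could list `BlockingKey` members directly, and the database indexes could be created once per member rather than per customer configuration.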

ericbuckley commented 1 week ago

@alhayward now that we're in the process of adding more identifiers to the list of feature comparisons, I'm wondering if it makes sense to also block on those? For example, if blocking on the last 4 of MRN provides value, I could see a similar reason for wanting to block on the last 4 of SSN. In a PHA where SSN is more readily available than MRN, this might be a better blocking key. Thoughts?
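
As a rough illustration of why the same logic could serve both identifiers, here is a sketch of a single "last 4" blocking value function. The name `last_four` and the choice to strip separators and return `None` for short values are assumptions for this example, not the existing MRN implementation.

```python
def last_four(identifier):
    """Return the last four alphanumeric characters of an identifier.

    Separators such as dashes and spaces are dropped first. Returns
    None when the value is missing or has fewer than four usable
    characters, so the record simply produces no blocking value.
    """
    if identifier is None:
        return None
    cleaned = "".join(ch for ch in identifier if ch.isalnum())
    return cleaned[-4:] if len(cleaned) >= 4 else None


# The same function could back either key, whichever identifier is available.
ssn_block = last_four("123-45-6789")
mrn_block = last_four("MRN-00123")
```

Here `ssn_block` is `"6789"` and `mrn_block` is `"0123"`, so both identifiers yield a comparable four-character value.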