The RecordLinker is a service that links records from two datasets based on a set of common attributes. The service is designed to be used in a variety of public health contexts, such as linking patient records from different sources or linking records from different public health surveillance systems.
Implementing the MPI functions, get_block_data and insert_matched_patient, using the new schema.
Related Issues
closes #10
Additional Notes
A couple of large changes to the DB schema
The BlockingKey table has been replaced with an enum. With the direction that we're going, we'll want to lock these in place. Using a DB to store them is fine, but I think that opens up the possibility that a user could add another one (say directly through some SQL query and skip the app), that would throw off all are existing BlockingValues. I think it's safer to store this directly in code, to protect our users from potentially making this mistake. Additionally, each one of these values needs a way to transform a collection of Patient PII into a list of Blocking Values specific to a key, so some code was going to be necessary no matter what.
The ExternalPerson table was deleted. With the current design some fidelity is being lost here that doesn't seem ideal. Yes, we are storing all the external person identifiers received by the documents, but once a Person cluster grows beyond 2 Patients, we have no way to trace an external_person_id back to the Patient it was originally attached to. By moving this field over to the Patient table, we retain the ability to track document external_person_id with patient_id.
Patient.external_patient_id has been added. In the current schema, Patient.id is either internal or external, just depends on if a Patient resource id is present in the FHIR bundle. If want to continue to track that external patient resource id that is sometimes present, we need a column in the patient table to store that info.
The python-dotenv dependency has been removed, settings.db_uri now has the information that we used dotenv to load, so we don't need this anymore
Checklist
Please review and complete the following checklist before submitting your pull request:
[x] I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
[x] I have reviewed my changes to ensure they are clear, concise, and well-documented.
[x] I have updated the documentation, if applicable.
[x] I have added or updated test cases to cover my changes, if applicable.
[x] I have minimized the number of reviewers to include only those essential for the review.
[x] I have notified teammates in the review thread to build awareness.
Checklist for Reviewers
Please review and complete the following checklist during the review process:
[ ] The code follows best practices and conventions.
[ ] The changes implement the desired functionality or fix the reported issue.
[ ] The tests cover the new changes and pass successfully.
[ ] Any potential edge cases or error scenarios have been considered.
Description
Implementing the MPI functions,
get_block_data
andinsert_matched_patient
, using the new schema.Related Issues
closes #10
Additional Notes
settings.db_uri
now has the information that we used dotenv to load, so we don't need this anymoreChecklist
Please review and complete the following checklist before submitting your pull request:
Checklist for Reviewers
Please review and complete the following checklist during the review process: