CDCgov / RecordLinker

The RecordLinker is a service that links records from two datasets based on a set of common attributes. The service is designed to be used in a variety of public health contexts, such as linking patient records from different sources or linking records from different public health surveillance systems.
https://cdcgov.github.io/RecordLinker/
Apache License 2.0

Update link.py and mpi.py to use Algorithm object #16

Closed ericbuckley closed 3 weeks ago

ericbuckley commented 1 month ago

Summary

The link.py and mpi.py modules expect a Dict for the algo_config parameter. Update them to accept an instance of the Algorithm class in models.py instead. We may also need to rewrite how simple_mpi.py and simple_link.py use algo_config, but only if those modules have been committed to main by the time this issue is worked on.
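As a rough illustration of the change described above, here is a minimal sketch of moving a function from a dict-based algo_config to a typed object. The `Algorithm` and `AlgorithmPass` classes below are hypothetical stand-ins, not the real classes in models.py, and every field name (`label`, `passes`, `blocking_keys`, `threshold`) and helper function is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmPass:
    # Hypothetical stand-in for one pass of an algorithm; field names are assumed.
    blocking_keys: list[str]
    threshold: float


@dataclass
class Algorithm:
    # Hypothetical stand-in for the Algorithm class in models.py.
    label: str
    passes: list[AlgorithmPass] = field(default_factory=list)


def thresholds_from_dict(algo_config: dict) -> list[float]:
    """Dict style: values are reached through string keys."""
    return [p["threshold"] for p in algo_config["passes"]]


def thresholds_from_algorithm(algorithm: Algorithm) -> list[float]:
    """Object style: values are reached through typed attributes."""
    return [p.threshold for p in algorithm.passes]


def algorithm_from_dict(algo_config: dict) -> Algorithm:
    """Bridge for callers that still hold a dict during the migration."""
    return Algorithm(
        label=algo_config["label"],
        passes=[AlgorithmPass(**p) for p in algo_config["passes"]],
    )
```

A bridge function like `algorithm_from_dict` would let existing callers keep passing dicts while link.py and mpi.py are converted, and it could be dropped once nothing constructs the dict form anymore.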

Acceptance Criteria

Dependencies

15

ericbuckley commented 1 month ago

@cbrinson-rise8 I've been doing some testing on the new schema this week, https://github.com/CDCgov/RecordLinker/issues/62. It's looking good so far: performance is ever so slightly better, we're addressing some edge cases that the old code doesn't (e.g. last names like O'Reilly or O'Connor), and it's going to be way easier to make some of the changes NBS is asking for down the road.

At this point, I'd say don't worry about making changes to recordlinker.linkage; just focus on the changes needed in recordlinker.linking. The former package is likely to be deleted soon.