chansooligans / oagdedupe

Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then classify matches
https://oagdedupe.readthedocs.io/en/latest/
MIT License

124 simple version of business logic #127

Closed NYSAG-GS closed 1 year ago

NYSAG-GS commented 1 year ago

Heavily inspired by the Architecture Patterns with Python book, I wondered what the domain layer might look like if it were completely unencumbered by database implementation logic. The oagdedupe/simple module is an attempt at that.

The most basic concepts are

And some abstractions like

The top-level API has a get_entities method that finds the best conjunctions (not implemented), gets pairs, classifies pairs, and clusters records based on those classifications.
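As a rough illustration of that flow, here is a minimal sketch rather than the actual oagdedupe/simple code. Only the name `get_entities` comes from the description above; the record shape, `get_pairs`, `cluster`, and the `block_key` and `is_match` callables are hypothetical stand-ins. In particular, `block_key` takes the place of the "best conjunctions" step, which is not implemented, and `is_match` is a fake classifier of the kind a test might use.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple

Record = Tuple[int, str]  # hypothetical record shape: (id, name)
Pair = Tuple[int, int]


def get_pairs(records: List[Record], block_key: Callable[[str], Hashable]) -> Set[Pair]:
    """Group records by blocking key and emit every within-block pair of ids."""
    blocks: Dict[Hashable, List[int]] = {}
    for rid, name in records:
        blocks.setdefault(block_key(name), []).append(rid)
    pairs: Set[Pair] = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def cluster(ids: Iterable[int], matches: Iterable[Pair]) -> Set[FrozenSet[int]]:
    """Connected components over matched pairs, using a small union-find."""
    ids = list(ids)
    parent = {i: i for i in ids}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in matches:
        parent[find(a)] = find(b)

    groups: Dict[int, Set[int]] = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def get_entities(
    records: List[Record],
    block_key: Callable[[str], Hashable],  # stand-in for learned conjunctions
    is_match: Callable[[str, str], bool],  # stand-in for the classifier
) -> Set[FrozenSet[int]]:
    """Get pairs, classify them, then cluster records on the matches."""
    by_id = dict(records)
    pairs = get_pairs(records, block_key)
    matches = {(a, b) for a, b in pairs if is_match(by_id[a], by_id[b])}
    return cluster(by_id.keys(), matches)


def normalize(name: str) -> str:
    return name.lower().rstrip(".")


records = [(1, "Acme Corp"), (2, "acme corp."), (3, "Apex LLC"), (4, "Bolt Inc")]
entities = get_entities(
    records,
    block_key=lambda name: name[0].lower(),  # fake blocking: first letter
    is_match=lambda a, b: normalize(a) == normalize(b),  # fake classifier
)
```

With these fakes, records 1 and 2 land in the same entity while 3 and 4 stay on their own. Nothing in the sketch touches a database, which is the point of keeping the domain logic separate.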

There's a lot here that isn't implemented yet, but basic tests using fakes pass. Interested in your thoughts, @chansooligans. I have some ideas on how this could be useful beyond being an interesting toy model.

NYSAG-GS commented 1 year ago

Merging, as the changes are purely additive. (I just did a rebase from master. I hope this doesn't mess anything up, but I will keep the commit history around in case it does.)