chansooligans / oagdedupe

Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then classify matches
https://oagdedupe.readthedocs.io/en/latest/
MIT License

dependency inversion: clustering submodule #105

Closed chansooligans closed 2 years ago

chansooligans commented 2 years ago

oagdedupe.cluster is used after active learning is completed and the model has predicted match scores. This step builds a graph from the predicted matches and retrieves its connected components; each connected component is a linked ("deduplicated") entity.
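To make the clustering step concrete, here is a minimal sketch of grouping records via connected components. It is not oagdedupe's actual code: the function name `connected_components`, the `(id_a, id_b, score)` input shape, and the `threshold` parameter are all assumptions chosen for illustration, and union-find is used here as one common way to compute components.

```python
from collections import defaultdict


def connected_components(scored_pairs, threshold=0.5):
    """Group record ids into entities from (id_a, id_b, score) triples.

    Hypothetical helper: pairs scoring at or above `threshold` are
    treated as edges; every id seen becomes a node.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for id_a, id_b, score in scored_pairs:
        root_a, root_b = find(id_a), find(id_b)
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for record_id in list(parent):
        groups[find(record_id)].add(record_id)
    # Sorted output so results are deterministic.
    return sorted(sorted(group) for group in groups.values())


pairs = [(1, 2, 0.9), (2, 3, 0.8), (4, 5, 0.2)]
entities = connected_components(pairs)
```

With these example pairs, records 1, 2 and 3 end up in one entity because the matches chain together, while 4 and 5 stay separate because their score falls below the assumed threshold.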

Currently the Connected Components object handles this task. It already inherits from BaseCluster, the abstraction, so this issue can be closed.
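For readers unfamiliar with the pattern, a rough sketch of what dependency inversion looks like for a clustering step follows. Only the names `BaseCluster` and Connected Components come from this issue; the method name `get_clusters`, its signature, and the `Deduper` class are hypothetical and are not taken from the oagdedupe codebase.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Set, Tuple

Pair = Tuple[int, int, float]  # assumed shape: (id_a, id_b, match score)


class BaseCluster(ABC):
    """Abstraction the caller depends on (method name is hypothetical)."""

    @abstractmethod
    def get_clusters(self, scored_pairs: Iterable[Pair], threshold: float) -> List[Set[int]]:
        ...


class ConnectedComponents(BaseCluster):
    """Concrete strategy: depth-first search over the match graph."""

    def get_clusters(self, scored_pairs, threshold):
        adjacency = {}
        for id_a, id_b, score in scored_pairs:
            adjacency.setdefault(id_a, set())
            adjacency.setdefault(id_b, set())
            if score >= threshold:
                adjacency[id_a].add(id_b)
                adjacency[id_b].add(id_a)

        seen, clusters = set(), []
        for start in adjacency:
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(adjacency[node] - component)
            seen |= component
            clusters.append(component)
        return clusters


class Deduper:
    """Hypothetical caller: it knows only the BaseCluster interface."""

    def __init__(self, cluster: BaseCluster):
        self.cluster = cluster

    def run(self, scored_pairs, threshold=0.5):
        return self.cluster.get_clusters(scored_pairs, threshold)


result = Deduper(ConnectedComponents()).run([(1, 2, 0.9), (2, 3, 0.8), (4, 5, 0.2)])
```

Because the caller receives the clustering object rather than constructing it, a different `BaseCluster` subclass could be swapped in without touching the caller.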