chansooligans / oagdedupe

Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then classify matches
https://oagdedupe.readthedocs.io/en/latest/
MIT License

consolidate sample and train tables #40

Closed chansooligans closed 2 years ago

chansooligans commented 2 years ago

We may be able to simplify by using a single table for both.

Originally, the sample table was used to compute the reduction ratio, and the train table was used to compute positive and negative coverage.

Once the best block conjunctions were found, the sample table was used to generate active learning samples. Once labelled, these were logged in the train table.

But we don't need both; a single table can serve both purposes.
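A rough sketch of the idea, not the library's actual schema: one in-memory "table" of sampled records is used to compute the reduction ratio and also the positive and negative coverage of a blocking scheme. The `records` rows, the pair-level `labels` mapping, the `zip` blocking key, and the helper names `block_pairs` and `evaluate` are all hypothetical, made up for illustration.

```python
from itertools import combinations

# Hypothetical single table: every row is a sampled record.
records = [
    {"id": 1, "name": "john smith", "zip": "10001"},
    {"id": 2, "name": "jon smith", "zip": "10001"},
    {"id": 3, "name": "mary jones", "zip": "10002"},
    {"id": 4, "name": "mary jones", "zip": "10003"},
    {"id": 5, "name": "alan kay", "zip": "10002"},
]

# Hypothetical labels on pairs of ids from the same table
# (smaller id first): 1 = match, 0 = non-match.
labels = {(1, 2): 1, (3, 4): 1, (3, 5): 0}


def block_pairs(rows, key):
    """Return the set of id pairs whose records share a blocking key."""
    blocks = {}
    for row in rows:
        blocks.setdefault(key(row), []).append(row["id"])
    return {
        pair
        for ids in blocks.values()
        for pair in combinations(sorted(ids), 2)
    }


def evaluate(rows, labels, key):
    """Compute reduction ratio and label coverage from one table."""
    pairs = block_pairs(rows, key)
    n = len(rows)
    total_pairs = n * (n - 1) // 2
    # Share of all possible comparisons that blocking avoids.
    reduction_ratio = 1 - len(pairs) / total_pairs
    pos = [p for p, y in labels.items() if y == 1]
    neg = [p for p, y in labels.items() if y == 0]
    # Share of labelled matches / non-matches that blocking keeps.
    positive_coverage = sum(p in pairs for p in pos) / len(pos)
    negative_coverage = sum(p in pairs for p in neg) / len(neg)
    return reduction_ratio, positive_coverage, negative_coverage


rr, pos_cov, neg_cov = evaluate(records, labels, key=lambda r: r["zip"])
print(rr, pos_cov, neg_cov)
```

Since the labelled pairs refer to ids in the same table, newly labelled active learning samples would only add entries to `labels` instead of copying rows into a second table. The sketch assumes at least one positive and one negative label exist; otherwise the coverage divisions would fail.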