chansooligans / oagdedupe

Developed for use by the NY Office of the Attorney General: a Python library for scalable entity resolution that uses active learning to learn blocking configurations, generate comparison pairs, and then classify matches
https://oagdedupe.readthedocs.io/en/latest/
MIT License

model monitoring #84

Open chansooligans opened 2 years ago

chansooligans commented 2 years ago

Model Monitoring

During the learning stage, dedupe trains a binary classification model on labeled data, then applies the model to predict whether unlabeled data are a match or not a match.

Currently, the model is pretty crude: it just blindly trains a logistic regression or random forest out of the box. I'd like to add some monitoring to see whether there is room for tuning, model selection, cross-validation, etc. In addition, it would be useful to have an application that can communicate this
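
As a rough starting point for the monitoring idea, here is a minimal, model-agnostic sketch of k-fold cross-validation that reports precision, recall, and F1 for the match class. Everything in it is an assumption for illustration and not dedupe's actual code: the function names (`precision_recall_f1`, `k_fold_indices`, `cross_validate`), the `fit`/`predict` callables, and the toy threshold classifier that stands in for the logistic regression or random forest. The folds are not stratified, so a fold with no matches scores 0 for that fold. In practice a library routine such as scikit-learn's `cross_validate` would likely replace the hand-rolled loop.

```python
import random
from statistics import mean


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels, where 1 means 'match'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Average precision/recall/F1 over k held-out folds.

    fit(X_train, y_train) -> model and predict(model, X_test) -> labels are
    hypothetical hooks for whichever classifier is being monitored.
    """
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = predict(model, [X[i] for i in held_out])
        scores.append(precision_recall_f1([y[i] for i in held_out], y_pred))
    names = ("precision", "recall", "f1")
    return {name: mean(s[j] for s in scores) for j, name in enumerate(names)}


# Stand-in classifier (an assumption, not dedupe's model): label a pair a
# match when its single similarity score is above a learned threshold.
def fit_threshold(X_train, y_train):
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    if not pos or not neg:
        return 0.5  # fall back when a class is missing from the training fold
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, X_test):
    return [1 if x >= threshold else 0 for x in X_test]


# Toy labeled data: low scores are non-matches, high scores are matches.
X_demo = [i / 50 for i in range(20)] + [0.6 + i / 50 for i in range(20)]
y_demo = [0] * 20 + [1] * 20
report = cross_validate(fit_threshold, predict_threshold, X_demo, y_demo, k=5)
print(report)
```

Running the same `cross_validate` call once per candidate model (or per hyperparameter setting) and comparing the resulting reports would give a simple basis for model selection and tuning.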