chansooligans / oagdedupe

Developed for Use by NY Office of the Attorney General: A Python library for scalable entity resolution, using active learning to learn blocking configurations, generate comparison pairs, then clasify matches
https://oagdedupe.readthedocs.io/en/latest/
MIT License
2 stars 1 forks source link

label ordering with predictions #14

Closed chansooligans closed 2 years ago

chansooligans commented 2 years ago

make sure order of labels is consistent between .predict() and .predict_proba() and that probabilities are always probability of "Yes"

chansooligans commented 2 years ago

upon initialization with no learned samples, the class assignment is random, so model has no way to know that yes=1 and no=0, until labells are submitted

once labels are submitted, classes are automatically sorted, so No,Yes corresponds to 0,1; so probabilities will be p(Yes); no further edit required