cleanlab / examples

Notebooks demonstrating example applications of the cleanlab library
https://github.com/cleanlab/cleanlab
GNU Affero General Public License v3.0

Should I use a simpler model or the current model to predict probabilities? #93

Closed: Haoyoudoing closed this issue 1 week ago

Haoyoudoing commented 1 month ago

Thanks for publishing such a great project for finding data issues. After reviewing some of the examples, I would like your guidance on the following situation:

How can I find label errors made by human annotators during active learning, where I am fine-tuning a sentence transformer model for a text classification task? Should I use a simpler model, e.g. a logistic regression model, to generate the predicted probabilities for confident learning, or should I use the current fine-tuned sentence transformer? Will this choice make a big difference?

jwmueller commented 1 month ago

I'd recommend the current fine-tuned sentence transformer.
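To illustrate what "the probabilities for confident learning" are used for, here is a minimal, simplified sketch of the confident-learning filtering idea in plain NumPy. It is not cleanlab's implementation (in practice you would pass your labels and predicted probabilities to `cleanlab.filter.find_label_issues`). It assumes `pred_probs` holds out-of-sample predicted probabilities from the fine-tuned sentence transformer (for example, each example scored by a model that did not train on it), that labels are integers `0..K-1`, and that every class appears at least once among the given labels. The function name `find_label_issues_sketch` is hypothetical.

```python
import numpy as np


def find_label_issues_sketch(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Simplified confident-learning filter (illustrative, not cleanlab's code).

    labels: shape (n,), given (possibly noisy) integer labels in [0, K).
    pred_probs: shape (n, K), out-of-sample predicted class probabilities,
        assumed here to come from the fine-tuned sentence transformer.
    Returns a boolean mask that is True where the given label looks wrong.
    """
    _, k = pred_probs.shape
    # Per-class threshold t_j: the average predicted probability of class j
    # among examples whose given label is j (the model's self-confidence).
    thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(k)])
    # An example confidently belongs to class j if p(j | x) >= t_j.
    above = pred_probs >= thresholds  # thresholds broadcast across rows
    # Among the classes an example confidently belongs to, pick the most probable.
    masked = np.where(above, pred_probs, -np.inf)
    confident_class = masked.argmax(axis=1)
    has_confident = above.any(axis=1)
    # Flag examples whose confident class disagrees with the given label
    # (these fall in the off-diagonal of the confident joint).
    return has_confident & (confident_class != labels)


labels = np.array([0, 0, 1, 1, 0])
pred_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.1, 0.9]])
print(np.flatnonzero(find_label_issues_sketch(labels, pred_probs)))  # prints [4]
```

With these toy numbers only example 4 is flagged: its given label is 0, but the model confidently predicts class 1. The quality of the flagged set depends directly on how good and how calibrated `pred_probs` are, which is why the choice of model producing them matters.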