dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Binary classification: Tune the model/algorithms to give higher positive recall instead of negative recall #5999

Open AbhayGaur opened 2 years ago

AbhayGaur commented 2 years ago

I have a binary classification problem. Using various models such as L-BFGS logistic regression, LightGBM, and FastTree, I get accuracy above 85%, but positive recall is only about 0.7 (at 99% precision), while negative recall is above 0.99 (at 80% precision).

The training dataset has 200k rows, roughly 38% positive class and 62% negative class. The test dataset has the same 38%/62% split.

Ideally, I would like the model to keep decent accuracy but give higher positive recall. I would like some method that lets me experiment with the trade-off between accuracy and positive recall.

torronen commented 2 years ago

At least if you run an AutoML experiment, you can set the optimization metric to PositivePrecision. The problem might be that it will pick the highest precision regardless of recall (worst case could be 100% precision, 1% recall). I ended up adding a custom metric in a custom build; in my case I calculate it in TrainValidateRunner.Run. To save some time, I think you could just overwrite one of the existing optimization metric values there.

Another problem could be that after the AutoML experiment ends, there might not be a way to see the hyperparameters AutoML chose for you, so you cannot easily continue with a manual search. I recently opened a ticket about it.
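For illustration only, here is one shape a custom metric could take. The comment above does not say which formula was used, so the F-beta score here is an assumption, and `f_beta` is a hypothetical helper rather than anything in ML.NET or AutoML. It is written in Python only to keep the arithmetic self-contained. With beta greater than 1 the score weights recall more heavily than precision, which penalizes the "100% precision, 1% recall" case.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 favors recall, beta < 1 favors precision.
    Hypothetical helper for illustration; not an ML.NET API.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when the model finds no true positives
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Degenerate model from the comment: perfect precision, almost no recall.
    print(f_beta(1.0, 0.01))
    # Roughly the numbers reported at the top of the thread.
    print(f_beta(0.99, 0.70))
```

A metric like this ranks the second model well above the first, whereas plain positive precision would prefer the first.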

torronen commented 2 years ago

One random thought: since it is binary, maybe you could try classifying manually based on the score. Right now it is probably negative scores false, positive scores true. Would it work if you considered some small negative values as true? I.e., move the uncertain falses to true. I have no idea if it would work, just a thought to test.
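A minimal sketch of that idea, assuming you can export the raw scores and true labels for a validation set. The function names and the toy data are made up for illustration, and it is written in plain Python so it does not depend on any particular library. It lowers the decision threshold until positive recall reaches a target, then reports the precision at that point.

```python
def recall_precision_at(scores, labels, threshold):
    """Positive-class (recall, precision) when score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def pick_threshold(scores, labels, min_recall):
    """Return (threshold, recall, precision) for the highest threshold that
    reaches min_recall, or None if no threshold does."""
    # Recall can only grow as the threshold drops, so scan from high to low.
    for t in sorted(set(scores), reverse=True):
        recall, precision = recall_precision_at(scores, labels, t)
        if recall >= min_recall:
            return t, recall, precision
    return None


if __name__ == "__main__":
    # Toy validation data (made up): raw scores and true labels.
    scores = [2.1, 1.3, 0.4, -0.2, -0.5, -0.9, -1.4, -2.0]
    labels = [1, 1, 1, 1, 0, 1, 0, 0]
    print(recall_precision_at(scores, labels, 0.0))  # default cut at zero
    print(pick_threshold(scores, labels, min_recall=0.8))
    print(pick_threshold(scores, labels, min_recall=1.0))
```

Pick the threshold on a validation set rather than the test set, otherwise the reported recall will be optimistic. In ML.NET itself, `mlContext.BinaryClassification.ChangeModelThreshold` can then apply the chosen threshold to a trained binary prediction transformer.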

AbhayGaur commented 2 years ago

Yeah, I have been able to manually classify based on score for now, and it works to an extent, but an easier way to tune model performance would have been ideal.