Ensure model evaluation metrics are good even for skewed classification problems

ashwin153 / autofeat

automated exploratory data analysis

Ensure model evaluation metrics are good even for skewed classification problems #231

Closed adiraju13 closed 2 weeks ago

adiraju13 commented 2 weeks ago

Right now our model evals look bad on the skewed defaults data set. Let's make sure they hold up even when the classes are imbalanced.

adiraju13 commented 2 weeks ago

Approach:

Pick a model threshold such that any score over the threshold is classified as 1 and anything below as 0.
We will pick the threshold that maximizes the F1 score, the harmonic mean of precision and recall. If either of the two is really low, the F1 score is penalized, even if the other one is high.
This is especially helpful when the data is skewed.
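
A minimal sketch of the approach above, in plain Python. The function names `f1_at_threshold` and `best_f1_threshold` and the toy data are made up for illustration and are not taken from the autofeat code. It also assumes that a score equal to the threshold counts as class 1, which the description above leaves open.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 when scores >= threshold are predicted as class 1 (assumed tie rule)."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there is nothing to count.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Try every distinct score as a cutoff and return (threshold, f1)."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy, made-up data: 2 positives out of 10 rows (skewed).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.40, 0.35, 0.45]

threshold, f1 = best_f1_threshold(y_true, scores)
print(threshold, f1)
print(f1_at_threshold(y_true, scores, 0.5))
```

On this toy data no score reaches 0.5, so the default cutoff predicts no positives at all, while the sweep settles on a lower cutoff. In practice the threshold should probably be chosen on a validation split rather than on the data used for the final evaluation.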

adiraju13 commented 2 weeks ago

https://www.v7labs.com/blog/f1-score-guide

ashwin153 commented 2 weeks ago

adiraju13 commented 2 weeks ago

I think one of our biggest problems is that accuracy will be hard to move, given how skewed the data set is. What we can do as we push the threshold higher is increase precision while keeping recall roughly the same. There is almost certainly an optimal threshold, and it is not 0.5.
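
To illustrate why accuracy is hard to move on a skewed data set, here is a small sketch with made-up labels (8 positives in 100 rows). The helper `confusion_metrics` and both prediction lists are hypothetical and only meant to show the effect, not to reproduce our numbers.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up skewed labels: 8 positives followed by 92 negatives.
y_true = [1] * 8 + [0] * 92

# Baseline that never predicts the positive class.
always_zero = [0] * 100

# Hypothetical model: catches 4 of the 8 positives, with 4 false alarms.
some_model = [1] * 4 + [0] * 4 + [1] * 4 + [0] * 88

baseline = confusion_metrics(y_true, always_zero)
model = confusion_metrics(y_true, some_model)
print(baseline)
print(model)
```

Both prediction lists get 92 of the 100 rows right, so accuracy cannot tell them apart, while precision and recall go from 0 to 0.5. That is the motivation for tuning the threshold on F1 instead of accuracy.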

I think the biggest thing will be getting a better model too; we need that working end to end before playing with this more. It looks like with just application_train.csv + model we can get a ~10% improvement in precision, with the other metrics flat relative to the baseline.