ashwin153 / autofeat

automated exploratory data analysis

Ensure model evaluation metrics are good even for skewed classification problems #231

Closed adiraju13 closed 2 weeks ago

adiraju13 commented 2 weeks ago

Right now our model evals look bad on the skewed defaults data set. Let's make sure the metrics we report are actually meaningful when the classes are this imbalanced, instead of just looking bad because we're leaning on accuracy.
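Something like the sketch below could make this concrete. The class ratio and predictions are made up (not from our data), but it shows how accuracy stays misleadingly high on an imbalanced sample while precision, recall, F1, and ROC-AUC expose the real behavior.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
)

# Hypothetical predictions on a heavily skewed data set (~8% positives),
# illustrating why accuracy alone can look fine even for a weak model.
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.08).astype(int)
y_prob = np.clip(0.08 + 0.3 * y_true + rng.normal(0, 0.15, y_true.shape), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)

print("accuracy :", accuracy_score(y_true, y_pred))   # dominated by the majority class
print("precision:", precision_score(y_true, y_pred, zero_division=0))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("roc auc  :", roc_auc_score(y_true, y_prob))    # threshold-independent
```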

adiraju13 commented 2 weeks ago

Approach:

adiraju13 commented 2 weeks ago

https://www.v7labs.com/blog/f1-score-guide

ashwin153 commented 2 weeks ago
adiraju13 commented 2 weeks ago

I think one of our biggest problems is that accuracy will be hard to move, given how skewed the data set is. What we can do is push the classification threshold higher to increase precision while keeping recall roughly the same. There is definitely an optimal threshold, and it is not 0.5.
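A minimal sketch of that threshold search, assuming we have held-out labels `y_true` and predicted default probabilities `y_prob` from the model (both names are placeholders, and the 0.5 recall floor is just an example constraint):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def pick_threshold(y_true, y_prob, min_recall=0.5):
    """Return the threshold that maximizes precision while keeping
    recall at or above `min_recall`."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
    # precision/recall have one more entry than thresholds; drop the last point
    # so all three arrays line up.
    precision, recall = precision[:-1], recall[:-1]
    ok = recall >= min_recall
    if not ok.any():
        return 0.5  # no threshold satisfies the recall floor; fall back to the default
    best = np.argmax(precision[ok])
    return thresholds[ok][best]
```

Running this over the validation predictions would give us a concrete threshold to compare against the 0.5 default, and we can tune the recall floor depending on how many defaults we're willing to miss.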

I think the biggest thing will be getting a better model too; we need that end to end before playing with this more. It looks like with just application_train.csv plus the model, we can get roughly a 10% improvement in precision, with the other metrics staying flat relative to the baseline.