adiraju13 closed this 2 weeks ago
Approach:
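I think one of our biggest problems is that accuracy will be hard to move, given how skewed the dataset is: a model that always predicts the majority class already scores high accuracy. What we can do instead is push the decision threshold higher to increase precision while trying to keep recall close to where it is (raising the threshold can only hold recall steady or lower it). There is certainly some optimal threshold, and it is not 0.5.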
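A minimal sketch of the threshold sweep, assuming we have true labels and predicted default probabilities as plain lists. The selection rule (highest precision subject to a minimum recall, `min_recall`) is a hypothetical criterion for illustration, not something we've agreed on; the helper names are also hypothetical.

```python
def metrics_at(y_true, scores, threshold):
    """Precision, recall, and accuracy when predicting 1 for score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "threshold": threshold,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # 0.0 when nothing is predicted positive
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def pick_threshold(y_true, scores, min_recall, grid=None):
    """Hypothetical rule: highest precision among thresholds with recall >= min_recall."""
    if grid is None:
        grid = sorted(set(scores))  # every distinct score is a candidate cut-off
    best = None
    for t in grid:
        m = metrics_at(y_true, scores, t)
        if m["recall"] >= min_recall and (best is None or m["precision"] > best["precision"]):
            best = m
    return best
```

On a toy 80/20 split, predicting everything as negative (a threshold above every score) gives the same accuracy as the 0.5 threshold, which is why accuracy alone barely moves here.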
I think the biggest lever will be getting a better model, though; we need that working end to end before playing with the threshold more. It looks like with just application_train.csv plus a model we can get roughly a 10% improvement in precision over the baseline, with the other metrics flat.
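"~10% improvement" can mean a relative change or a gain in percentage points, so a small comparison that reports both could help keep numbers consistent. This is a sketch assuming each run's metrics are stored as a `{name: value}` dict; `compare_to_baseline` is a hypothetical helper, not existing code in the repo.

```python
def compare_to_baseline(baseline, candidate):
    """Absolute and relative change of each metric relative to the baseline run."""
    report = {}
    for name, base in baseline.items():
        new = candidate[name]
        absolute = new - base  # change in raw metric units (e.g. percentage points / 100)
        relative = absolute / base if base else float("nan")  # undefined for a zero baseline
        report[name] = {
            "baseline": base,
            "candidate": new,
            "abs_change": absolute,
            "rel_change": relative,
        }
    return report
```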
Right now our model evals look bad on the skewed defaults dataset. Let's make sure they aren't.
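One way to sanity-check evals on skewed data is to always report them next to the trivial majority-class baseline plus an imbalance-aware metric such as balanced accuracy (the mean of the true positive and true negative rates). A minimal sketch, assuming binary 0/1 labels and hard predictions; `imbalance_report` is a hypothetical name.

```python
def imbalance_report(y_true, y_pred):
    """Accuracy alongside the majority-class baseline and balanced accuracy."""
    n = len(y_true)
    pos = sum(y_true)
    neg = n - pos
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    accuracy = (tp + tn) / n
    majority_accuracy = max(pos, neg) / n  # score of always predicting the larger class
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return {
        "prevalence": pos / n,
        "accuracy": accuracy,
        "majority_class_accuracy": majority_accuracy,
        "lift_over_majority": accuracy - majority_accuracy,
        "balanced_accuracy": (tpr + tnr) / 2,  # 0.5 for a constant predictor
    }
```

A model whose accuracy matches the majority-class number but whose balanced accuracy is above 0.5 is doing something useful that plain accuracy hides.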