ThrunGroup / FastForest


Models are not able to fit the flight delay dataset #232

Closed motiwari closed 2 years ago

vxbrandon commented 2 years ago

Is the accuracy low?

motiwari commented 2 years ago

@vxbrandon Yes, about 80%, which is the same accuracy you get by always predicting the modal class, 0.
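As a quick sanity check on that baseline, here is a minimal sketch of computing the accuracy of always predicting the modal class. The labels are synthetic, with an 80/20 split chosen to mirror the figure quoted above; they are not the flight delay data, and `modal_class_accuracy` is a hypothetical helper, not part of FastForest.

```python
from collections import Counter


def modal_class_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    modal_label, modal_count = counts.most_common(1)[0]
    return modal_label, modal_count / len(labels)


# Synthetic labels (assumption): 80% class 0, 20% class 1.
labels = [0] * 80 + [1] * 20

modal_label, baseline = modal_class_accuracy(labels)
print(f"modal class = {modal_label}, baseline accuracy = {baseline:.2f}")
```

Any model whose accuracy is no higher than this baseline has not shown, by accuracy alone, that it is using the features.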

vxbrandon commented 2 years ago

[screenshot] Even LightGBM gets that accuracy with max depth greater than 10!

motiwari commented 2 years ago

@vxbrandon I don't think that screenshot makes sense. Is it the same data? If so, it implies that LightGBM is learning the wrong thing and doing worse than just predicting the modal class, 0, for everything.

motiwari commented 2 years ago

I discussed this with Jey. This is happening because 0 is still the majority label at every leaf node, so we always predict 0. The model is learning; it's just underfitting.
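To illustrate the explanation above, here is a small sketch with made-up leaf counts (hypothetical numbers, not taken from the flight delay dataset). The split separates a low-delay group from a higher-delay group, so the tree has learned something, yet majority voting at each leaf still outputs 0 everywhere.

```python
# Hypothetical class counts per leaf: (number of 0 labels, number of 1 labels).
leaves = {
    "leaf_a": (50, 5),   # low delay rate
    "leaf_b": (30, 15),  # higher delay rate, but 0 is still the majority
}


def majority_label(counts):
    """Label predicted by a leaf under majority voting (ties go to 0)."""
    n0, n1 = counts
    return 0 if n0 >= n1 else 1


predictions = {name: majority_label(c) for name, c in leaves.items()}
delay_rates = {name: c[1] / sum(c) for name, c in leaves.items()}

# Accuracy if every sample gets its leaf's majority label.
total = sum(sum(c) for c in leaves.values())
correct = sum(c[majority_label(c)] for c in leaves.values())
accuracy = correct / total

print(predictions)   # every leaf predicts 0
print(delay_rates)   # yet the leaves have clearly different delay rates
print(accuracy)      # same as the modal-class baseline for these counts
```

The leaves differ in their class proportions, but the hard prediction only changes once some leaf is pure enough in class 1 to flip its majority, which usually takes more splits.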

To get better accuracy, we need to set max_leaf_nodes=100 and max_depth=1000000.
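A rough sketch of what those settings look like, using scikit-learn's `DecisionTreeClassifier` as a stand-in. This assumes FastForest accepts the same parameter names, which this thread does not confirm. The data is synthetic and imbalanced, not the flight delay dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced data (assumption): class 1 occupies a small region.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = ((X[:, 0] > 0.8) & (X[:, 1] > 0.0)).astype(int)

# A heavily constrained tree, for comparison.
shallow = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

# The settings suggested above: up to 100 leaves and effectively unlimited depth.
deep = DecisionTreeClassifier(
    max_leaf_nodes=100,
    max_depth=1_000_000,
    random_state=0,
).fit(X, y)

print("modal-class baseline:", max(np.mean(y == 0), np.mean(y == 1)))
print("shallow train accuracy:", shallow.score(X, y))
print("deep train accuracy:", deep.score(X, y))
print("deep tree leaves:", deep.get_n_leaves())
```

These are training accuracies only; whether the larger tree generalizes better on the real data would need a held-out set.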