Open dvquy13 opened 3 years ago
RandomForest can easily overfit the train data `fe2` (especially with only 3000 observations), but test performance is still no better than random.
To control overfitting, I lower `max_depth` and increase `min_samples_leaf` and `min_samples_split` while increasing `n_estimators`.
The result is an F1-micro improvement from 51.6% to 54.6% on the holdout set.
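For illustration, here is a minimal sketch of this kind of regularization with scikit-learn. It is not the code used for this issue: the data is a synthetic stand-in generated with `make_classification`, and the hyperparameter values and the helper `fit_and_score` are hypothetical choices made only to show the direction of the change.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 3000 rows, weak signal.
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=3, flip_y=0.3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_and_score(**params):
    """Fit a forest and return (train, holdout) F1-micro."""
    model = RandomForestClassifier(random_state=0, n_jobs=-1, **params)
    model.fit(X_train, y_train)
    train_f1 = f1_score(y_train, model.predict(X_train), average="micro")
    test_f1 = f1_score(y_test, model.predict(X_test), average="micro")
    return train_f1, test_f1


# Unconstrained forest: fully grown trees largely memorize the training set.
loose = fit_and_score(n_estimators=100)

# Constrained forest: shallower trees, larger leaves, more estimators.
# These values are hypothetical, not the ones used in this issue.
tight = fit_and_score(
    n_estimators=500, max_depth=4, min_samples_leaf=20, min_samples_split=50
)

print(f"loose: train={loose[0]:.3f} holdout={loose[1]:.3f}")
print(f"tight: train={tight[0]:.3f} holdout={tight[1]:.3f}")
```

Comparing the train and holdout scores side by side shows how much of the train score is memorization. A smaller gap means less overfitting, but it does not by itself add signal that the features lack.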
Using LightGBM with data `fe21` does not make any difference. Maybe at this point we should shift focus to feature engineering, as we are clearly lacking useful information to feed into the learning algorithms.