cheetahbright / tsa-decision-trees

Decision tree implementation on a data set from the Transportation Security Administration.

Use Gradient Boosting Model #2

Open malctaylor15 opened 6 years ago

malctaylor15 commented 6 years ago

Use a gradient boosting model with grid search to explore the best possible model.

Sklearn implementation of GBM: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

Can also use xgboost. Hyperparameter tuning in xgboost: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/

Grid Search CV in sklearn: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
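
A minimal sketch of how the sklearn pieces linked above could fit together. The synthetic data from `make_regression` is only a stand-in for the TSA data set, and the grid values are illustrative assumptions, not tuned recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the TSA data (assumption: a numeric regression target).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative grid only; these values are not tuned for any real data.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid=param_grid,
    scoring="r2",  # rank candidates by cross-validated R-squared
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() applies the "r2" scorer to the best model refit on all training data.
test_r2 = search.score(X_test, y_test)
print(best_params, round(test_r2, 3))
```

`search.cv_results_` holds the mean score of every combination, which may help when comparing the chosen model against the alternatives.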

Extra: Create a custom grid search to avoid the cross-validation aspect of sklearn's GridSearchCV.
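
One possible shape for such a custom search, sketched under the assumption that a single hold-out validation split replaces cross-validation. The data is synthetic and the grid values are placeholders; `ParameterGrid` is sklearn's helper for enumerating combinations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import ParameterGrid, train_test_split

# Synthetic stand-in data (assumption); swap in the real features and target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Placeholder grid values, chosen only to keep the example small.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

results = []
for params in ParameterGrid(param_grid):
    model = GradientBoostingRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    results.append(
        {
            "params": params,
            # A large gap between these two scores suggests overfitting.
            "train_r2": r2_score(y_train, model.predict(X_train)),
            "val_r2": r2_score(y_val, model.predict(X_val)),
        }
    )

# Best validation score first.
results.sort(key=lambda r: r["val_r2"], reverse=True)
best = results[0]
print(best["params"], round(best["val_r2"], 3))
```

Keeping the full `results` list, rather than only the winner, gives a record of how the other hyperparameter settings performed.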

Goals:

  1. Find the optimal GBM hyperparameters for a model that does not overfit but has optimal R2 performance on the full dataset

  2. Defend model choice with results of other hyperparameters

  3. Pickle optimal model for later use
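
For goal 3, a rough sketch of pickling a fitted model and loading it back. The file name `gbm_model.pkl`, the temporary directory, and the synthetic data are assumptions for illustration only.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data and an arbitrary small model (assumptions).
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# Hypothetical file name; written to the temp directory to keep the example tidy.
model_path = os.path.join(tempfile.gettempdir(), "gbm_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print(same_predictions)
```

Pickled models are best loaded with the same scikit-learn version that wrote them, and only from trusted sources, since unpickling can execute arbitrary code.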

cheetahbright commented 6 years ago

@malctaylor15 Is "R2" r-squared?

malctaylor15 commented 6 years ago

@trackoverxc Yeah, I meant r-squared. As an aside, we can choose the metric for model evaluation (RMSE, MAPE, R2, etc.).
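
For reference, a small sketch of how the three metrics mentioned here are computed, using the standard textbook definitions on made-up numbers. The helper name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mape, r2) for two equal-length sequences of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # RMSE: square root of the mean squared residual.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # MAPE: mean absolute error relative to the true value (as a fraction).
    # Undefined when any true value is zero.
    mape = sum(abs(r / t) for r, t in zip(residuals, y_true)) / n

    # R-squared: 1 minus residual variation over total variation around the mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mape, r2


# Made-up values purely for illustration.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 380]
rmse, mape, r2 = regression_metrics(y_true, y_pred)
print(round(rmse, 3), round(mape, 4), round(r2, 3))  # 13.229 0.0583 0.986
```

RMSE is in the units of the target, MAPE is scale-free, and R2 compares the model against always predicting the mean, so the choice depends on what kind of error matters for the data.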