There is a table comparing RandomForest vs Bagging in https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_random_forest.html, but we don't have an equivalent table comparing bagging and boosting. I think it would be a good idea to add one.
We could have a similar table at the end of this notebook:
https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_hyperparameters.html
or in a new notebook (without any code) right after this one.
We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in https://github.com/INRIA/scikit-learn-mooc/pull/471.
Should we still add it to a notebook?
> We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in #471. Should we still add it to a notebook?
Only if we expand it a bit, for instance by including extra info about the influence of important hyper-parameters (both points are illustrated in the sketch below), e.g.:

- too many trees can cause overfitting in gradient boosting but not in random forests;
- gradient boosting requires tuning a `learning_rate` parameter, while random forests have no such parameter.
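A minimal sketch of what that extra info could look like, assuming a synthetic regression task (the dataset and hyper-parameter values here are my own choices, not taken from the MOOC notebooks):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_estimators in [10, 100, 1000]:
    # Gradient boosting: the test score can degrade as more trees are added
    # (overfitting), and the learning_rate interacts strongly with n_estimators.
    gbr = GradientBoostingRegressor(
        n_estimators=n_estimators, learning_rate=0.5, random_state=0
    ).fit(X_train, y_train)
    # Random forest: adding trees costs compute time but does not cause
    # overfitting; note there is no learning_rate parameter to tune.
    rfr = RandomForestRegressor(
        n_estimators=n_estimators, random_state=0
    ).fit(X_train, y_train)
    print(
        f"n_estimators={n_estimators:4d}  "
        f"GB test R2: {gbr.score(X_test, y_test):.3f}  "
        f"RF test R2: {rfr.score(X_test, y_test):.3f}"
    )
```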
Adding a wrap-up table summarizing the differences and similarities between bagging and boosting methods (how the models are trained, how their predictions are combined, and computation times) may help consolidate these ideas and improve the success rate of Quiz M6.3 Q1 (which is currently below 70%).
It could go at the end of the "Ensemble based on boosting" lectures, i.e., inside the ensemble_hist_gradient_boosting notebook.
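For instance, a rough first draft (wording and rows to be refined; the example estimators in the header are my own suggestion):

| | Bagging (e.g. `RandomForestRegressor`) | Boosting (e.g. `HistGradientBoostingRegressor`) |
|---|---|---|
| Training | each tree is fit independently, on a bootstrap sample of the data | trees are fit sequentially, each one correcting the errors of the previous ones |
| Combining predictions | averaging (or majority vote for classification) | predictions are summed, scaled by the learning rate |
| Computation time | trees can be trained in parallel | training is inherently sequential |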
What do you think?