There is a table comparing RandomForest vs Bagging in https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_random_forest.html, but we don't have an equivalent table comparing bagging and boosting. I think it would be a good idea to add one.
We could have a similar table at the end of this notebook:
https://inria.github.io/scikit-learn-mooc/python_scripts/ensemble_hyperparameters.html
or in a new notebook (without any code) right after this one.
We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in https://github.com/INRIA/scikit-learn-mooc/pull/471.
Should we still add it to a notebook?
> We have such a table in the "Intuitions on ensemble models: boosting" slides, which were introduced in #471. Should we still add it to a notebook?
Only if we expand it a bit, for instance by including extra info about the influence of important hyper-parameters (both points are illustrated in the sketch below), e.g.:

- too many trees can cause overfitting in gradient boosting but not in random forests;
- gradient boosting requires tuning a `learning_rate` parameter, while random forests have no such parameter.
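A minimal sketch of what that extra info could look like, assuming a synthetic regression task (the dataset and hyper-parameter values here are my own choices, not taken from the MOOC notebooks):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_estimators in [10, 100, 1000]:
    # Gradient boosting: the test score can degrade as more trees are added
    # (overfitting), and the learning_rate interacts strongly with n_estimators.
    gbr = GradientBoostingRegressor(
        n_estimators=n_estimators, learning_rate=0.5, random_state=0
    ).fit(X_train, y_train)
    # Random forest: adding trees costs compute time but does not cause
    # overfitting; note there is no learning_rate parameter to tune.
    rfr = RandomForestRegressor(
        n_estimators=n_estimators, random_state=0
    ).fit(X_train, y_train)
    print(
        f"n_estimators={n_estimators:4d}  "
        f"GB test R2: {gbr.score(X_test, y_test):.3f}  "
        f"RF test R2: {rfr.score(X_test, y_test):.3f}"
    )
```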
Adding a wrap-up table summarizing the differences and similarities between bagging and boosting methods (how the models are trained, how their predictions are combined, and computation times) may help consolidate these ideas and improve the success rate of Quiz M6.3 Q1 (which is currently below 70%).
It could go at the end of the "Ensemble based on boosting" lectures, i.e., inside the ensemble_hist_gradient_boosting notebook.
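For instance, a rough first draft (wording and rows to be refined; the example estimators in the header are my own suggestion):

| | Bagging (e.g. `RandomForestRegressor`) | Boosting (e.g. `HistGradientBoostingRegressor`) |
|---|---|---|
| Training | each tree is fit independently, on a bootstrap sample of the data | trees are fit sequentially, each one correcting the errors of the previous ones |
| Combining predictions | averaging (or majority vote for classification) | predictions are summed, scaled by the learning rate |
| Computation time | trees can be trained in parallel | training is inherently sequential |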
What do you think?