PyLadiesDelhi / talks


Ensemble Learning: introduction, methods and applications #13

Open · anjalibhavan opened this issue 6 years ago

anjalibhavan commented 6 years ago

Abstract

This talk covers ensemble learning, a paradigm of machine learning in which multiple models are combined to make predictions, and a popular approach in predictive modelling and analysis. I will go through the various kinds of ensemble methods and their applications to real-world problems, and also share tips and resources for building and deploying good ensembles.

About

  1. Introduction: the advent of machine learning and data analytics, why they hold so much potential and promise, and how they are being used to solve problems ranging from cancer detection to movie recommendations and planet discovery.
  2. Ensemble Learning: why even good machine learning algorithms can fail, run inefficiently, or consume excessive time and memory, and why this calls for a different approach. The concept and origins of ensemble learning; a basic introduction, why it can outperform several other prevalent methods, and the motive behind building ensembles. An introduction to hyperparameters, and why models with too many hyperparameters take too long to tune, making a better method necessary.
  3. Types of ensemble learning methods: there are many ways to classify and categorize them, but they all come down to the same basic types. We'll stick to the ones available in scikit-learn and the commonly used terminology.
     - Bagging: short for bootstrap aggregating. The idea: train several similar base estimators on different random subsets of the data and aggregate their results for the final prediction. Methods of drawing random samples: Random Subspace, Pasting, Random Patches. Examples (available in scikit-learn): Random Forest and Extra Trees classifiers/regressors, with an analysis of their working principles, the randomness of their splits, their applications, how they reduce variance at the cost of a slight increase in bias, and variations including unsupervised uses. (A minimal bagging sketch follows this list.)
     - Boosting: motivation: convert a collection of weak learners into a strong learner by iteratively learning from previous mistakes. Weak learners are estimators that perform only slightly better than random guessing; decision stumps are the classic example. AdaBoost, the most important and useful boosting algorithm: its origins and authors, a detailed explanation and analysis of the algorithm, its variants and applications. Gradient Boosting, which generalizes the boosting idea to arbitrary differentiable loss functions and so covers regression and multi-class classification, available as Gradient Tree Boosting in scikit-learn. A brief introduction to other boosting variants: LogitBoost, BrownBoost, CatBoost. XGBoost: the secret recipe behind several winning Kaggle entries, and what sets it apart from other ensemble learning methods. Tips, resources and ideas for using boosting algorithms. (See the boosting sketch after this list.)
     - Stacking/blending: motivation: use several base estimators to make predictions, then feed those predictions into a second estimator, the meta-estimator, for the final prediction. Stacking classifiers and regressors together; the difference between stacking and blending, and the advantages and disadvantages of blending over stacking; cross-validation in stacking and blending. A top method in Kaggle competitions and other data science problems, including the Netflix Prize. Tips and resources, and why a diverse set of base estimators works better than stacking or blending identical ones. (A stacking sketch follows this list.)
     - Voting ensembles: motivation: give more weight to better-performing base estimators and less to poorly performing ones, or decide the final prediction by majority rule. Two kinds: hard voting (majority-based) and soft voting (averaging predicted probabilities). Usage and applications, e.g. the Otto Group Product Classification Challenge. (A voting sketch follows this list.)
  4. Conclusion: more and more active research is needed; ensemble learning is the future of machine learning and data science research.
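As a rough companion to the bagging item above, here is a minimal sketch using scikit-learn. The toy dataset and all parameter values are illustrative assumptions, not material from the talk:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy dataset as a stand-in for a real problem
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: many decision trees, each trained on a random bootstrap
# sample, with predictions aggregated by majority vote
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            max_samples=0.8, random_state=0)

# Random Forest: bagging plus extra randomness in the split candidates
forest = RandomForestClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```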
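A similar sketch for boosting, contrasting AdaBoost over decision stumps with gradient tree boosting; again, the dataset and parameters are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# AdaBoost: sequentially reweights the training points so that each new
# decision stump (a depth-1 tree, the classic weak learner) focuses on
# the examples the previous ones got wrong
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, random_state=0)

# Gradient Tree Boosting: each new tree is fitted to the gradient of the
# loss, generalizing boosting to arbitrary differentiable losses
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=0)

for name, model in [("AdaBoost", ada), ("gradient boosting", gbt)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```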
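For stacking, a minimal sketch using scikit-learn's StackingClassifier. Note that this class only landed in scikit-learn 0.22, later than this proposal, and the choice of base estimators here is an assumption for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Diverse base estimators: their out-of-fold predictions become the
# input features of the meta-estimator
base = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(random_state=0))]

# cv=5 trains the meta-estimator on cross-validated predictions,
# which guards against leaking the training labels
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(), cv=5)

print(cross_val_score(stack, X, y, cv=5).mean())
```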
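Finally, a minimal voting sketch contrasting hard and soft voting. Soft voting averages predicted class probabilities, so the SVC is given probability=True; all estimator choices here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
              ("svc", SVC(probability=True, random_state=0))]

# Hard voting: majority rule over the predicted classes
hard = VotingClassifier(estimators=estimators, voting="hard")
# Soft voting: average the predicted class probabilities instead
soft = VotingClassifier(estimators=estimators, voting="soft")

for name, model in [("hard", hard), ("soft", soft)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```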

Pre-requisites

Basic knowledge of data science and machine learning using Python.

Expected duration

20-25 minutes.

Level

Intermediate

Resources

Speaker Bio

I am a third-year engineering undergrad at Delhi Technological University (DTU). I have done several projects in the field of machine learning and have authored a research paper on human activity recognition from accelerometer data. I am currently working on speaker-independent speech emotion recognition, among other projects. I am passionate about math, artificial intelligence and literature, and believe there is still much potential to be harnessed in machine learning.

- Can be done after the talk/workshop -

Include link to slides here

Include link to video here

utkarsh2102 commented 6 years ago

Hi, are you available for a talk on the 8th of September?

im-gozmit commented 6 years ago

Hi @anjalibhavan, are you available for the talk on the 23rd of September?

anjalibhavan commented 6 years ago

> Hi @anjalibhavan, are you available for the talk on the 23rd of September?

Hi! I won't be, unfortunately; I have exams around that time. I'll be available from October onwards and will post a new talk issue.

cocoa1231 commented 5 years ago

Hey @anjalibhavan, would you be available to give this talk on the 14th of April?