MTH594 Advanced data mining: theory and applications
The materials for the course MTH 594 Advanced Data Mining: Theory and Applications, taught by Dmitry Efimov at the American University of Sharjah, UAE, in the Spring 2016 semester.
The course program can be downloaded from the syllabus folder.
To compose these lectures, I mainly used ideas from three sources:
- Stanford lectures by Andrew Ng on YouTube: https://www.youtube.com/watch?v=UzxYlbK2c7E&list=PLA89DCFA6ADACE599
- The book "The Elements of Statistical Learning" by T. Hastie, R. Tibshirani and J. Friedman: http://statweb.stanford.edu/~tibs/ElemStatLearn
- Lectures by Andrew Ng on Coursera: https://www.coursera.org/learn/machine-learning
All uploaded PDF lectures are adapted to help students understand the material.
The supplementary files in the ipython folder show students how to use built-in methods to train models in Python 2.7.
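For instance, training a model typically takes only a few library calls. The following is a minimal sketch of that workflow; it assumes scikit-learn and its iris toy dataset, which are illustrative choices and not necessarily what the course notebooks use:

```python
# Minimal sketch: train a classifier with a built-in method.
# Assumes scikit-learn is installed; works under Python 2.7 and 3.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# load a toy dataset: 150 iris flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# fit a logistic regression model with the library defaults
model = LogisticRegression()
model.fit(X, y)

# report the training accuracy
print(model.score(X, y))
```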
If you find any mistakes or typos, please email me at diefimov@gmail.com; this course is new for me, so there are probably some :)
The content of the lectures:
- Supervised learning
  - Linear and logistic regressions, perceptrons
    - Linear regression
    - Analytical minimization: normal equations
    - Statistical interpretation
    - Logistic regression
    - Perceptron
    - Bayesian interpretation and regularization
    - Python implementation
      - Linear regression
      - Logistic regression
      - Perceptron
      - Regularization
  - Methods of optimization
    - Gradient descent
    - Examples of gradient descent
    - Newton's method
    - Python implementation
      - Batch gradient descent (a minimal sketch appears after the contents list)
      - Stochastic gradient descent
  - Generalized linear models (GLM)
    - Exponential family
    - Generalized linear models (GLM)
    - Python implementation
      - Softmax regression
  - Generative learning algorithms
    - General idea of generative algorithms
    - Gaussians
    - Gaussian discriminant analysis
    - Generative vs. discriminative comparison
    - Naive Bayes
    - Laplace smoothing
    - Event models
    - Python implementation
      - Gaussians
      - Gaussian discriminant analysis
      - Naive Bayes
  - Neural networks
    - Definition
    - Backpropagation
    - Python implementation
  - Support vector machines
    - Support vector machines: intuition
    - Primal/dual optimization problem and KKT
    - SVM dual
    - Kernels
    - Kernel examples
    - Kernel testing
    - SVM with kernels
    - Soft margin
    - SMO algorithm
    - Python implementation
      - Coordinate ascent
      - SVM
      - SMO algorithm
  - Nonparametric methods
    - Locally weighted regression
    - Generalized additive models (GAM)
      - GAM for regression
      - GAM for classification
    - Tree-based methods
      - Regression trees
      - Classification trees
    - Boosting
      - Exponential loss
      - Adaboost
      - Gradient boosting
      - Gradient tree boosting
    - Python implementation
      - Locally weighted regression
      - GAM for regression
      - GAM for classification
      - Regression decision trees
      - Classification decision trees
      - Gradient tree boosting
  - Learning theory
    - Bias / variance
    - Empirical risk minimization (ERM)
    - Union bound / Hoeffding inequality
    - Uniform convergence
    - VC dimension
    - Model selection
    - Feature selection
    - Python implementation
      - Cross validation
    - Online learning
    - Advice for applying ML algorithms
- Unsupervised learning
  - Clustering
    - K-means
    - Python implementation
  - Mixture of Gaussians and EM algorithm
    - Mixture of Gaussians
    - Jensen's inequality
    - General EM algorithm
    - EM algorithm for the mixture of Gaussians
    - EM algorithm for the mixture of Naive Bayes
    - Python implementation
      - Mixture of Gaussians
      - EM algorithm for mixture of Gaussians
  - Factor analysis
    - Intuition
    - Marginal and conditional distributions for Gaussians
    - Factor analysis model
    - EM steps for factor analysis
    - Python implementation
  - Principal component analysis
    - PCA algorithm
    - Latent semantic indexing (LSI)
    - Python implementation
  - Independent component analysis (ICA)
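As a teaser for the gradient descent material above, here is a minimal batch gradient descent sketch for linear least squares. It assumes only numpy and is an illustrative sketch, not the course notebook code:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.5, iterations=2000):
    """Minimize J(theta) = ||X.theta - y||^2 / (2m) by batch gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        # full-batch gradient of J at the current theta
        gradient = X.T.dot(X.dot(theta) - y) / m
        # move against the gradient with step size alpha
        theta -= alpha * gradient
    return theta

# toy usage: recover the line y = 1 + 2x from noiseless data
X = np.column_stack([np.ones(50), np.linspace(0.0, 1.0, 50)])
y = X.dot(np.array([1.0, 2.0]))
print(batch_gradient_descent(X, y))  # close to [1. 2.]
```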