New algo VotingClassifier

seignour commented 7 years ago

This algo tests various weighted combinations of SGD, PCA-SGD and AdaBoost. The best estimator is a 50-50 mixture of PCA-SGD and AdaBoost, which achieves a 0.2-0.3% AUROC gain over PCA-SGD alone. This is a small gain in performance, and the computation is very slow. However, it does show that a mixture of two good classifiers can at times yield a better classifier, and the approach could be useful in other situations.
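For readers who want to see the shape of the comparison, here is a minimal sketch of weighting PCA-SGD against AdaBoost with soft voting. It is not the notebook's code: the synthetic data, the hyperparameters, the number of PCA components, and the `modified_huber` loss (chosen only because it gives `SGDClassifier` a `predict_proba` method) are all assumptions, and only two of the three base models are included.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real inputs are not shown in this thread).
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_pca_sgd():
    # Hypothetical PCA-SGD pipeline; modified_huber is used because soft
    # voting needs predict_proba from every base estimator.
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=10),
        SGDClassifier(loss="modified_huber", random_state=0),
    )


def make_adaboost():
    return AdaBoostClassifier(n_estimators=50, random_state=0)


# Compare PCA-SGD alone, a 50-50 mixture, and AdaBoost alone.
aurocs = {}
for weights in [(1, 0), (1, 1), (0, 1)]:
    ensemble = VotingClassifier(
        estimators=[("pca_sgd", make_pca_sgd()), ("adaboost", make_adaboost())],
        voting="soft",  # average predicted probabilities, not class votes
        weights=list(weights),
    )
    ensemble.fit(X_train, y_train)
    positive_proba = ensemble.predict_proba(X_test)[:, 1]
    aurocs[weights] = roc_auc_score(y_test, positive_proba)

for weights, auroc in aurocs.items():
    print(f"weights={weights}: test AUROC = {auroc:.4f}")
```

Soft voting averages the class probabilities using the given weights, so `(1, 1)` corresponds to the 50-50 mixture and `(1, 0)` reduces to PCA-SGD alone.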

Note: because the computation is so slow, I chose to dump the results for each model to disk rather than use scikit-learn's automated grid search. This way, if the run fails partway through, I only need to refit the models that are missing. The models are then reloaded and analyzed in a separate, independent section.
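The dump-and-reload pattern described above can be sketched roughly as follows. The helper name `fit_or_load`, the file naming scheme, and the use of `pickle` are assumptions for illustration, and the placeholder `fit` function stands in for the expensive model fit.

```python
import pickle
import tempfile
from pathlib import Path


def fit_or_load(name, fit_fn, cache_dir):
    """Return the cached result for `name`, calling fit_fn only if it is missing."""
    path = Path(cache_dir) / f"{name}.pkl"
    if path.exists():
        with path.open("rb") as handle:
            return pickle.load(handle)
    result = fit_fn()
    # Write to a temporary file first so a crash mid-write never leaves
    # a truncated cache file behind.
    tmp_path = path.with_suffix(".tmp")
    with tmp_path.open("wb") as handle:
        pickle.dump(result, handle)
    tmp_path.replace(path)
    return result


fit_calls = []  # records which weightings were actually fitted


def make_fit(weights):
    def fit():
        fit_calls.append(weights)
        # Placeholder for the slow step; a real run would fit and return a model.
        return {"weights": weights, "auroc": None}

    return fit


cache_dir = tempfile.mkdtemp()
grid = [(1, 0), (1, 1), (0, 1)]

# Run the loop twice: the second pass finds every result on disk and refits nothing.
for _ in range(2):
    results = {
        w: fit_or_load(f"weights_{w[0]}_{w[1]}", make_fit(w), cache_dir)
        for w in grid
    }

print(f"fits performed: {len(fit_calls)} for {len(grid)} weightings over two passes")
```

Because each weighting has its own file, rerunning after a failure only refits the entries whose files are missing, and the analysis section can load the files without touching the fitting code.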

dhimmel commented 7 years ago

Cool, thanks for implementing the ensemble VotingClassifier.

> The best estimator is a 50-50 mixture of PCA-SGD and AdaBoost, which achieves a 0.2-0.3% AUROC gain over PCA-SGD alone. This is a small gain in performance, and the computation is very slow. However, it does show that a mixture of two good classifiers can at times yield a better classifier, and the approach could be useful in other situations.

It's difficult to assess whether a 0.3% change in testing AUROC is meaningful here, as there is of course some random variation in this measure.
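One way to get a feel for that random variation is a paired bootstrap over the test set. The sketch below is an assumption, not something done in this PR: the helper `bootstrap_auroc_diff` is hypothetical, and the scores are synthetic noise around a shared signal rather than real model outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auroc_diff(y_true, scores_a, scores_b, n_boot=1000, seed=0):
    """Percentile bootstrap for AUROC(b) - AUROC(a) on one shared test set."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores_a = np.asarray(scores_a)
    scores_b = np.asarray(scores_b)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        # Resample test examples with replacement, using the same indices
        # for both models so the comparison stays paired.
        idx = rng.integers(0, n, size=n)
        if np.unique(y_true[idx]).size < 2:
            continue  # AUROC is undefined when only one class is drawn
        auroc_a = roc_auc_score(y_true[idx], scores_a[idx])
        auroc_b = roc_auc_score(y_true[idx], scores_b[idx])
        diffs.append(auroc_b - auroc_a)
    low, high = np.percentile(diffs, [2.5, 97.5])
    return float(np.mean(diffs)), float(low), float(high)


# Synthetic demo: two score vectors that differ only by independent noise.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=500)
signal = y + rng.normal(scale=1.0, size=500)
scores_a = signal + rng.normal(scale=0.3, size=500)
scores_b = signal + rng.normal(scale=0.3, size=500)

mean_diff, ci_low, ci_high = bootstrap_auroc_diff(y, scores_a, scores_b, n_boot=500)
print(f"AUROC difference: {mean_diff:+.4f} (95% interval {ci_low:+.4f} to {ci_high:+.4f})")
```

If the interval for the real PCA-SGD and ensemble scores comfortably includes zero, the observed 0.3% gain would be hard to distinguish from resampling noise. This only captures test-set sampling variation, not variation from refitting the models.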

seignour commented 7 years ago

Thanks for approving. I agree that noise is a concern there.

cognoma / machine-learning

New algo VotingClassifier #79