Problem
At the moment all predictions are done sequentially. When predicting a few thousand samples, prediction is slower than training the model itself.
Idea
The computation time can be decreased by a few orders of magnitude by predicting all samples that reach a node at once, and then sorting the predictions back into the order of the given X.
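The idea can be sketched as follows. This is a minimal, hypothetical illustration, not the actual LCE internals: `MeanPredictor`, `assign_node`, and `predict_batched` are all made-up names. Samples are grouped by the node they fall into, each group is predicted in one vectorized call, and boolean-mask assignment scatters the results straight back into the original order of X.

```python
import numpy as np

class MeanPredictor:
    """Toy per-node model: always predicts one constant value."""
    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)

def assign_node(X):
    """Hypothetical routing: sign of the first feature picks the node."""
    return (X[:, 0] >= 0).astype(int)

def predict_batched(nodes, X):
    """Predict all samples at once by grouping them per node and
    scattering the results back into the original order of X."""
    node_ids = assign_node(X)
    y = np.empty(len(X))
    for node_id in np.unique(node_ids):
        mask = node_ids == node_id                  # samples reaching this node
        y[mask] = nodes[node_id].predict(X[mask])   # one vectorized call
    return y                                        # same order as X

nodes = {0: MeanPredictor(-1.0), 1: MeanPredictor(1.0)}
X = np.array([[0.5], [-2.0], [3.0], [-0.1]])
print(predict_batched(nodes, X))  # [ 1. -1.  1. -1.]
```

Instead of one model call per sample, this does one call per node, which is where the order-of-magnitude gain comes from when many samples share a node.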
Speed comparison
I ran my patch with two different datasets and used cProfile to measure the improvement. In both cases the computation time was improved by two orders of magnitude.
dataset: dmc_2003
X_train shape: (8000, 50)
n classes in y: 2
11177 samples to predict
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 240.757 240.757 *_lce.py:430(predict) <- current version
1 0.000 0.000 118.697 118.697 *_lce.py:393(fit)
1 0.000 0.000 0.885 0.885 *_lce.py:430(predict) <- patch version
dataset: dmc_2007
X_train shape: (10000, 14)
n classes in y: 3
50000 samples to predict
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1070.539 1070.539 *_lce.py:430(predict) <- current version
1 0.000 0.000 138.662 138.662 *_lce.py:393(fit)
1 0.000 0.000 5.714 5.714 *_lce.py:430(predict) <- patch version
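Timings of this shape can be collected with `cProfile` and `pstats` from the standard library. The sketch below uses a placeholder function (`predict_all` is a stand-in for a `model.predict(X)` call, not anything from LCE) but produces the same columns as the dumps above:

```python
import cProfile
import pstats

def predict_all(n):
    # placeholder for model.predict(X); any callable can be profiled this way
    return [i * i for i in range(n)]

profiler = cProfile.Profile()
profiler.enable()
predict_all(100_000)
profiler.disable()

# report the same columns as above: ncalls, tottime, percall, cumtime, ...
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```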
Validation
First I checked that I got the same classification_report. But this can be misleading when running on small toy datasets, so I also looked at the proba values that are returned to the bagging classifier and checked that they are the same.
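The proba comparison can be done directly on the returned arrays, e.g. with `np.allclose`. A generic sketch, where `old_proba` and `new_proba` are stand-ins for the outputs of the current and the patched `predict_proba`:

```python
import numpy as np

# stand-ins for the outputs of the current and the patched predict_proba
old_proba = np.array([[0.7, 0.3], [0.1, 0.9]])
new_proba = np.array([[0.7, 0.3], [0.1, 0.9]])

# element-wise equality up to floating-point tolerance
assert old_proba.shape == new_proba.shape
assert np.allclose(old_proba, new_proba)
print("probabilities match")
```

Comparing the raw probabilities is stricter than comparing a classification_report, since two different probability vectors can still yield the same predicted class.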
Thank you for your suggestion and the analysis.
The new version 0.2.6 contains the speedup improvement inspired by your proposal, extended to the case of one-dimensional input and to LCERegressor.
Pull request
I made a pull request: 0cd62d6