LocalCascadeEnsemble / LCE

Random Forest or XGBoost? It is Time to Explore LCE
https://lce.readthedocs.io/
Apache License 2.0

LCETreeClassifier predict_proba speedup #3

Closed: Wuuzzaa closed this issue 2 years ago

Wuuzzaa commented 2 years ago

Problem

At the moment all predictions are done sequentially, one sample at a time. When predicting a few thousand samples, prediction is slower than training the model itself.

Idea

The computation time can be reduced by orders of magnitude by predicting all samples that reach a node at once, and then sorting the predictions back into the order of the rows in the X passed to predict.
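To illustrate the idea, here is a minimal sketch on a toy tree. It is not LCE's code: the `Node` class, the constant leaf probabilities, and both function names are hypothetical stand-ins. It only shows the pattern of routing index arrays through the tree so that each node handles all of its samples in one vectorized step, and writing results back by original index so the output follows the order of X.

```python
import numpy as np

class Node:
    """Hypothetical node: internal nodes split on one feature, leaves hold probabilities."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, proba=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.proba = proba  # set on leaves only

def predict_proba_sequential(root, X):
    # One full tree traversal per sample (the slow pattern).
    out = []
    for x in X:
        node = root
        while node.proba is None:
            node = node.left if x[node.feature] <= node.threshold else node.right
        out.append(node.proba)
    return np.array(out)

def _first_leaf(node):
    while node.proba is None:
        node = node.left
    return node

def predict_proba_batched(root, X):
    # Each node processes all samples that reach it in one vectorized step.
    n_classes = len(_first_leaf(root).proba)
    out = np.empty((X.shape[0], n_classes))
    stack = [(root, np.arange(X.shape[0]))]
    while stack:
        node, idx = stack.pop()
        if idx.size == 0:
            continue
        if node.proba is not None:
            out[idx] = node.proba  # writing by original index restores the order of X
            continue
        go_left = X[idx, node.feature] <= node.threshold
        stack.append((node.left, idx[go_left]))
        stack.append((node.right, idx[~go_left]))
    return out

# Toy tree and data, for illustration only.
tree = Node(feature=0, threshold=0.5,
            left=Node(proba=np.array([0.9, 0.1])),
            right=Node(feature=1, threshold=0.0,
                       left=Node(proba=np.array([0.4, 0.6])),
                       right=Node(proba=np.array([0.2, 0.8]))))
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 2))
seq = predict_proba_sequential(tree, X_demo)
bat = predict_proba_batched(tree, X_demo)
```

In a real cascade each node would call its own model's predict on the batch instead of returning a constant, but the index bookkeeping would stay the same.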

Speed comparison

I ran my patch on two different datasets and used cProfile to measure the improvement. In both cases the prediction time improved by about two orders of magnitude.
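For anyone who wants to reproduce this kind of measurement, a minimal cProfile sketch follows. The function `slow_predict` is a hypothetical stand-in for a per-sample prediction loop, not part of LCE; only the `cProfile` and `pstats` calls are standard library.

```python
import cProfile
import io
import pstats

def slow_predict(n):
    # Hypothetical stand-in for a per-sample prediction loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_predict(100_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time,
# and restrict it to entries whose name matches "slow_predict".
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats("slow_predict")
report = buffer.getvalue()
print(report)
```

The cumtime column of the report is the one to compare between the current and patched versions.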


dataset: dmc_2003
X_train shape: (8000, 50)
n classes in y: 2
11177 samples to predict

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  240.757  240.757 *_lce.py:430(predict) <- current version
        1    0.000    0.000  118.697  118.697 *_lce.py:393(fit)
        1    0.000    0.000    0.885    0.885 *_lce.py:430(predict) <- patch version

dataset: dmc_2007
X_train shape: (10000, 14)
n classes in y: 3
50000 samples to predict

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 1070.539 1070.539 *_lce.py:430(predict) <- current version
        1    0.000    0.000  138.662  138.662 *_lce.py:393(fit)
        1    0.000    0.000    5.714    5.714 *_lce.py:430(predict) <- patch version
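As a quick check of the "two orders of magnitude" figure, the speedup factors can be computed from the cumtime values of the predict calls reported for each dataset. The dictionary layout is just for illustration.

```python
# cumtime of predict in seconds, copied from the cProfile output
timings = {
    "dmc_2003": {"current": 240.757, "patch": 0.885},
    "dmc_2007": {"current": 1070.539, "patch": 5.714},
}

speedups = {name: t["current"] / t["patch"] for name, t in timings.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.0f}x faster")
```

This gives roughly 272x for dmc_2003 and 187x for dmc_2007.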

Validation

First I checked whether I got the same classification_report. But this can be misleading when running on small toy datasets. So I looked into the proba values which are returned to the bagging classifier and checked if they are the
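One way to compare probability arrays directly is sketched below. The helper `same_probabilities` and the two small arrays are hypothetical; the sketch only shows why matching labels alone can hide a difference in the underlying probabilities.

```python
import numpy as np

def same_probabilities(proba_old, proba_new, atol=1e-8):
    """Return True when both arrays have the same shape and agree element-wise."""
    proba_old = np.asarray(proba_old)
    proba_new = np.asarray(proba_new)
    if proba_old.shape != proba_new.shape:
        return False
    return bool(np.allclose(proba_old, proba_new, atol=atol))

# Two made-up outputs: the predicted labels agree, the probabilities do not.
a = np.array([[0.9, 0.1], [0.4, 0.6]])
b = np.array([[0.6, 0.4], [0.1, 0.9]])

labels_match = bool(np.array_equal(a.argmax(axis=1), b.argmax(axis=1)))
probas_match = same_probabilities(a, b)
print(labels_match, probas_match)
```

A label-based report would treat `a` and `b` as identical, while the element-wise comparison flags the difference.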

Pull request

I opened a pull request: 0cd62d6

LocalCascadeEnsemble commented 2 years ago

Hello,

Thank you for your suggestion and the analysis. The new version 0.2.6 includes the speedup inspired by your proposal, extended to the case of one-dimensional input and to LCERegressor.

Best,