EvgeniDubov / hellinger-distance-criterion

Random Forest model using Hellinger Distance as split criterion
BSD 3-Clause "New" or "Revised" License

Hellinger Distance Criterion

Hellinger Distance criterion for sklearn Random Forest and Decision Tree classifiers
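For intuition about what the criterion scores, here is a minimal pure NumPy sketch of the Hellinger distance of a binary split. It follows the formulation used in Hellinger Distance Decision Trees (Cieslak and Chawla), not this repository's Cython code. The function name `hellinger_split_distance` and the `positive` label argument are hypothetical, chosen for illustration only.

```python
import numpy as np


def hellinger_split_distance(y_left, y_right, positive=1):
    """Hellinger distance between the class-conditional distributions of a
    binary split (HDDT-style formulation). Larger values mean better splits.

    Hypothetical illustration, not the repository's implementation.
    """
    y_left = np.asarray(y_left)
    y_right = np.asarray(y_right)
    n_pos = int(np.sum(y_left == positive) + np.sum(y_right == positive))
    n_neg = y_left.size + y_right.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # only one class present: nothing to separate
    total = 0.0
    for side in (y_left, y_right):
        # Fraction of all positives / all negatives routed to this side
        pos_frac = np.sum(side == positive) / n_pos
        neg_frac = np.sum(side != positive) / n_neg
        total += (np.sqrt(pos_frac) - np.sqrt(neg_frac)) ** 2
    return float(np.sqrt(total))


if __name__ == "__main__":
    print(hellinger_split_distance([1, 1], [0, 0, 0, 0]))  # sqrt(2), the maximum
    print(hellinger_split_distance([1, 0], [1, 0]))        # 0.0, uninformative split
```

Only per-class fractions enter the formula, so multiplying the number of majority-class samples on both sides by the same factor leaves the value unchanged. This skew insensitivity is why the criterion is attractive for imbalanced data.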

I'm working on adding this to scikit-learn-contrib/imbalanced-learn in PR #437.

Build

You will need a Cython "header" file (.pxd) from sklearn.

If you installed sklearn from the source code package, you already have it.

If you installed sklearn with pip install scikit-learn, you need to obtain it separately. Then build the extension in place:

python setup.py build_ext --inplace
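Before building, you can check whether your installed sklearn already ships the required header. This is a minimal sketch. It assumes the needed file is `sklearn/tree/_criterion.pxd`, where sklearn declares its tree `Criterion` classes. This README does not name the file, so treat that path as an assumption. The helper `find_sklearn_pxd` is hypothetical.

```python
import importlib.util
import os


def find_sklearn_pxd(relative_path="tree/_criterion.pxd"):
    """Return the absolute path of a .pxd file inside the installed sklearn
    package, or None if sklearn or the file is missing.

    The default relative_path is an assumption about which header is needed.
    """
    # find_spec locates a top-level package without importing it
    spec = importlib.util.find_spec("sklearn")
    if spec is None or spec.origin is None:
        return None  # sklearn is not installed (or has no file location)
    package_dir = os.path.dirname(spec.origin)
    path = os.path.join(package_dir, *relative_path.split("/"))
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    found = find_sklearn_pxd()
    print(found or "header not found: get the .pxd from the sklearn source")
```

If this prints a path, the header is already in place and `python setup.py build_ext --inplace` should be able to cimport it.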

Example


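>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import train_test_split
>>> from hellinger_distance_criterion import HellingerDistanceCriterion
>>>
>>> # Toy imbalanced binary data for illustration (roughly 90% / 10%)
>>> X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
>>>
>>> # Arguments are (n_outputs, n_classes): 1 output with 2 classes
>>> hdc = HellingerDistanceCriterion(1, np.array([2], dtype='int64'))
>>> clf = RandomForestClassifier(criterion=hdc, max_depth=4, n_estimators=100)
>>> clf.fit(X_train, y_train)
>>> print('hellinger distance score: ', clf.score(X_test, y_test))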
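The example hardcodes the criterion's constructor arguments. sklearn's classification criteria take the number of outputs and a 1-D array holding the number of classes per output. This minimal sketch derives both from the training labels. The helper name `criterion_args` is hypothetical. It uses `np.intp`, the NumPy counterpart of the index type in sklearn's Cython tree code; `int64` is the same type on common 64-bit platforms.

```python
import numpy as np


def criterion_args(y):
    """Derive (n_outputs, n_classes) for a classification Criterion.

    Hypothetical helper: counts the distinct labels in each output column.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)  # treat a 1-D target as a single output
    n_outputs = y.shape[1]
    # np.intp is the NumPy dtype behind sklearn's Cython index type
    n_classes = np.array(
        [np.unique(y[:, k]).size for k in range(n_outputs)], dtype=np.intp
    )
    return n_outputs, n_classes
```

With the data above, `HellingerDistanceCriterion(*criterion_args(y_train))` is equivalent to the hardcoded call, since `y_train` has one output with two classes. The Hellinger split measure is usually defined for binary targets, so this mainly guards against typos rather than enabling multiclass use.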