crflynn / skranger

scikit-learn compatible Python bindings for ranger C++ random forest library
https://skranger.readthedocs.io/en/stable/
GNU General Public License v3.0
51 stars 7 forks source link
machine-learning random-forest scikit-learn

skranger

|build| |wheels| |rtd| |pypi| |pyversions|

.. |build| image:: https://github.com/crflynn/skranger/actions/workflows/build_and_test.yml/badge.svg :target: https://github.com/crflynn/skranger/actions

.. |wheels| image:: https://github.com/crflynn/skranger/actions/workflows/release.yml/badge.svg :target: https://github.com/crflynn/skranger/actions

.. |rtd| image:: https://img.shields.io/readthedocs/skranger.svg :target: http://skranger.readthedocs.io/en/latest/

.. |pypi| image:: https://img.shields.io/pypi/v/skranger.svg :target: https://pypi.python.org/pypi/skranger

.. |pyversions| image:: https://img.shields.io/pypi/pyversions/skranger.svg :target: https://pypi.python.org/pypi/skranger

skranger provides scikit-learn <https://scikit-learn.org/stable/index.html> compatible Python bindings to the C++ random forest implementation, ranger <https://github.com/imbs-hl/ranger>, using Cython <https://cython.readthedocs.io/en/latest/>__.

The latest release of skranger uses version 0.12.1 <https://github.com/imbs-hl/ranger/releases/tag/0.12.1>__ of ranger.

Installation

skranger is available on pypi <https://pypi.org/project/skranger>__ and can be installed via pip:

.. code-block:: bash

pip install skranger

Usage

There are two sklearn compatible classes, RangerForestClassifier and RangerForestRegressor. There is also the RangerForestSurvival class, which aims to be compatible with the scikit-survival <https://github.com/sebp/scikit-survival>__ API.

RangerForestClassifier


The ``RangerForestClassifier`` predictor uses ``ranger``'s ForestProbability class to enable both ``predict`` and ``predict_proba`` methods.

.. code-block:: python

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from skranger.ensemble import RangerForestClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    rfc = RangerForestClassifier()
    rfc.fit(X_train, y_train)

    predictions = rfc.predict(X_test)
    print(predictions)
    # [1 2 0 0 0 0 1 2 1 1 2 2 2 1 1 0 1 1 0 1 1 1 0 2 1 0 0 1 2 2 0 1 2 2 0 2 0 0]

    probabilities = rfc.predict_proba(X_test)
    print(probabilities)
    # [[0.01333333 0.98666667 0.        ]
    #  [0.         0.         1.        ]
    #  ...
    #  [0.98746032 0.01253968 0.        ]
    #  [0.99       0.01       0.        ]]

RangerForestRegressor

The RangerForestRegressor predictor uses ranger's ForestRegression class. It also supports quantile regression using the predict_quantiles method.

.. code-block:: python

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from skranger.ensemble import RangerForestRegressor

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

rfr = RangerForestRegressor()
rfr.fit(X_train, y_train)

predictions = rfr.predict(X_test)
print(predictions)
# [26.27401667  8.96549989 24.82981667 27.92506667 28.04606667 45.4693
#  21.89681787 40.30345    11.53959613 19.13675    15.88567273 16.69713567
#  ...
#  20.29025364 26.21245833 23.79643333 14.03546362 21.24893333 34.8825
#  21.22463333]

# enable quantile regression on instantiation
rfr = RangerForestRegressor(quantiles=True)
rfr.fit(X_train, y_train)

quantile_lower = rfr.predict_quantiles(X_test, quantiles=[0.1])
print(quantile_lower)
# [22.    5.   21.88 23.08 23.1  35.89 10.85 31.5   7.04 14.5  11.7  10.9
#   8.1  28.38  7.2  19.6  29.1  13.1  24.94 21.09 15.6  11.7  10.41 14.5
#  ...
#  18.9  21.4   9.43  8.7  26.46 18.99  7.2  19.27 18.5  21.19 18.99 18.88
#  14.07 21.87 22.18  9.43 17.28 29.6  18.2 ]
quantile_upper = rfr.predict_quantiles(X_test, quantiles=[0.9])
print(quantile_upper)
# [30.83 12.85 29.01 33.1  33.1  50.   29.75 50.   15.   23.   19.96 21.4
#  20.53 50.   13.35 25.   48.5  19.6  46.   26.6  23.7  20.1  17.8  21.4
#  ...
#  26.78 28.1  17.86 27.5  46.25 24.4  16.74 24.4  28.7  29.1  24.4  25.
#  25.   31.51 28.   20.8  26.7  42.13 24.24]

RangerForestSurvival



The ``RangerForestSurvival`` predictor uses ``ranger``'s ForestSurvival class, and has an interface similar to the RandomSurvivalForest found in the ``scikit-survival`` package.

.. code-block:: python

    from sksurv.datasets import load_veterans_lung_cancer
    from sklearn.model_selection import train_test_split
    from skranger.ensemble import RangerForestSurvival

    X, y = load_veterans_lung_cancer()
    # select the numeric columns as features
    X = X[["Age_in_years", "Karnofsky_score", "Months_from_Diagnosis"]]
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    rfs = RangerForestSurvival()
    rfs.fit(X_train, y_train)

    predictions = rfs.predict(X_test)
    print(predictions)
    # [107.99634921  47.41235714  88.39933333  91.23566667  61.82104762
    #   61.15052381  90.29888492  47.88706349  21.25111508  85.5768254
    #   ...
    #   56.85498016  53.98227381  48.88464683  95.58649206  48.9142619
    #   57.68516667  71.96549206 101.79123016  58.95402381  98.36299206]

    chf = rfs.predict_cumulative_hazard_function(X_test)
    print(chf)
    # [[0.04233333 0.0605     0.24305556 ... 1.6216627  1.6216627  1.6216627 ]
    #  [0.00583333 0.00583333 0.00583333 ... 1.55410714 1.56410714 1.58410714]
    #  ...
    #  [0.12933333 0.14766667 0.14766667 ... 1.64342857 1.64342857 1.65342857]
    #  [0.00983333 0.0112619  0.04815079 ... 1.79304365 1.79304365 1.79304365]]

    survival = rfs.predict_survival_function(X_test)
    print(survival)
    # [[0.95855021 0.94129377 0.78422794 ... 0.19756993 0.19756993 0.19756993]
    #  [0.99418365 0.99418365 0.99418365 ... 0.21137803 0.20927478 0.20513086]
    #  ...
    #  [0.87868102 0.86271864 0.86271864 ... 0.19331611 0.19331611 0.19139258]
    #  [0.99021486 0.98880127 0.95299007 ... 0.16645277 0.16645277 0.16645277]]

License
-------

``skranger`` is licensed under `GPLv3 <https://github.com/crflynn/skranger/blob/master/LICENSE.txt>`__.

Development
-----------

To develop locally, it is recommended to have ``asdf``, ``make`` and a C++ compiler already installed. After cloning, run ``make setup``. This will setup the ranger submodule, install python and poetry from ``.tool-versions``, install dependencies using poetry, copy the ranger source code into skranger, and then build and install skranger in the local virtualenv.

To format code, run ``make fmt``. This will run isort and black against the .py files.

To run tests and inspect coverage, run ``make test``.

To rebuild in place after making changes, run ``make build``.

To create python package artifacts, run ``make dist``.

To build and view documentation, run ``make docs``.