crflynn / skranger

scikit-learn compatible Python bindings for ranger C++ random forest library
https://skranger.readthedocs.io/en/stable/
GNU General Public License v3.0

Call `predict_quantile` by default if `quantiles` is set to `True`, `list` or `np.ndarray` #133

Open r3v1 opened 2 years ago

r3v1 commented 2 years ago

When using `RangerForestRegressor` with `quantiles=True` in hyperparameter optimization software (e.g. tune-sklearn) to optimize probabilistic metrics such as the Continuous Ranked Probability Score (CRPS), the model must output the 2D array produced by the `predict_quantiles` method. However, when CRPS is wrapped as a scorer through the sklearn API with `make_scorer`, the final step always calls the regressor's `predict` method, so quantiles are never predicted.

Here is a brief example of what I am trying to explain:

from sklearn.metrics import make_scorer
from skranger.ensemble import RangerForestRegressor
from tune_sklearn import TuneSearchCV
from solarforecastarbiter.metrics.probabilistic import continuous_ranked_probability_score as crps

param_dists = {
    'max_depth': (0, 50),
    'min_node_size': (10, 100),
    'n_estimators': (100, 1000),
    'split_rule': ['variance', 'extratrees', 'maxstat'],
}

m = RangerForestRegressor(quantiles=True)
gs = TuneSearchCV(m,
                  param_distributions=param_dists,
                  scoring=make_scorer(crps, greater_is_better=False),
)
gs.fit(X, y)  # X, y: training data (not shown). Raises an error: forecasts must be 2D arrays

I think the sklearn API is behaving correctly. To work around this problem, I made some changes in skranger:

NOTE: Additional logic should be implemented if a non-quantile prediction is required and quantile mode is enabled.
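To make the proposal concrete, here is a minimal sketch of the kind of dispatch described above. It is not the actual patch and does not use skranger's real code: `StubForest`, `_predict_mean`, the placeholder arithmetic, and the default quantile levels are all hypothetical, chosen only so the example runs on plain Python lists. It also shows one possible form of the "additional logic" from the note: a per-call `quantiles=False` that forces a point prediction even when quantile mode is enabled.

```python
class StubForest:
    """Hypothetical stand-in for a forest regressor; not skranger's real class."""

    def __init__(self, quantiles=False):
        # False -> point predictions; True or a list of levels -> quantile mode
        self.quantiles = quantiles

    def _predict_mean(self, X):
        # Placeholder point prediction: one value per row (row mean)
        return [sum(row) / len(row) for row in X]

    def predict_quantiles(self, X, quantiles):
        # Placeholder quantile prediction: one row per sample,
        # one column per requested quantile level
        return [[m * q for q in quantiles] for m in self._predict_mean(X)]

    def predict(self, X, quantiles=None):
        # An explicit per-call argument wins; otherwise fall back to
        # whatever was configured in the constructor
        requested = self.quantiles if quantiles is None else quantiles
        if requested is False or requested is None:
            return self._predict_mean(X)
        if requested is True:
            requested = [0.1, 0.5, 0.9]  # assumed default levels
        return self.predict_quantiles(X, list(requested))


X = [[1.0, 3.0], [2.0, 4.0]]
model = StubForest(quantiles=True)
two_d = model.predict(X)                    # 2 rows x 3 quantile columns
one_d = model.predict(X, quantiles=False)   # escape hatch: point predictions
```

With this shape, a scorer that only ever calls `predict` receives a 2D result whenever the estimator was constructed in quantile mode, while callers that need a point prediction can still ask for one explicitly.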

crflynn commented 2 years ago

I see what you're doing and it makes sense. I'm wondering if we should just break out the quantile regression to a separate estimator. Does that make sense to do here?

FWIW R's grf does this and I followed this pattern when writing skgrf.
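For comparison, the separate-estimator alternative could look roughly like the sketch below. Both class names are hypothetical and the "models" are deliberately trivial (a training-set mean and a crude empirical quantile of the training targets); the point is only that each estimator's `predict` has a single, fixed output shape, so no mode switching is needed.

```python
class StubRegressor:
    """Hypothetical point estimator: predict() always returns a 1D list."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class StubQuantileRegressor:
    """Hypothetical quantile estimator: predict() always returns a 2D list."""

    def __init__(self, quantiles=(0.1, 0.5, 0.9)):
        self.quantiles = quantiles

    def fit(self, X, y):
        self.sorted_y_ = sorted(y)
        return self

    def predict(self, X):
        n = len(self.sorted_y_)
        # Crude index-based empirical quantile of the training targets
        row = [self.sorted_y_[min(n - 1, int(q * n))] for q in self.quantiles]
        return [list(row) for _ in X]


X = [[0], [1], [2], [3]]
y = [10, 20, 30, 40]
points = StubRegressor().fit(X, y).predict(X)               # one value per sample
bands = StubQuantileRegressor().fit(X, y).predict(X[:2])    # one row of quantiles per sample
```

Under this design a scorer built with `make_scorer` would work unchanged with the quantile estimator, since its `predict` already returns the 2D output that a probabilistic metric expects.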

crflynn commented 2 years ago

Looks like builds are broken due to this bug in setuptools, and a fix appears to be in progress: https://github.com/pypa/setuptools/issues/3002

r3v1 commented 2 years ago

> I'm wondering if we should just break out the quantile regression to a separate estimator. Does that make sense to do here?

Well, I'm not sure which would be better. I think you know the overall structure of the project better than I do.