RGF-team / rgf

Home repository for the Regularized Greedy Forest (RGF) library. It includes the original implementation from the paper and a multithreaded one written in C++, along with various language-specific wrappers.

Support latest scikit-learn (>=0.24) and Python 3.9 #333

Closed. StrikerRUS closed this issue 3 years ago

StrikerRUS commented 3 years ago

Work around a change in the latest scikit-learn version, where the integration test now passes a sample weight array that contains zeros:

__________________ TestRGFClassifier.test_sklearn_integration __________________

self = <test_rgf_python.TestRGFClassifier testMethod=test_sklearn_integration>

    def test_sklearn_integration(self):
>       check_estimator(self.estimator_class())

tests/test_rgf_python.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:547: in check_estimator
    check(estimator)
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/sklearn/utils/_testing.py:308: in wrapper
    return fn(*args, **kwargs)
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:914: in check_sample_weights_invariance
    estimator2.fit(X2, y=y2, sample_weight=sw2)
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/rgf/utils.py:537: in fit
    sample_weight = self._get_sample_weight(sample_weight)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = RGFClassifier()
sample_weight = array([1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0.,
       1., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 1.])

    def _get_sample_weight(self, sample_weight):
        if sample_weight is not None:
            sample_weight = column_or_1d(sample_weight, warn=True)
            if (sample_weight <= 0).any():
>               raise ValueError("Sample weights must be positive.")
E               ValueError: Sample weights must be positive.

/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/rgf/utils.py:419: ValueError
__________________ TestRGFRegressor.test_sklearn_integration ___________________

self = <test_rgf_python.TestRGFRegressor testMethod=test_sklearn_integration>

    def test_sklearn_integration(self):
>       check_estimator(self.estimator_class())

tests/test_rgf_python.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:547: in check_estimator
    check(estimator)
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/sklearn/utils/_testing.py:308: in wrapper
    return fn(*args, **kwargs)
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/sklearn/utils/estimator_checks.py:914: in check_sample_weights_invariance
    estimator2.fit(X2, y=y2, sample_weight=sw2)
/github/home/miniconda/envs/test-env/lib/python3.9/site-packages/rgf/utils.py:657: in fit
    sample_weight = self._get_sample_weight(sample_weight)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = RGFRegressor()
sample_weight = array([1., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0.,
       1., 0., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 0., 1., 1.])

    def _get_sample_weight(self, sample_weight):
        if sample_weight is not None:
            sample_weight = column_or_1d(sample_weight, warn=True)
            if (sample_weight <= 0).any():
>               raise ValueError("Sample weights must be positive.")
E               ValueError: Sample weights must be positive.

RGF doesn't support non-positive weights. See the check in the Python package and the corresponding lines in the C++ source:

https://github.com/RGF-team/rgf/blob/8c05fd629ca478ac46b1ebed9ed4787bfa52e9c3/python-package/rgf/utils.py#L415-L420

https://github.com/RGF-team/rgf/blob/faf4b4ad6fe2d51b3ceb7384c7ac86b8071e2545/RGF/src/tet/AzTETrainer.hpp#L116-L117
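One possible way to reconcile the two constraints is sketched below. It is not necessarily the fix adopted in this repository. The idea: `check_sample_weights_invariance` expects a zero-weighted sample to behave as if it were absent, so rows with zero weight could be dropped before the strictly-positive check runs. The helper name `drop_zero_weight_samples` is hypothetical and not part of the rgf package.

```python
import numpy as np


def drop_zero_weight_samples(X, y, sample_weight):
    """Hypothetical helper (not part of rgf): remove rows whose weight is 0.

    Returns (X, y, sample_weight) restricted to strictly positive weights,
    so a downstream "weights must be positive" check can still pass.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if sample_weight is None:
        # Nothing to filter when no weights are supplied.
        return X, y, None

    sample_weight = np.asarray(sample_weight, dtype=float).ravel()
    if sample_weight.shape[0] != X.shape[0]:
        raise ValueError("sample_weight and X have inconsistent lengths.")
    if (sample_weight < 0).any():
        # Negative weights remain an error; only exact zeros are dropped.
        raise ValueError("Sample weights must be non-negative.")

    keep = sample_weight > 0
    if not keep.any():
        raise ValueError("At least one sample must have a positive weight.")
    return X[keep], y[keep], sample_weight[keep]
```

Under this assumption, an estimator's `fit` could call the helper first and pass the filtered arrays on to the existing validation, which would then only ever see positive weights.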