dask / dask-searchcv

dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
BSD 3-Clause "New" or "Revised" License

ENH: Support multiple metrics #60

Closed TomAugspurger closed 6 years ago

TomAugspurger commented 6 years ago

Closes https://github.com/dask/dask-searchcv/issues/52

In [1]: from sklearn.datasets import make_hastie_10_2
   ...: from sklearn.metrics import make_scorer
   ...: from sklearn.metrics import accuracy_score
   ...: from sklearn.tree import DecisionTreeClassifier
   ...:
   ...: import dask_searchcv as dcv
   ...:
   ...: X, y = make_hastie_10_2(n_samples=1000, random_state=42)
   ...:
   ...: # The scorers can either be one of the predefined metric strings or a scorer
   ...: # callable, like the one returned by make_scorer
   ...: scoring = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score)}
   ...:
   ...: # Setting refit='AUC' refits an estimator on the whole dataset with the
   ...: # parameter setting that has the best cross-validated AUC score.
   ...: # That estimator is made available at ``a.best_estimator_`` along with
   ...: # parameters like ``a.best_score_``, ``a.best_params_`` and
   ...: # ``a.best_index_``
   ...: a = dcv.GridSearchCV(DecisionTreeClassifier(random_state=42),
   ...:                      param_grid={'min_samples_split': range(2, 403, 10)},
   ...:                      scoring=scoring, cv=5, refit='AUC')
   ...: a.fit(X, y)
   ...:

In [2]: a.scorer_
Out[2]:
{'AUC': make_scorer(roc_auc_score, needs_threshold=True),
 'Accuracy': make_scorer(accuracy_score)}

In [3]: a.multimetric_
Out[3]: True

In [4]: a.cv_results_.keys()
Out[4]: dict_keys(['params', 'mean_fit_time', 'std_fit_time', 'mean_score_time',
... 'std_score_time', 'split0_test_AUC', 'split1_test_AUC', 'split2_test_AUC',
... 'split3_test_AUC', 'split4_test_AUC', 'mean_test_AUC', 'std_test_AUC',
... 'rank_test_AUC', 'split0_test_Accuracy', 'split1_test_Accuracy',
... 'split2_test_Accuracy', 'split3_test_Accuracy', 'split4_test_Accuracy',
... 'mean_test_Accuracy', 'std_test_Accuracy', 'rank_test_Accuracy',
... 'split0_train_AUC', 'split1_train_AUC', 'split2_train_AUC',
... 'split3_train_AUC', 'split4_train_AUC', 'mean_train_AUC',
... 'std_train_AUC', 'split0_train_Accuracy', 'split1_train_Accuracy',
... 'split2_train_Accuracy', 'split3_train_Accuracy', 'split4_train_Accuracy',
... 'mean_train_Accuracy', 'std_train_Accuracy', 'param_min_samples_split'])
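
For reference, the per-scorer keys above let you pull out the best parameter setting for a single metric by hand. A minimal sketch, assuming the fitted ``a`` from In [1] (not part of the session above):

import numpy as np

# Index of the parameter setting with the highest mean cross-validated AUC
best_auc = np.argmax(a.cv_results_['mean_test_AUC'])

a.cv_results_['params'][best_auc]         # the winning parameter dict for AUC
a.cv_results_['mean_test_AUC'][best_auc]  # the corresponding mean AUC score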

I'll add more tests later, but does the overall approach look OK here @jcrist? I had to modify a few places, but none of the modifications were that extensive.

cc @jnothman

jnothman commented 6 years ago

We don't exactly make *SearchCV compatibility easy, do we? But I guess we don't want the burden of maintaining public versions of _fit_and_score etc.

TomAugspurger commented 6 years ago

Thanks @jcrist, much cleaner now I think.

I'm having trouble propagating failures with multiple metrics, so the CI will fail for now. I'll push a fix tomorrow morning.

TomAugspurger commented 6 years ago

OK, I think I've got it now.

One question I had is how to handle best_score_, best_index_, best_params_, etc. when refit=False. I see two options:

  1. Don't set them (this would be consistent with scikit-learn, but not dask-searchcv)
  2. Make them dictionaries of {scorer -> float / int / dict} with the best score / index / params for that particular scorer (sketched below).

I'm going to try out 2, and see what it breaks.
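
For illustration only, a hypothetical sketch of what option 2 could look like; the attribute shapes and values below are made up, not what was merged:

# Hypothetical best_* attributes under option 2, with refit=False and
# scoring={'AUC': ..., 'Accuracy': ...} on some fitted search ``est``:
est.best_score_    # {'AUC': 0.97, 'Accuracy': 0.91}
est.best_index_    # {'AUC': 0, 'Accuracy': 7}
est.best_params_   # {'AUC': {'min_samples_split': 2},
                   #  'Accuracy': {'min_samples_split': 72}}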

TomAugspurger commented 6 years ago

FYI all green @jcrist.

I'm not 100% convinced that the way I've handled the best_* attributes is the best way of doing things. It seems weird for the return type of a property to depend on two parameters (scoring being a dict, and refit being False). Still, I haven't come up with a cleaner way.

TomAugspurger commented 6 years ago

OK, hopefully things are good now. Changes:

- The test_no_refit test was incorrectly passing since the assert was on any of the conditions being true instead of all of them. I fixed that test (and the implementation) in https://github.com/dask/dask-searchcv/pull/60/commits/9c677329477bdf692c584acc5d51281fd17cfcf7.
- The best_* properties will still show up in tab completion. I'm inclined to ignore this, though. (Side note: I didn't know that raising AttributeError inside a property makes hasattr(thing, attr) return False, which is neat.)
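
A minimal sketch of that side note (plain Python behavior, not code from this PR):

class Example:
    @property
    def best_params_(self):
        # Raising AttributeError from a property makes the attribute look absent
        raise AttributeError("not available with 'refit=False'")

e = Example()
hasattr(e, 'best_params_')   # False, because the property raised AttributeError
'best_params_' in dir(e)     # True -- it still shows up in tab completion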

That cleaned up the multimetric code quite a bit, since it made my optionally returning a dict when refit is False moot.

Finally, I've borrowed scikit-learn's behavior / error message when fitting with multiple metrics and refit=True.
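
A minimal sketch of that borrowed behavior, assuming the X, y and imports from the first example above (the exception type and fit-time check are assumed to match scikit-learn's ValueError):

# With multiple metrics, refit=True is ambiguous: which scorer defines "best"?
gs = dcv.GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid={'min_samples_split': [2, 10]},
                      scoring={'AUC': 'roc_auc', 'Accuracy': 'accuracy'},
                      refit=True)
try:
    gs.fit(X, y)
except ValueError as exc:
    # Message explains that refit must name one of the scorers (e.g. 'AUC')
    # or be set to False explicitly
    print(exc)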

TomAugspurger commented 6 years ago

And here's a gist with the attr types of sklearn / dask-searchcv for various combos of scoring / refit. https://gist.github.com/TomAugspurger/4edd8bc24d93b9ef8d3c2385db7e0a3a

PeterDSteinberg commented 6 years ago

@TomAugspurger - Glad you're working on support for multiple metrics. In Elm we need this to support multiobjective optimization in elm.model_selection.EaSearchCV.

TomAugspurger commented 6 years ago

👍 , I think this is all ready @jcrist, if you have a chance to glance through the implementation again.

jcrist commented 6 years ago

Apologies for the delay in response:

I'm not 100% convinced that the way I've handled the best_* attributes is the best way of doing things. It seems weird for the return type of a property to depend on two parameters (scoring being a dict, and refit being False). Still, I haven't come up with a cleaner way.

I'd prefer to 100% match newer sklearn (0.19.1) behavior with respect to set attributes and handling of scorer/refit. I think this should be possible without too much effort. In this case we'd only set best_score_ with multiple metrics, etc... if refit is set:

    refit : boolean, or string, default=True
        Refit an estimator using the best found parameters on the whole
        dataset.

        For multiple metric evaluation, this needs to be a string denoting the
        scorer is used to find the best parameters for refitting the estimator
        at the end.

        The refitted estimator is made available at the ``best_estimator_``
        attribute and permits using ``predict`` directly on this
        ``GridSearchCV`` instance.

        Also for multiple metric evaluation, the attributes ``best_index_``,
        ``best_score_`` and ``best_parameters_`` will only be available if
        ``refit`` is set and all of them will be determined w.r.t this specific
        scorer.

        See ``scoring`` parameter to know more about multiple metric
        evaluation.

In general I'd like dask-searchcv to match the behavior of whatever scikit-learn version is installed as much as possible.

TomAugspurger commented 6 years ago

I'd prefer to 100% match newer sklearn (0.19.1) behavior with respect to set attributes and handling of scorer/refit.

I believe this branch currently achieves that. The only difference is that since best_params_ and best_score_ are properties, they show up in dir(est), but I think that's OK.

In [20]: import sklearn.model_selection as ms; import dask_searchcv as dcv

In [21]: from sklearn.ensemble import RandomForestClassifier; from sklearn.datasets import make_classification

In [22]: X, y = make_classification()

In [23]: a = dcv.GridSearchCV(RandomForestClassifier(), param_grid={'n_estimators': [10, 100]},
    ...:                      scoring={"auc": "roc_auc", "Accuracy": 'accuracy'}, refit=False)

In [24]: b = ms.GridSearchCV(RandomForestClassifier(), param_grid={'n_estimators': [10, 100]},
    ...:                     scoring={"auc": "roc_auc", "Accuracy": 'accuracy'}, refit=False)

In [25]: a.fit(X, y); b.fit(X, y)
Out[25]:
GridSearchCV(cv=None, error_score='raise',
       estimator=RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False),
       fit_params=None, iid=True, n_jobs=1,
       param_grid={'n_estimators': [10, 100]}, pre_dispatch='2*n_jobs',
       refit=False, return_train_score='warn',
       scoring={'auc': 'roc_auc', 'Accuracy': 'accuracy'}, verbose=0)

In [26]: a.best_params_
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-26-ab762e88e145> in <module>()
----> 1 a.best_params_

~/Envs/dask-dev/lib/python3.6/site-packages/dask-searchcv/dask_searchcv/model_selection.py in best_params_(self)
    704     def best_params_(self):
    705         check_is_fitted(self, 'cv_results_')
--> 706         self._check_if_refit('best_params_')
    707         return self.cv_results_['params'][self.best_index_]
    708

~/Envs/dask-dev/lib/python3.6/site-packages/dask-searchcv/dask_searchcv/model_selection.py in _check_if_refit(self, attr)
    695         if not self.refit:
    696             raise AttributeError(
--> 697                 "'{}' is not a valid attribute with 'refit=False'.".format(attr))
    698
    699     @property

AttributeError: 'best_params_' is not a valid attribute with 'refit=False'.

In [27]: b.best_params_
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-9bb303929fe5> in <module>()
----> 1 b.best_params_

AttributeError: 'GridSearchCV' object has no attribute 'best_params_'

The error message is different, though ours is a bit better :) This is tested in https://github.com/dask/dask-searchcv/pull/60/files#diff-6980992e8dfcc62386acb240d0d401cdR986

TomAugspurger commented 6 years ago

Oh, and to be clear, that "I'm not 100% sure" comment from https://github.com/dask/dask-searchcv/pull/60#issuecomment-335919394 was based on an old implementation. I'm much more confident that we match scikit-learn's behavior now.

jcrist commented 6 years ago

Looks good, merging. Thanks!