EducationalTestingService / skll

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
http://skll.readthedocs.org

Overhaul how custom metrics are used in SKLL #750

Closed: desilinguist closed this issue 1 year ago

desilinguist commented 1 year ago

Starting with scikit-learn v1.3.0, it's no longer possible to simply update the sklearn.metrics._scorer._SCORERS dictionary with new metric functions and have them be picked up. This is because scikit-learn now validates parameters more strictly and rejects any metric name that isn't one of its own. To use custom metrics, we have to pass callables instead of relying on custom strings that get looked up in _SCORERS. This means we have to overhaul how SKLL handles custom metrics: not just user-defined custom metrics but also the pre-defined ones such as pearson and kappa.
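To make the callable approach concrete, here is a minimal sketch of passing a custom metric to GridSearchCV as a scorer object rather than as a string. The pearson function, the Ridge estimator, the synthetic data, and the parameter grid are all illustrative assumptions and are not taken from SKLL's code.

```python
from scipy.stats import pearsonr
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV


def pearson(y_true, y_pred):
    """Illustrative metric: Pearson correlation between gold and predicted values."""
    return float(pearsonr(y_true, y_pred)[0])


# Wrap the plain metric function in a scorer object; higher values are better.
pearson_scorer = make_scorer(pearson, greater_is_better=True)

# Synthetic regression data, used only to exercise the scorer.
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# Pass the scorer object itself, not the string "pearson".
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    scoring=pearson_scorer,
    cv=3,
)
search.fit(X, y)
best_score = search.best_score_
```

Because the scorer is a callable, it passes the parameter validation that rejects unknown metric strings, and nothing needs to be added to _SCORERS.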

This blocks #748.

desilinguist commented 1 year ago

Here's an example of the error we now get with scikit-learn v1.3.0:

InvalidParameterError: The 'scoring' parameter of GridSearchCV must be a str among {'neg_mean_absolute_error', 'max_error', 'rand_score', 'accuracy', 'f1_micro', 'recall_micro', 'top_k_accuracy', 'precision', 'recall_samples', 'average_precision', 'neg_mean_poisson_deviance', 'f1_samples', 'neg_mean_gamma_deviance', 'explained_variance', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'f1_weighted', 'precision_weighted', 'neg_log_loss', 'fowlkes_mallows_score', 'roc_auc_ovo', 'jaccard', 'jaccard_weighted', 'recall_weighted', 'v_measure_score', 'r2', 'neg_mean_squared_log_error', 'roc_auc', 'mutual_info_score', 'homogeneity_score', 'recall', 'neg_brier_score', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'completeness_score', 'balanced_accuracy', 'recall_macro', 'roc_auc_ovr_weighted', 'neg_mean_squared_error', 'roc_auc_ovo_weighted', 'roc_auc_ovr', 'f1_macro', 'f1', 'positive_likelihood_ratio', 'precision_micro', 'precision_macro', 'precision_samples', 'matthews_corrcoef', 'neg_root_mean_squared_error', 'jaccard_macro', 'neg_negative_likelihood_ratio', 'jaccard_micro', 'neg_mean_absolute_percentage_error', 'jaccard_samples'}, a callable, an instance of 'list', an instance of 'tuple', an instance of 'dict' or None. Got 'pearson' instead.