fmohr / lcdb


[LCDB 2.0] RandomForestWorkflow ValueError for OOB Scores #17

Closed Deathn0t closed 11 months ago

Deathn0t commented 11 months ago

The following command triggers a ValueError:

lcdb test -id 6 -w lcdb.workflow.sklearn.RandomForestWorkflow -m -vs 42 -ts 42 -ws 42 --parameters '{"bootstrap": false, "criterion": "log_loss", "max_features": "all", "max_samples": 0.6460461826006697, "min_impurity_decrease": 0.7762379021238405, "min_samples_leaf": 12, "min_samples_split": 24, "n_estimators": 2000, "pp@cat_encoder": "onehot", "pp@decomposition": "none", "pp@featuregen": "poly", "pp@featureselector": "selectp", "pp@scaler": "std", "pp@kernel_pca_kernel": "linear", "pp@kernel_pca_n_components": 0.25, "pp@poly_degree": 2, "pp@selectp_percentile": 98, "pp@std_with_std": true}'

Output:

Traceback (most recent call last):
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/lcdb/controller.py", line 208, in fit_workflow_on_current_anchor
    self.workflow.fit(
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/lcdb/utils.py", line 67, in terminate_on_timeout
    return results.get(timeout)
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/build/dhenv/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/build/dhenv/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/lcdb/workflow/_base_workflow.py", line 31, in fit
    self._fit(X=X, y=y, metadata=metadata, *args, **kwargs)
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/lcdb/workflow/sklearn/_randomforest.py", line 186, in _fit
    scorer.score(
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/lcdb/scorer.py", line 80, in score
    roc_auc_score(
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/build/dhenv/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/build/dhenv/lib/python3.10/site-packages/sklearn/metrics/_ranking.py", line 621, in roc_auc_score
    return _multiclass_roc_auc_score(
  File "/lus/grand/projects/datascience/regele/polaris/lcdb/publications/2023-neurips/build/dhenv/lib/python3.10/site-packages/sklearn/metrics/_ranking.py", line 694, in _multiclass_roc_auc_score
    raise ValueError(
ValueError: Target scores need to be probabilities for multiclass roc_auc, i.e. they should sum up to 1.0 over classes

Deathn0t commented 11 months ago

Resolved in 4788eaee68ace8af9a0bc3701088a2cb9ec9cbf5. OOB scores are no longer computed when bootstrap=False, and when bootstrap=True they are computed only on the OOB samples...
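
For readers hitting the same error, here is a minimal, standard-library sketch of the idea behind the fix. It is not the code from the commit above. The helper names (`oob_vote_matrix`, `normalize_oob_rows`, `rows_sum_to_one`) and the toy `predict` function are hypothetical and only illustrate the mechanism. The assumption is that OOB class scores are accumulated as votes from trees that did not train on a sample. With bootstrap=False every tree sees every sample, so every row stays at zero and fails the "rows sum to 1.0" check that the traceback shows roc_auc_score enforcing for multiclass targets.

```python
import math
import random


def oob_vote_matrix(n_samples, n_classes, in_bag_per_tree, predict):
    """Accumulate class votes per sample from trees that did NOT train on it."""
    votes = [[0.0] * n_classes for _ in range(n_samples)]
    for in_bag in in_bag_per_tree:
        for i in range(n_samples):
            if i not in in_bag:  # sample i is out-of-bag for this tree
                votes[i][predict(i)] += 1.0
    return votes


def normalize_oob_rows(votes):
    """Keep only samples with at least one OOB vote and normalize their rows."""
    kept, probs = [], []
    for i, row in enumerate(votes):
        total = sum(row)
        if total > 0:
            kept.append(i)
            probs.append([v / total for v in row])
    return kept, probs


def rows_sum_to_one(rows):
    """Mimic the 'probabilities sum to 1.0 over classes' requirement."""
    return all(math.isclose(sum(r), 1.0) for r in rows)


n_samples, n_classes, n_trees = 10, 3, 25
predict = lambda i: i % n_classes  # hypothetical stand-in for a tree's prediction

# bootstrap=False: every tree trains on all samples, so nothing is out-of-bag.
full_bags = [set(range(n_samples)) for _ in range(n_trees)]
votes_no_bootstrap = oob_vote_matrix(n_samples, n_classes, full_bags, predict)
kept_no_bootstrap, _ = normalize_oob_rows(votes_no_bootstrap)

# bootstrap=True: each tree trains on a sample drawn with replacement.
rng = random.Random(0)
boot_bags = [set(rng.choices(range(n_samples), k=n_samples)) for _ in range(n_trees)]
votes_bootstrap = oob_vote_matrix(n_samples, n_classes, boot_bags, predict)
kept_bootstrap, probs_bootstrap = normalize_oob_rows(votes_bootstrap)

if not kept_no_bootstrap:
    print("bootstrap=False: no OOB samples, skip OOB scoring")
print("bootstrap=True: valid OOB rows:", rows_sum_to_one(probs_bootstrap))
```

In a real workflow, only the rows in `kept_bootstrap` (and the matching labels) would be passed to the scorer, and OOB scoring would be skipped entirely when no sample is out-of-bag.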