autogluon / autogluon

Fast and Accurate ML in 3 Lines of Code
https://auto.gluon.ai/
Apache License 2.0

[BUG] Quantile Mode seems to behave like classification in some bad ways #3545

Closed SamuelGabriel closed 10 months ago

SamuelGabriel commented 1 year ago

Hi there,

It was nice to see some of you at AutoML-Conf, and thanks for this package.

I run into the following errors when using problem_type='quantile' (@jwmueller). Below is a small script you can paste into IPython to reproduce them. The errors occur on every dataset I have tried.

Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 21.86s of the 21.86s of remaining time.
    Fitting 5 child models (S1F1 - S1F5) | Fitting with ParallelLocalFoldFittingStrategy
    Warning: Exception caused RandomForestMSE_BAG_L1 to fail during training... Skipping this model.
        ray::_ray_fit() (pid=2904225, ip=10.5.166.216)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 402, in _ray_fit
    fold_model.fit(X=X_fold, y=y_fold, X_val=X_val_fold, y_val=y_val_fold, time_limit=time_limit_fold, **resources, **kwargs_fold)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 829, in fit
    out = self._fit(**kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/tabular/models/rf/rf_model.py", line 195, in _fit
    model = model.fit(X, y, sample_weight=sample_weight)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/tabular/models/rf/rf_quantile.py", line 513, in fit
    y_train_leaves = est.y_train_leaves_
AttributeError: 'DecisionTreeQuantileRegressor' object has no attribute 'y_train_leaves_'
Detailed Traceback:
Traceback (most recent call last):
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1733, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1684, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 829, in fit
    out = self._fit(**kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 169, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 266, in _fit
    self._fit_folds(
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 592, in _fit_folds
    fold_fitting_strategy.after_all_folds_scheduled()
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 581, in after_all_folds_scheduled
    raise processed_exception
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 541, in after_all_folds_scheduled
    fold_model, pred_proba, time_start_fit, time_end_fit, predict_time, predict_1_time = self.ray.get(finished)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/ray/_private/worker.py", line 2380, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::_ray_fit() (pid=2891838, ip=10.5.166.216)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/fold_fitting_strategy.py", line 402, in _ray_fit
    fold_model.fit(X=X_fold, y=y_fold, X_val=X_val_fold, y_val=y_val_fold, time_limit=time_limit_fold, **resources, **kwargs_fold)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 829, in fit
    out = self._fit(**kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/tabular/models/rf/rf_model.py", line 195, in _fit
    model = model.fit(X, y, sample_weight=sample_weight)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/tabular/models/rf/rf_quantile.py", line 513, in fit
    y_train_leaves = est.y_train_leaves_
AttributeError: 'ExtraTreeQuantileRegressor' object has no attribute 'y_train_leaves_'

and

Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 29.98s of the 29.98s of remaining time.
    Warning: Exception caused KNeighborsUnif_BAG_L1 to fail during training... Skipping this model.
        Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.
Detailed Traceback:
Traceback (most recent call last):
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1733, in _train_and_save
    model = self._train_single(X, y, model, X_val, y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1684, in _train_single
    model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, total_resources=total_resources, **model_fit_kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 829, in fit
    out = self._fit(**kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 169, in _fit
    return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 250, in _fit
    self._fit_single(X=X, y=y, model_base=model_base, use_child_oof=use_child_oof, skip_oof=_skip_oof, **kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 379, in _fit_single
    model_base.fit(X=X, y=y, time_limit=time_limit, **kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/core/models/abstract/abstract_model.py", line 829, in fit
    out = self._fit(**kwargs)
  File "/home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages/autogluon/tabular/models/knn/knn_model.py", line 97, in _fit
    self.model = self._get_model_type()(**params).fit(X, y)
  File "/home/muellesa/.local/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/muellesa/.local/lib/python3.10/site-packages/sklearn/neighbors/_classification.py", line 233, in fit
    return self._fit(X, y)
  File "/home/muellesa/.local/lib/python3.10/site-packages/sklearn/neighbors/_base.py", line 480, in _fit
    check_classification_targets(y)
  File "/home/muellesa/.local/lib/python3.10/site-packages/sklearn/utils/multiclass.py", line 216, in check_classification_targets
    raise ValueError(
ValueError: Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 29.97s of the 29.97s of remaining time.
    Warning: Exception caused KNeighborsDist_BAG_L1 to fail during training... Skipping this model.
        Unknown label type: continuous. Maybe you are trying to fit a classifier, which expects discrete classes on a regression target with continuous values.

I run AutoGluon like this:

import pandas as pd
import numpy as np
from autogluon.tabular import TabularPredictor
from sklearn.datasets import make_regression

# Generate synthetic regression data
x, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Create a DataFrame
train_data = pd.DataFrame(np.concatenate([x, y[:, np.newaxis]], 1))
train_data.columns = [f"feature_{i}" for i in range(x.shape[1])] + ["target"]

# Create a TabularPredictor in quantile mode and fit it
predictor = TabularPredictor(
    label=train_data.columns[-1],
    problem_type='quantile',
    quantile_levels=[.1, .5, .9],
).fit(
    train_data=train_data,
    time_limit=300,
    presets=["best_quality"],
)

# Make predictions (example)
test_data = pd.DataFrame(np.random.randn(5, 10), columns=[f"feature_{i}" for i in range(10)])
predictions = predictor.predict(test_data)

print(predictions)

My environment:

>pip show autogluon scikit-learn

Name: autogluon
Version: 0.8.2
Summary: AutoML for Image, Text, and Tabular Data
Home-page: https://github.com/autogluon/autogluon
Author: AutoGluon Community
Author-email:
License: Apache-2.0
Location: /home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages
Requires: autogluon.core, autogluon.features, autogluon.multimodal, autogluon.tabular, autogluon.timeseries
Required-by:
---
Name: scikit-learn
Version: 1.2.1
Summary: A set of python modules for machine learning and data mining
Home-page: http://scikit-learn.org
Author:
Author-email:
License: new BSD
Location: /home/muellesa/miniconda3/envs/prior-fitting-24-autogluon/lib/python3.10/site-packages
Requires: joblib, numpy, scipy, threadpoolctl
Required-by: autogluon.core, autogluon.features, autogluon.multimodal, autogluon.tabular, blitz-bayesian-pytorch, fastai, gpytorch, lightgbm, mlforecast, openml, pytorch-metric-learning, quantile-forest, seqeval, tabpfn

shchur commented 10 months ago

Hi, thank you for the bug report!

@Innixma I was able to reproduce the bug locally. The root cause seems to be this change, which accidentally removed the code ensuring that only models listed in DEFAULT_QUANTILE_MODEL can be trained when problem_type==QUANTILE. We need to restore that check and, probably, add tests that would catch similar problems in the future.
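
For illustration only, a check of this kind could look roughly like the sketch below. It is not the code from the AutoGluon repository: the function name `filter_models_for_problem_type`, the constant `ALLOWED_QUANTILE_MODELS`, and the model keys in it are assumptions standing in for DEFAULT_QUANTILE_MODEL and whatever it actually contains.

```python
# Hypothetical sketch of an allow-list guard for quantile mode.
# None of these names are taken from AutoGluon's source; they are placeholders.
QUANTILE = "quantile"

# Assumed contents; the real DEFAULT_QUANTILE_MODEL may list different keys.
ALLOWED_QUANTILE_MODELS = {"RF", "XT", "FASTAI", "NN_TORCH"}


def filter_models_for_problem_type(hyperparameters, problem_type):
    """Return a copy of `hyperparameters` limited to models valid for `problem_type`.

    For quantile problems, every model key outside the allow-list is dropped,
    so classifiers such as KNN are never handed a continuous target.
    Other problem types are passed through unchanged.
    """
    if problem_type != QUANTILE:
        return dict(hyperparameters)
    return {
        name: config
        for name, config in hyperparameters.items()
        if name in ALLOWED_QUANTILE_MODELS
    }


# A regression-style test of the guard: KNN and GBM must be filtered out.
requested = {"KNN": {}, "RF": {}, "NN_TORCH": {}, "GBM": {}}
kept = filter_models_for_problem_type(requested, QUANTILE)
assert "KNN" not in kept
print(sorted(kept))
```

A test along these lines, run against the real model list, is the sort of thing that would flag the KNeighbors failures above before a release.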

shchur commented 10 months ago

Fixed in #3761