automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

AutoSklearnRegressor never returns after fit() #1648

Open xieleo5 opened 1 year ago

xieleo5 commented 1 year ago

Describe the bug

After the regressor searches for 60 seconds, it gets stuck and never returns. It does not even generate the "trajectory.json" file under smac3-output. I suspect there may be a bug on the SMAC side.

To Reproduce

import openml
import sklearn.model_selection  # needed for train_test_split below
from autosklearn.classification import AutoSklearnClassifier

task = openml.tasks.get_task(233211)
X, y = task.get_X_and_y("dataframe")

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=42
)

askl_config = {
    "time_left_for_this_task": 60,
}

automl = AutoSklearnClassifier(**askl_config)
automl.fit(X_train, y_train)
# execution never reaches any line after fit()

Expected behavior

It should stop once the time limit is up.

Actual behavior, stacktrace or logfile

It gets stuck and never returns from fit(). I waited for 2 hours, but it still had not returned.

Environment and installation:

Please give details about your installation:

aron-bram commented 1 year ago

Hi, the OpenML task that you are running the AutoML classifier on is actually a supervised regression problem. Please try using AutoSklearnRegressor instead.

Example:

import openml
import sklearn.model_selection  # makes train_test_split available
from autosklearn.estimators import AutoSklearnRegressor

task = openml.tasks.get_task(233211)

X, y = task.get_X_and_y("dataframe")

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=42
)

askl_config = {
    "time_left_for_this_task": 120,
    "delete_tmp_folder_after_terminate": False,  # good to have as False, in case we need to debug later
    "tmp_folder": "/tmp/auto-sklearn_run_DEBUG",  # change the path of the output folder to your liking
}

automl = AutoSklearnRegressor(
    **askl_config,
)
automl.fit(X_train, y_train)

print(automl.show_models())
# reached this point successfully
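
Before choosing between the classifier and the regressor, it can help to check what kind of target the task has. Below is a minimal sketch of such a check. Note that `looks_continuous` and its `max_classes` threshold are hypothetical names and a rough heuristic of my own, not part of auto-sklearn or OpenML.

```python
import math
import numbers
from collections.abc import Iterable


def looks_continuous(y: Iterable, max_classes: int = 20) -> bool:
    """Rough heuristic (an assumption, not an auto-sklearn API).

    Returns True if the target looks like a regression target: all values
    are numeric and either some value is fractional or there are more than
    `max_classes` distinct values.
    """
    # Drop missing values so they do not influence the decision.
    values = [v for v in y if not (isinstance(v, float) and math.isnan(v))]

    # Strings, booleans and other non-numeric labels indicate classification.
    is_numeric = all(
        isinstance(v, numbers.Real) and not isinstance(v, bool) for v in values
    )
    if not is_numeric:
        return False

    # Any fractional value is a strong hint of a continuous target.
    if any(not float(v).is_integer() for v in values):
        return True

    # Integer-valued targets with many distinct values are likely counts
    # or measurements rather than class labels.
    return len(set(values)) > max_classes
```

With `y_train` from the snippet above, `looks_continuous(y_train)` returning True would suggest AutoSklearnRegressor. For a more thorough check, scikit-learn's `sklearn.utils.multiclass.type_of_target` reports whether a target is "continuous", "binary", "multiclass", and so on.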

I think an out-of-memory error might have caused the crash, since auto-sklearn still seems to forcefully try to fit the data with classifiers and uses an immense amount of memory while doing so. It doesn't crash on my system, but I'm not using Windows with WSL2 as you are.
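
If memory really is the culprit, one thing to try is setting the resource limits explicitly. This is only a sketch: `memory_limit` (in MB) and `per_run_time_limit` (in seconds) are existing auto-sklearn constructor parameters, but the values below are illustrative assumptions, not recommendations for this dataset.

```python
# Illustrative values only; tune them to the machine you are running on.
askl_config = {
    "time_left_for_this_task": 120,  # total search budget in seconds
    "per_run_time_limit": 30,        # budget for a single model fit, in seconds
    "memory_limit": 3072,            # memory cap per model fit, in MB
    "delete_tmp_folder_after_terminate": False,  # keep logs for debugging
}

# The dictionary would then be passed on as in the example above:
# automl = AutoSklearnRegressor(**askl_config)
```

Keeping `per_run_time_limit` well below `time_left_for_this_task` leaves room for more than one model to be evaluated within the budget.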

Let me know if it resolves your problem. Also, I apologize for the late answer.

whoisltd commented 1 year ago

I have the same problem.