automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

AutoMLRegressor does not support task binary #1603

Open dadangsetio opened 1 year ago

dadangsetio commented 1 year ago

I can't fit a model with AutoSklearnRegressor:

from autosklearn.regression import AutoSklearnRegressor
reg = AutoSklearnRegressor(time_left_for_this_task=5*60, per_run_time_limit=30, n_jobs=8)
reg.fit(X=X_train, y=y_train)

This is my log:

ValueError                                Traceback (most recent call last)
Input In [25], in <cell line: 2>()
      1 reg = AutoSklearnRegressor(time_left_for_this_task=5*60, per_run_time_limit=30, n_jobs=8)
----> 2 reg.fit(X=X_train, y=y_train)

File ~/miniforge3/lib/python3.10/site-packages/autosklearn/estimators.py:1587, in AutoSklearnRegressor.fit(self, X, y, X_test, y_test, feat_type, dataset_name)
   1576     raise ValueError(
   1577         "Regression with data of type {} is "
   1578         "not supported. Supported types are {}. "
   (...)
   1582         "".format(target_type, supported_types)
   1583     )
   1585 # Fit is supposed to be idempotent!
   1586 # But not if we use share_mode.
-> 1587 super().fit(
   1588     X=X,
   1589     y=y,
   1590     X_test=X_test,
   1591     y_test=y_test,
   1592     feat_type=feat_type,
   1593     dataset_name=dataset_name,
   1594 )
   1596 return self

File ~/miniforge3/lib/python3.10/site-packages/autosklearn/estimators.py:540, in AutoSklearnEstimator.fit(self, **kwargs)
    538 if self.automl_ is None:
    539     self.automl_ = self.build_automl()
--> 540 self.automl_.fit(load_models=self.load_models, **kwargs)
    542 return self

File ~/miniforge3/lib/python3.10/site-packages/autosklearn/automl.py:2394, in AutoMLRegressor.fit(self, X, y, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models)
   2383 def fit(
   2384     self,
   2385     X: SUPPORTED_FEAT_TYPES,
   (...)
   2392     load_models: bool = True,
   2393 ) -> AutoMLRegressor:
-> 2394     return super().fit(
   2395         X,
   2396         y,
   2397         X_test=X_test,
   2398         y_test=y_test,
   2399         feat_type=feat_type,
   2400         dataset_name=dataset_name,
   2401         only_return_configuration_space=only_return_configuration_space,
   2402         load_models=load_models,
   2403         is_classification=False,
   2404     )

File ~/miniforge3/lib/python3.10/site-packages/autosklearn/automl.py:611, in AutoML.fit(self, X, y, task, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models, is_classification)
    609     y_task = type_of_target(y)
    610     if not self._supports_task_type(y_task):
--> 611         raise ValueError(
    612             f"{self.__class__.__name__} does not support" f" task {y_task}"
    613         )
    614     self._task = self._task_type_id(y_task)
    615 else:

ValueError: AutoMLRegressor does not support task binary

eddiebergman commented 1 year ago

Hi @dadangsetio, we use sklearn.utils.multiclass.type_of_target to identify the task type based on the y you pass in. My guess is that it looks something like [0, 1, 0, 1, 1, ...] which gets identified as a binary classification problem. Is this your intended behavior? If so, then I'm not sure we have any way to overwrite this behaviour but I can look into it if it is.
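
To make the detection concrete, here is a minimal sketch of how sklearn.utils.multiclass.type_of_target labels a few targets. The arrays are made-up illustrations, not the data from this issue. Note that a float array holding only 0.0 and 1.0 is still labelled binary, so casting the dtype alone does not change the result.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative targets (assumed data, not taken from the issue).
y_binary = np.array([0, 1, 0, 1, 1])
y_binary_float = y_binary.astype(float)  # same two values, float dtype
y_continuous = np.array([0.3, 1.7, 0.2, 1.1, 0.9])

labels = {
    "int 0/1": type_of_target(y_binary),
    "float 0.0/1.0": type_of_target(y_binary_float),
    "real-valued": type_of_target(y_continuous),
}
print(labels)
```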

dadangsetio commented 1 year ago

Thank you for the response, @eddiebergman. You are right that the content of y is binary, so how can I solve this?

eddiebergman commented 1 year ago

You may prefer to use probability scores from predict_proba and use a Classifier instead of a Regressor.
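
As a rough sketch of that idea, the snippet below uses scikit-learn's LogisticRegression as a stand-in classifier (an assumption made so the example runs quickly; an auto-sklearn classifier exposes predict_proba in the same way). The second column of predict_proba gives a continuous score between 0 and 1 for the positive class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data; stands in for X_train / y_train from the issue.
X, y = make_classification(random_state=0)

# LogisticRegression is only a lightweight stand-in for the classifier.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one column per class, ordered as in clf.classes_.
proba = clf.predict_proba(X)
scores = proba[:, 1]  # probability of class 1, a continuous value in [0, 1]
print(scores[:5])
```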

If you really need to skip the type_of_target check, you'll need to use the AutoML class directly instead of AutoSklearnRegressor, which is just a fancy wrapper around it that makes some things simpler. Depending on your use case, this should be okay.

Here's a sample snippet:

from sklearn.datasets import make_classification

from autosklearn.automl import AutoML
from autosklearn.constants import REGRESSION

X, y = make_classification()
print(y)   # [0, 0, 1, ...]

automl = AutoML(
    time_left_for_this_task=30,
    per_run_time_limit=5,
    ...,
)

# Pass the task explicitly so it is not inferred from y
automl.fit(X, y, task=REGRESSION, ...)

Here's the __init__(...) and the fit(...) calls from AutoML for you.

Best, Eddie

dadangsetio commented 1 year ago

I am using the AutoML sample snippet, but I get an error like this:

[ERROR] [2022-11-07 19:18:21,120:Client-AutoML(1):441115fc-5e96-11ed-acf3-363077345c9d] (' Dummy prediction failed with run state StatusType.CRASHED and additional output: {\'error\': \'Result queue is empty\', \'exit_status\': "<class \'pynisher.limit_function_call.AnythingException\'>", \'subprocess_stdout\': \'\', \'subprocess_stderr\': \'Process pynisher function call:\\nTraceback (most recent call last):\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap\\n    self.run()\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/multiprocessing/process.py", line 108, in run\\n    self._target(*self._args, **self._kwargs)\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/site-packages/pynisher/limit_function_call.py", line 108, in subprocess_func\\n    resource.setrlimit(resource.RLIMIT_AS, (mem_in_b, mem_in_b))\\nValueError: current limit exceeds maximum limit\\n\', \'exitcode\': 1, \'configuration_origin\': \'DUMMY\'}.',)
[ERROR] [2022-11-07 19:18:21,120:Client-AutoML(1):441115fc-5e96-11ed-acf3-363077345c9d] (' Dummy prediction failed with run state StatusType.CRASHED and additional output: {\'error\': \'Result queue is empty\', \'exit_status\': "<class \'pynisher.limit_function_call.AnythingException\'>", \'subprocess_stdout\': \'\', \'subprocess_stderr\': \'Process pynisher function call:\\nTraceback (most recent call last):\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap\\n    self.run()\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/multiprocessing/process.py", line 108, in run\\n    self._target(*self._args, **self._kwargs)\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/site-packages/pynisher/limit_function_call.py", line 108, in subprocess_func\\n    resource.setrlimit(resource.RLIMIT_AS, (mem_in_b, mem_in_b))\\nValueError: current limit exceeds maximum limit\\n\', \'exitcode\': 1, \'configuration_origin\': \'DUMMY\'}.',)
Traceback (most recent call last):
  File "/Users/dadangbudi/miniforge3/lib/python3.10/site-packages/autosklearn/automl.py", line 765, in fit
    self._do_dummy_prediction()
  File "/Users/dadangbudi/miniforge3/lib/python3.10/site-packages/autosklearn/automl.py", line 489, in _do_dummy_prediction
    raise ValueError(msg)
ValueError: (' Dummy prediction failed with run state StatusType.CRASHED and additional output: {\'error\': \'Result queue is empty\', \'exit_status\': "<class \'pynisher.limit_function_call.AnythingException\'>", \'subprocess_stdout\': \'\', \'subprocess_stderr\': \'Process pynisher function call:\\nTraceback (most recent call last):\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap\\n    self.run()\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/multiprocessing/process.py", line 108, in run\\n    self._target(*self._args, **self._kwargs)\\n  File "/Users/dadangbudi/miniforge3/lib/python3.10/site-packages/pynisher/limit_function_call.py", line 108, in subprocess_func\\n    resource.setrlimit(resource.RLIMIT_AS, (mem_in_b, mem_in_b))\\nValueError: current limit exceeds maximum limit\\n\', \'exitcode\': 1, \'configuration_origin\': \'DUMMY\'}.',)

eddiebergman commented 1 year ago

You should construct AutoML with the same parameters you used for the estimator in your original code. My guess is that you had set memory_limit=None there.

The issue is that there is no way to limit the memory of processes on Mac as far as I know. See https://github.com/automl/pynisher#features

The version of pynisher linked above is actually newer than the one we use, and we need to update to it.
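
To check whether a given machine accepts the kind of limit shown in the traceback, a small probe like the following may help. It is only a sketch: the helper name address_space_limit_works and the 4 GiB default are hypothetical, and it simply repeats the resource.setrlimit(resource.RLIMIT_AS, ...) call from the log inside a throwaway subprocess so the current process is left untouched.

```python
import subprocess
import sys

# Code run in a child interpreter: the same call that fails in the log above.
PROBE = (
    "import resource\n"
    "limit = {limit}\n"
    "resource.setrlimit(resource.RLIMIT_AS, (limit, limit))\n"
)


def address_space_limit_works(limit_bytes: int = 4 * 1024**3) -> bool:
    """Return True if a child process can set RLIMIT_AS to limit_bytes.

    Hypothetical helper for diagnosis only; it is not part of auto-sklearn
    or pynisher.
    """
    result = subprocess.run(
        [sys.executable, "-c", PROBE.format(limit=limit_bytes)],
        capture_output=True,
    )
    # A non-zero exit code means setrlimit raised (or resource is missing).
    return result.returncode == 0


print("RLIMIT_AS can be set:", address_space_limit_works())
```

If this prints False, a memory limit passed to auto-sklearn is likely to hit the same ValueError as in the log above.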

ViktorooReps commented 2 months ago

classifier = AutoSklearn2Classifier(
    time_left_for_this_task=15 * 60,
    per_run_time_limit=30,
    memory_limit=None,
    n_jobs=1,
    max_models_on_disc=10,
    ensemble_size=10
).fit(preprocessor.transform(train_x), train_y, preprocessor.transform(valid_x), valid_y)

There is an internal check that prohibits running without a memory limit:

[ERROR] [2024-07-18 15:19:23,002:Client-AutoML(1):5923f702-4508-11ef-82ea-42442fa1d044] '>' not supported between instances of 'NoneType' and 'int'
Traceback (most recent call last):
  File "/Users/Viktor/PycharmProjects/laion-copyright/.venv39/lib/python3.9/site-packages/autosklearn/automl.py", line 680, in fit
    X, y = reduce_dataset_size_if_too_large(
  File "/Users/Viktor/PycharmProjects/laion-copyright/.venv39/lib/python3.9/site-packages/autosklearn/util/data.py", line 430, in reduce_dataset_size_if_too_large
    assert memory_limit > 0
TypeError: '>' not supported between instances of 'NoneType' and 'int'

It's such a shame we cannot use auto-sklearn on Apple Silicon. Hopefully one day you find a workaround!

dadangsetio commented 2 months ago

Yes, that's true. I used to feel the same way, @ViktorooReps.