automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

AttributeError: 'AutoMLClassifier' object has no attribute '_automl' when calling predict_proba on a fit classifier #409

Closed drakeeee closed 6 years ago

drakeeee commented 6 years ago

pipe = autosklearn.classification.AutoSklearnClassifier()

pipe.fit(clf_x, labels)

probs = pipe.predict_proba(clf_x) # Error happens here
roc_auc = metrics.roc_auc_score(labels, probs[:, 1])
print(roc_auc)

AttributeError                            Traceback (most recent call last)
<ipython-input-257-a782d97ebce8> in <module>()
----> 1 pipe.predict_proba(clf_x[te_idx])

~/repos/vcf/research/env/lib/python3.5/site-packages/autosklearn/estimators.py in predict_proba(self, X, batch_size, n_jobs)
    430         """
    431         return self._automl.predict_proba(
--> 432             X, batch_size=batch_size, n_jobs=n_jobs)
    433 
    434 

~/repos/vcf/research/env/lib/python3.5/site-packages/autosklearn/automl.py in predict_proba(self, X, batch_size, n_jobs)
    944 
    945     def predict_proba(self, X, batch_size=None, n_jobs=1):
--> 946         return self._automl.predict(X, batch_size=batch_size, n_jobs=n_jobs)
    947 
    948 

AttributeError: 'AutoMLClassifier' object has no attribute '_automl'

Hello. Thanks for the great library. I am getting an AttributeError when trying to call predict_proba on a fit classifier. Have you seen this error before?

drakeeee commented 6 years ago

Ah, I think I might have solved my own issue:

pipe = autosklearn.classification.AutoSklearnClassifier()

pipe.fit(clf_x[tr_idx], labels[tr_idx], 
         metric=make_scorer('log_loss', metrics.log_loss, needs_proba=True))    

probs = pipe.predict_proba(clf_x) # Error happens here
roc_auc = metrics.roc_auc_score(labels, probs[:, 1])
print(roc_auc)

I'll close if this solves the issue.

mfeurer commented 6 years ago

When closing, could you please briefly describe what you changed?

mabryj2 commented 6 years ago

I also see this error. I reproduced it by modifying your example

$ git diff
diff --git a/example/example_sequential.py b/example/example_sequential.py
index 019ad3b..c325cee 100644
--- a/example/example_sequential.py
+++ b/example/example_sequential.py
@@ -25,7 +25,7 @@ def main():
     # This call to fit_ensemble uses all models trained in the previous call
     # to fit to build an ensemble which can be used with automl.predict()
     automl.fit_ensemble(y_train, ensemble_size=50)
-
+    probs = automl.predict_proba(X_train)
     print(automl.show_models())
     predictions = automl.predict(X_test)
     print(automl.sprint_statistics())
$ python example/example_sequential.py
Traceback (most recent call last):
  File "example/example_sequential.py", line 36, in <module>
    main()
  File "example/example_sequential.py", line 28, in main
    probs = automl.predict_proba(X_train)
  File "/auto-sklearn/autosklearn/estimators.py", line 432, in predict_proba
    X, batch_size=batch_size, n_jobs=n_jobs)
  File "/auto-sklearn/autosklearn/automl.py", line 946, in predict_proba
    return self._automl.predict(X, batch_size=batch_size, n_jobs=n_jobs)
AttributeError: 'AutoMLClassifier' object has no attribute '_automl'

VincentBonnivard commented 6 years ago

Hello,

I have the same issue with the latest version available on PyPI. Has the fix been pushed? Thanks!

mfeurer commented 6 years ago

The fix is only in the development branch; it is not yet on PyPI.

VincentBonnivard commented 6 years ago

Thanks for your answer. I've installed the GitHub version and got the following error when using predict_proba:

AttributeError: 'NoneType' object has no attribute 'get_model_identifiers'

The parameters for the classifiers were the following:

automl = autosklearn.classification.AutoSklearnClassifier(
    tmp_folder='/tmp/autosklearn_cv_example_tmp',
    output_folder='/tmp/autosklearn_cv_example_out',
    delete_tmp_folder_after_terminate=False,
    seed=42, initial_configurations_via_metalearning=0,
    ml_memory_limit=10000,
    ensemble_size=0,
)

Here is the Traceback:


AttributeError                            Traceback (most recent call last)
in ()
----> 1 automl.predict_proba(X_test)

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/estimators.py in predict_proba(self, X, batch_size, n_jobs)
    434         """
    435         return super().predict_proba(
--> 436             X, batch_size=batch_size, n_jobs=n_jobs)
    437 
    438 

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/estimators.py in predict_proba(self, X, batch_size, n_jobs)
    311     def predict_proba(self, X, batch_size=None, n_jobs=1):
    312         return self._automl.predict_proba(
--> 313             X, batch_size=batch_size, n_jobs=n_jobs)
    314 
    315     def score(self, X, y):

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/automl.py in predict_proba(self, X, batch_size, n_jobs)
    944 
    945     def predict_proba(self, X, batch_size=None, n_jobs=1):
--> 946         return super().predict(X, batch_size=batch_size, n_jobs=n_jobs)
    947 
    948 

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/automl.py in predict(self, X, batch_size, n_jobs)
    533         all_predictions = joblib.Parallel(n_jobs=n_jobs)(
    534             joblib.delayed(_model_predict)(self, X, batch_size, identifier)
--> 535             for identifier in self.ensemble_.get_model_identifiers())
    536 
    537         if len(all_predictions) == 0:

AttributeError: 'NoneType' object has no attribute 'get_model_identifiers'

mfeurer commented 6 years ago

Could you please paste a minimal working example?

VincentBonnivard commented 6 years ago

Here is the example.

from sklearn.datasets import load_digits
import autosklearn.classification
import pandas as pd
import numpy as np

automl = autosklearn.classification.AutoSklearnClassifier(
    tmp_folder='/tmp/autosklearn_cv_example_tmp',
    output_folder='/tmp/autosklearn_cv_example_out',
    delete_tmp_folder_after_terminate=False,
    seed=42, initial_configurations_via_metalearning=0,
    ml_memory_limit=10000,
    ensemble_size=0)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, random_state=1)
automl.fit(X_train, y_train, dataset_name='digits', metric=auc)
automl.predict_proba(X_test)

I got the following error message:

AttributeError                            Traceback (most recent call last)
in ()
----> 1 automl.predict_proba(X_test)

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/estimators.py in predict_proba(self, X, batch_size, n_jobs)
    434         """
    435         return super().predict_proba(
--> 436             X, batch_size=batch_size, n_jobs=n_jobs)
    437 
    438 

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/estimators.py in predict_proba(self, X, batch_size, n_jobs)
    311     def predict_proba(self, X, batch_size=None, n_jobs=1):
    312         return self._automl.predict_proba(
--> 313             X, batch_size=batch_size, n_jobs=n_jobs)
    314 
    315     def score(self, X, y):

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/automl.py in predict_proba(self, X, batch_size, n_jobs)
    944 
    945     def predict_proba(self, X, batch_size=None, n_jobs=1):
--> 946         return super().predict(X, batch_size=batch_size, n_jobs=n_jobs)
    947 
    948 

~/workspace/monsoon/venv_monsoon/lib/python3.4/site-packages/autosklearn/automl.py in predict(self, X, batch_size, n_jobs)
    533         all_predictions = joblib.Parallel(n_jobs=n_jobs)(
    534             joblib.delayed(_model_predict)(self, X, batch_size, identifier)
--> 535             for identifier in self.ensemble_.get_model_identifiers())
    536 
    537         if len(all_predictions) == 0:

AttributeError: 'NoneType' object has no attribute 'get_model_identifiers'

My pip list is:

alabaster (0.7.10)
auto-sklearn (0.3.0)
autopep8 (1.3.3)
Babel (2.5.3)
backports-abc (0.5)
bleach (2.1.1)
certifi (2018.1.18)
chardet (3.0.4)
click (6.7)
cloudpickle (0.5.2)
ConfigSpace (0.4.4)
cycler (0.10.0)
Cython (0.27.3)
dask (0.15.2)
decorator (4.2.1)
distributed (1.18.3)
docutils (0.14)
entrypoints (0.2.3)
future (0.16.0)
HeapDict (1.0.0)
html5lib (1.0b10)
idna (2.6)
imagesize (0.7.1)
iml (0.3.5)
ipaddress (1.0.19)
ipykernel (4.6.1)
ipython (6.2.1)
ipython-genutils (0.2.0)
ipywidgets (7.0.5)
jedi (0.11.0)
Jinja2 (2.10)
joblib (0.11)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.1.0)
jupyter-console (5.2.0)
jupyter-core (4.4.0)
liac-arff (2.1.1)
lime (0.1.1.29)
line-profiler (2.1.2)
lockfile (0.12.2)
MarkupSafe (1.0)
matplotlib (2.1.0)
mistune (0.8.3)
msgpack (0.5.0)
msgpack-python (0.5.4)
nbconvert (5.3.1)
nbformat (4.4.0)
networkx (2.1)
nose (1.3.7)
notebook (5.2.2)
numpy (1.14.0)
pandas (0.21.0)
pandocfilters (1.4.2)
parso (0.1.0)
patsy (0.5.0)
PeakUtils (1.1.1)
pexpect (4.3.0)
pickleshare (0.7.4)
Pillow (5.0.0)
pip (9.0.1)
prompt-toolkit (1.0.15)
psutil (5.4.3)
ptyprocess (0.5.2)
pycodestyle (2.3.1)
Pygments (2.2.0)
pymining (0.2)
pynisher (0.4.2)
pyparsing (2.2.0)
pyrfr (0.7.3)
python-dateutil (2.6.1)
pytz (2018.3)
PyWavelets (0.5.2)
PyYAML (3.12)
pyzmq (16.0.3)
qtconsole (4.3.1)
read-excel (0.1)
requests (2.18.4)
scikit-image (0.13.1)
scikit-learn (0.19.1)
scipy (1.0.0)
seaborn (0.8.1)
setuptools (38.5.1)
shap (0.10.0)
simplegeneric (0.8.1)
simplejson (3.13.2)
six (1.11.0)
smac (0.8.0)
snowballstemmer (1.2.1)
sortedcontainers (1.5.9)
Sphinx (1.6.7)
sphinx-rtd-theme (0.2.4)
sphinxcontrib-websupport (1.0.1)
statsmodels (0.8.0)
tblib (1.3.2)
terminado (0.8.1)
testpath (0.3.1)
toolz (0.9.0)
tornado (4.5.3)
tqdm (4.19.5)
traitlets (4.3.2)
tsfresh (0.11.0)
typing (3.6.4)
urllib3 (1.22)
wcwidth (0.1.7)
webencodings (0.5.1)
wheel (0.29.0)
widgetsnbextension (3.0.8)
xgboost (0.6a2)
xlrd (1.1.0)
zict (0.1.3)

mfeurer commented 6 years ago

I think there are two issues here:

  1. When using an ensemble size of 0, one cannot use predict.
  2. Area under curve is not defined for multiclass.

The following works:

from sklearn.datasets import load_digits
import autosklearn.classification
from sklearn.model_selection import train_test_split
from autosklearn.metrics import f1_macro

automl = autosklearn.classification.AutoSklearnClassifier(
    tmp_folder='/tmp/autosklearn_cv_example_tmp',
    output_folder='/tmp/autosklearn_cv_example_out',
    delete_tmp_folder_after_terminate=False,
    seed=42, initial_configurations_via_metalearning=0,
    ml_memory_limit=10000,
    ensemble_size=1,
    time_left_for_this_task=60,
)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
automl.fit(X_train, y_train, dataset_name='digits', metric=f1_macro)
automl.predict_proba(X_test)
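
To illustrate the second point in the list above (plain area under the ROC curve assumes exactly two classes), here is a minimal pure-Python sketch. The function `binary_auc` is a hypothetical helper written for this illustration; it is not part of auto-sklearn or scikit-learn. It computes the pairwise form of AUC and rejects label sets such as the ten digit classes.

```python
from itertools import product


def binary_auc(y_true, y_score):
    """Pairwise AUC: chance that a positive outranks a negative (ties count 0.5).

    Hypothetical helper for illustration only.
    """
    classes = sorted(set(y_true))
    if len(classes) != 2:
        raise ValueError(
            "binary AUC needs exactly 2 classes, got %d" % len(classes))
    negative, positive = classes
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t == negative]
    # Compare every positive score against every negative score.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Two classes: the metric is well defined.
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))

# Three classes: there is no single "positive" class to rank against.
try:
    binary_auc([0, 1, 2, 2], [0.1, 0.4, 0.35, 0.8])
except ValueError as err:
    print("multiclass input rejected:", err)
```

A metric with a multiclass averaging rule, such as the macro-averaged F1 used in the snippet above, avoids the problem because it is computed per class and then averaged.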

Re-opening this to remind us to add better error messages.
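
One way such a clearer error message could look is sketched below. This is not auto-sklearn code: `TinyAutoML` and `NotEnsembledError` are hypothetical stand-ins, and the only assumption taken from the traceback above is that `ensemble_` stays `None` when no ensemble is built.

```python
class NotEnsembledError(RuntimeError):
    """Hypothetical exception type; not part of auto-sklearn."""


class TinyAutoML:
    """Hypothetical stand-in for the class that owns `ensemble_`."""

    def __init__(self, ensemble_size):
        self.ensemble_size = ensemble_size
        self.ensemble_ = None  # stays None when no ensemble is built

    def fit(self):
        # Only build an ensemble when one was requested.
        if self.ensemble_size > 0:
            self.ensemble_ = ["model-%d" % i for i in range(self.ensemble_size)]
        return self

    def predict(self):
        # Guard first, so the user sees the cause instead of
        # "'NoneType' object has no attribute 'get_model_identifiers'".
        if self.ensemble_ is None:
            raise NotEnsembledError(
                "No ensemble available: predict() needs ensemble_size > 0 "
                "or an explicit fit_ensemble() call.")
        return list(self.ensemble_)


try:
    TinyAutoML(ensemble_size=0).fit().predict()
except NotEnsembledError as err:
    print(err)
```

The guard costs one check per call and turns an opaque attribute lookup failure into a message that names the setting to change.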

VincentBonnivard commented 6 years ago

Thanks. I'll have a look at your solution!

chrisby commented 6 years ago

The code posted by @mfeurer does not solve the issue. I still get 'AutoMLClassifier' object has no attribute '_automl'.

NVM: My version of autoML was not updated.
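
For anyone hitting the same confusion, a quick way to confirm which build is actually installed is sketched below. It assumes Python 3.8 or newer (for `importlib.metadata`, which is newer than the interpreters shown in the tracebacks in this thread); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution name as it appears in the pip list earlier in this thread.
print("auto-sklearn:", installed_version("auto-sklearn"))
```

If the printed version matches the PyPI release rather than a development install, the fix from the development branch is not present in that environment.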

mfeurer commented 6 years ago

@chrisby could you please upload the log of Auto-sklearn somewhere (either as a file on github, or to pastebin)?

barley99 commented 6 years ago

I changed one line in automl.py and got the expected probabilities:

    def predict_proba(self, X, batch_size=None, n_jobs=1):
        # return self._automl.predict(X, batch_size=batch_size, n_jobs=n_jobs)
        return super().predict(X, batch_size=batch_size, n_jobs=n_jobs)

mfeurer commented 6 years ago

Thanks @barley99 we will put your fix in the next release.
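
The pattern behind the bug and the fix can be reproduced in a few lines. The sketch below is a simplified model inferred from the tracebacks in this thread, not the library's source: all class names are hypothetical, and the only assumption is that the outer estimator holds `_automl` while the inner classifier inherits `predict` from a base class.

```python
class BaseAutoML:
    def predict(self, X):
        # Stand-in for the real ensemble prediction.
        return [[0.5, 0.5] for _ in X]


class BuggyClassifier(BaseAutoML):
    def predict_proba(self, X):
        # Bug pattern: this object *is* the AutoML instance, so it has
        # no `_automl` attribute to delegate to.
        return self._automl.predict(X)


class FixedClassifier(BaseAutoML):
    def predict_proba(self, X):
        # Fix pattern: call the inherited implementation directly.
        return super().predict(X)


class Estimator:
    """Outer wrapper; the only layer that owns `_automl`."""

    def __init__(self, automl):
        self._automl = automl

    def predict_proba(self, X):
        return self._automl.predict_proba(X)


print(Estimator(FixedClassifier()).predict_proba([1, 2]))

try:
    Estimator(BuggyClassifier()).predict_proba([1, 2])
except AttributeError as err:
    print(err)
```

The delegation through `_automl` is valid only one level up, in the wrapper; copying it into the wrapped class is what produces the `AttributeError` reported at the top of this issue.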

ahn1340 commented 6 years ago

The fix by @barley99 is already in the development branch, so this issue can be closed.

ballcap231 commented 5 years ago

Has this issue been fixed? Has anyone else tried everything here and still gotten the same error?