automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

score(X_test,y_test) or predict(X_test) fails #93

Closed: Motorrat closed this issue 8 years ago

Motorrat commented 8 years ago

I ran the code below:

seed = 1
c = AutoSklearnClassifier(
        shared_mode=True, tmp_folder=atsklrn_tempdir, output_folder=atsklrn_tempdir,
        delete_tmp_folder_after_terminate=False, delete_output_folder_after_terminate=False,
        ensemble_size=0, initial_configurations_via_metalearning=0,
        include_preprocessors=('no_preprocessing',),
        seed=seed)
c._task = BINARY_CLASSIFICATION
c._metric = F1_METRIC
c._precision = '32'
c._dataset_name = 'Truffles'

c.run_ensemble_builder(
    time_left_for_ensembles=0,
    max_iterations=1,
    ensemble_size=25,
    ).wait()

time.sleep(10)

print(c.show_models())

print(c.score(X_test,y_test))

and got this stack trace:

/home/ekobylkin/anaconda2/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py:342: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 11343049 but corresponding boolean dimension is 3117678
  missing = np.arange(X.shape[not self.axis])[invalid_mask]
Traceback (most recent call last):
  File "nfs_share/truffles-autosklearn-multy-ensemble.py", line 72, in <module>
    print(c.score(X_test,y_test))
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 656, in score
    prediction = self.predict_proba(X)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 300, in predict_proba
    return super(AutoSklearnClassifier, self).predict_proba(X)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 616, in predict_proba
    prediction = model.predict_proba(X_)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/classification.py", line 110, in predict_proba
    Xt = transform.transform(Xt)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/components/data_preprocessing/rescaling.py", line 19, in transform
    return self.preprocessor.transform(X)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/implementations/MinMaxScaler.py", line 121, in transform
    X.data[X.indptr[i]:X.indptr[i + 1]] *= self.scale_[i]
IndexError: index 3117678 is out of bounds for axis 0 with size 3117678

The error above was caused by an X_test that came from a different vectorizer than X_train. I confirmed this by vectorizing my X_train and X_test datasets again with the same vectorizer and reloading them. The failure remains, and it now seems to come from slightly different vectorizer output: "dimension is 3117680 but corresponding boolean dimension is 3117678", a difference of 2. The two dimensions should be identical, but they are not.

/home/ekobylkin/anaconda2/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py:342: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3117680 but corresponding boolean dimension is 3117678
  missing = np.arange(X.shape[not self.axis])[invalid_mask]
Traceback (most recent call last):
  File "nfs_share/truffles-autosklearn-multy-ensemble.py", line 72, in <module>
    print(c.score(X_test,y_test))
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 656, in score
    prediction = self.predict_proba(X)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 300, in predict_proba
    return super(AutoSklearnClassifier, self).predict_proba(X)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 616, in predict_proba
    prediction = model.predict_proba(X_)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/classification.py", line 110, in predict_proba
    Xt = transform.transform(Xt)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/components/data_preprocessing/rescaling.py", line 19, in transform
    return self.preprocessor.transform(X)
  File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/implementations/MinMaxScaler.py", line 121, in transform
    X.data[X.indptr[i]:X.indptr[i + 1]] *= self.scale_[i]
IndexError: index 3117678 is out of bounds for axis 0 with size 3117678
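
For readers who hit the same trace: both versions of the error say that X_test has more columns than the data the model was fitted on. The sketch below is one way this kind of mismatch can arise and be caught early; it is not the reporter's actual pipeline. It uses plain Python only, and `TinyCountVectorizer` and `check_same_width` are hypothetical names invented for this illustration. With a real scikit-learn vectorizer, the same idea corresponds to calling `fit_transform` on the training data and `transform` on the test data.

```python
class TinyCountVectorizer:
    """Hypothetical stand-in for a real vectorizer; illustration only."""

    def fit(self, docs):
        # The vocabulary, and hence the number of columns, is fixed here.
        tokens = sorted({tok for doc in docs for tok in doc.split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(tokens)}
        return self

    def transform(self, docs):
        # Tokens not seen during fit are dropped, so the width never changes.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for tok in doc.split():
                col = self.vocabulary_.get(tok)
                if col is not None:
                    row[col] += 1
            rows.append(row)
        return rows


def check_same_width(X_train, X_test):
    """Raise early if the two matrices disagree on the number of columns."""
    n_train, n_test = len(X_train[0]), len(X_test[0])
    if n_train != n_test:
        raise ValueError(
            "feature count mismatch: train has %d columns, test has %d"
            % (n_train, n_test)
        )


train_docs = ["black truffle", "white truffle"]
test_docs = ["summer truffle", "black winter truffle"]

# Mismatch: each split is fitted separately and gets its own vocabulary.
X_train_bad = TinyCountVectorizer().fit(train_docs).transform(train_docs)
X_test_bad = TinyCountVectorizer().fit(test_docs).transform(test_docs)

# Consistent: fit once on the training split, reuse it for the test split.
vec = TinyCountVectorizer().fit(train_docs)
X_train = vec.transform(train_docs)
X_test = vec.transform(test_docs)

check_same_width(X_train, X_test)  # passes silently
```

Calling `check_same_width(X_train_bad, X_test_bad)` raises a `ValueError` right away, which is easier to diagnose than an `IndexError` deep inside a scaler. A similar guard on `X_train.shape[1]` and `X_test.shape[1]` before `score` or `predict` would be one way to apply the idea to sparse matrices.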

Motorrat commented 8 years ago

This issue can be closed. The problem is on my side.