/home/ekobylkin/anaconda2/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py:342: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 11343049 but corresponding boolean dimension is 3117678
missing = np.arange(X.shape[not self.axis])[invalid_mask]
Traceback (most recent call last):
File "nfs_share/truffles-autosklearn-multy-ensemble.py", line 72, in <module>
print(c.score(X_test,y_test))
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 656, in score
prediction = self.predict_proba(X)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 300, in predict_proba
return super(AutoSklearnClassifier, self).predict_proba(X)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 616, in predict_proba
prediction = model.predict_proba(X_)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/classification.py", line 110, in predict_proba
Xt = transform.transform(Xt)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/components/data_preprocessing/rescaling.py", line 19, in transform
return self.preprocessor.transform(X)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/implementations/MinMaxScaler.py", line 121, in transform
X.data[X.indptr[i]:X.indptr[i + 1]] *= self.scale_[i]
IndexError: index 3117678 is out of bounds for axis 0 with size 3117678
The error above was caused by X_test having been produced by a different vectorizer than X_train.
I confirmed this by vectorizing my X_train and X_test datasets again with the same vectorizer and reloading them. What remains seems to come from a slightly different vectorizer output: "dimension is 3117680 but corresponding boolean dimension is 3117678", a difference of 2. The two dimensions should be identical, but they are not.
/home/ekobylkin/anaconda2/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py:342: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 3117680 but corresponding boolean dimension is 3117678
missing = np.arange(X.shape[not self.axis])[invalid_mask]
Traceback (most recent call last):
File "nfs_share/truffles-autosklearn-multy-ensemble.py", line 72, in <module>
print(c.score(X_test,y_test))
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 656, in score
prediction = self.predict_proba(X)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/estimators.py", line 300, in predict_proba
return super(AutoSklearnClassifier, self).predict_proba(X)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/automl.py", line 616, in predict_proba
prediction = model.predict_proba(X_)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/classification.py", line 110, in predict_proba
Xt = transform.transform(Xt)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/components/data_preprocessing/rescaling.py", line 19, in transform
return self.preprocessor.transform(X)
File "/home/ekobylkin/anaconda2/lib/python2.7/site-packages/AutoSklearn-0.0.1.dev0-py2.7-linux-x86_64.egg/autosklearn/pipeline/implementations/MinMaxScaler.py", line 121, in transform
X.data[X.indptr[i]:X.indptr[i + 1]] *= self.scale_[i]
IndexError: index 3117678 is out of bounds for axis 0 with size 3117678
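Because the warning reports two different widths (3117680 vs. 3117678), a cheap guard before calling `score` could surface the mismatch earlier than the `IndexError` raised inside `MinMaxScaler.transform`. The following is only a minimal sketch: `check_feature_count` is a hypothetical helper, not part of auto-sklearn or scikit-learn, and it assumes both matrices expose a 2-D `shape` attribute the way NumPy arrays and SciPy sparse matrices do.

```python
def check_feature_count(X_train, X_test):
    """Raise ValueError if the two matrices have different column counts.

    Hypothetical helper for illustration only. It relies solely on a
    2-D ``shape`` attribute, so it does not import NumPy or SciPy.
    """
    n_train = X_train.shape[1]  # number of features seen during fit
    n_test = X_test.shape[1]    # number of features about to be scored
    if n_train != n_test:
        raise ValueError(
            "feature count mismatch: train has {0} columns, test has {1} "
            "(difference {2})".format(n_train, n_test, abs(n_train - n_test))
        )
    return n_train
```

Calling `check_feature_count(X_train, X_test)` just before `c.score(X_test, y_test)` would fail immediately if the two matrices disagree. With scikit-learn vectorizers, the usual way to keep the widths equal is to fit the vectorizer once on the training data and only call `transform` on the test data.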
I run this code below
and get this stack trace