Closed johnsanterre closed 8 years ago
Your labels column has 3 rows but M only has 2 rows. Please make sure that isn't the problem.
Failing gracefully is Pythonic.
Same error regardless.
Not a corner case; it also fails on a reasonably sized matrix.
Also, please try `{RandomForestClassifier: {}}` rather than `RandomForestClassifier()`.
```
>>> exp = e.operate.simple_clf(np.array([[1,1,1],[1,2,3]]), np.array([1,1]), {RandomForestClassifier: {}})
>>> exp.make_report()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "eights/perambulate/perambulate.py", line 162, in make_report
    sub_rep.add_summary_graph_roc_auc()
  File "eights/communicate/communicate.py", line 470, in add_summary_graph_roc_auc
    self.add_summary_graph('roc_auc')
  File "eights/communicate/communicate.py", line 450, in add_summary_graph
    trial, score in getattr(self.__exp, measure)().iteritems()]
  File "eights/perambulate/perambulate.py", line 136, in roc_auc
    return {trial: trial.roc_auc() for trial in self.trials}
  File "eights/perambulate/perambulate.py", line 136, in <dictcomp>
    return {trial: trial.roc_auc() for trial in self.trials}
  File "eights/perambulate/perambulate_helper.py", line 590, in roc_auc
    return self.median_run().roc_auc()
  File "eights/perambulate/perambulate_helper.py", line 415, in roc_auc
    return roc_auc_score(self.__test_y(), self.__pred_proba())
  File "eights/perambulate/perambulate_helper.py", line 330, in __pred_proba
    return self.clf.predict_proba(self.test_M())[:,1]
IndexError: index 1 is out of bounds for axis 1 with size 1
```
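The IndexError can be reproduced with sklearn alone: `predict_proba` returns one column per class seen during `fit`, so when the training labels contain only a single class the result has shape `(n_samples, 1)` and `[:, 1]` is out of bounds. A minimal sketch (current sklearn, not the eights code paths):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same inputs as the failing call: two rows, labels with only one class.
M = np.array([[1, 1, 1], [1, 2, 3]])
labels = np.array([1, 1])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(M, labels)

# One column per class seen at fit time -- here a single column,
# so column index 1 does not exist.
proba = clf.predict_proba(M)
print(proba.shape)  # (2, 1)
try:
    proba[:, 1]
except IndexError as e:
    print('IndexError:', e)
```

This is exactly the shape assumption made by `predict_proba(...)[:, 1]` in `perambulate_helper.py`.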
I think there are a few issues here related to stratification. In the test case, only one class is present, but the line that throws the error expects there to be two. When we change the labels to include two classes (below), we run into a similar issue; that one is probably because the KFold cross-validation is selecting subsets of the data that, again, contain only one label.
```
>>> exp = e.operate.simple_clf(np.array([[1,1,1],[1,2,3]]), np.array([1,0]), {RandomForestClassifier: {}})
>>> exp.make_report()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/zar1/dssg/eights/eights/perambulate/perambulate.py", line 162, in make_report
    sub_rep.add_summary_graph_roc_auc()
  File "/Users/zar1/dssg/eights/eights/communicate/communicate.py", line 470, in add_summary_graph_roc_auc
    self.add_summary_graph('roc_auc')
  File "/Users/zar1/dssg/eights/eights/communicate/communicate.py", line 450, in add_summary_graph
    trial, score in getattr(self.__exp, measure)().iteritems()]
  File "/Users/zar1/dssg/eights/eights/perambulate/perambulate.py", line 136, in roc_auc
    return {trial: trial.roc_auc() for trial in self.trials}
  File "/Users/zar1/dssg/eights/eights/perambulate/perambulate.py", line 136, in <dictcomp>
    return {trial: trial.roc_auc() for trial in self.trials}
  File "/Users/zar1/dssg/eights/eights/perambulate/perambulate_helper.py", line 590, in roc_auc
    return self.median_run().roc_auc()
  File "/Users/zar1/dssg/eights/eights/perambulate/perambulate_helper.py", line 415, in roc_auc
    return roc_auc_score(self.__test_y(), self.__pred_proba())
  File "/Library/Python/2.7/site-packages/sklearn/metrics/metrics.py", line 593, in roc_auc_score
    sample_weight=sample_weight)
  File "/Library/Python/2.7/site-packages/sklearn/metrics/metrics.py", line 473, in _average_binary_score
    return binary_metric(y_true, y_score, sample_weight=sample_weight)
  File "/Library/Python/2.7/site-packages/sklearn/metrics/metrics.py", line 584, in _binary_roc_auc_score
    raise ValueError("Only one class present in y_true. ROC AUC score "
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
```
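The ValueError itself is easy to trigger in isolation: `roc_auc_score` refuses to score when `y_true` contains a single class. A guard along these lines (a hypothetical helper, not part of eights) would let the report skip the metric instead of crashing:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def safe_roc_auc(y_true, y_score):
    """Return the ROC AUC, or None when it is undefined.

    ROC AUC needs both classes present in y_true; with only one
    class sklearn raises ValueError, so skip instead.
    (This helper is a sketch, not part of the eights API.)
    """
    if np.unique(y_true).size < 2:
        return None
    return roc_auc_score(y_true, y_score)

print(safe_roc_auc([1, 1], [0.2, 0.9]))  # None: only one class present
print(safe_roc_auc([0, 1], [0.2, 0.9]))  # 1.0: both classes, perfect ranking
```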
In the case of two labels:
```
>>> exp = e.operate.simple_clf(np.array([[1,1,1],[1,2,3]]), np.array([1,2]), {RandomForestClassifier: {}})
>>> exp.make_report()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "eights/perambulate/perambulate.py", line 162, in make_report
    sub_rep.add_summary_graph_roc_auc()
  File "eights/communicate/communicate.py", line 470, in add_summary_graph_roc_auc
    self.add_summary_graph('roc_auc')
  File "eights/communicate/communicate.py", line 450, in add_summary_graph
    trial, score in getattr(self.__exp, measure)().iteritems()]
  File "eights/perambulate/perambulate.py", line 136, in roc_auc
    return {trial: trial.roc_auc() for trial in self.trials}
  File "eights/perambulate/perambulate.py", line 136, in <dictcomp>
    return {trial: trial.roc_auc() for trial in self.trials}
  File "eights/perambulate/perambulate_helper.py", line 590, in roc_auc
    return self.median_run().roc_auc()
  File "eights/perambulate/perambulate_helper.py", line 415, in roc_auc
    return roc_auc_score(self.__test_y(), self.__pred_proba())
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py", line 593, in roc_auc_score
    sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py", line 473, in _average_binary_score
    return binary_metric(y_true, y_score, sample_weight=sample_weight)
  File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py", line 584, in _binary_roc_auc_score
    raise ValueError("Only one class present in y_true. ROC AUC score "
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.
```
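The two-label failure is consistent with the cross-validation explanation above: with only two rows, one per class, any train/test split leaves a single class on each side, so `y_true` in every fold has one class. A quick check of that, assuming plain KFold-style splitting (current sklearn API):

```python
import numpy as np
from sklearn.model_selection import KFold

y = np.array([1, 2])  # one sample per class, as in the failing call

for train_idx, test_idx in KFold(n_splits=2).split(y.reshape(-1, 1)):
    # Every test fold holds exactly one sample, hence one class,
    # so roc_auc_score on that fold is undefined.
    print(np.unique(y[test_idx]))
```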
After further investigation: the cross-validation algorithm used for simple_clf is eights.perambulate.perambulate_helper.NoCV, which reserves no test set. Because it reserves no test set, anything that requires a test set fails. What it should probably do instead is use sklearn's cross-validation utilities to return a single fold that contains both a train set and a test set.
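Under that reading, the fix for NoCV is to hand back one real fold. With a current sklearn, one way to get a single stratified train/test fold (assuming enough samples per class; the data below is made up for illustration) is `StratifiedShuffleSplit` with `n_splits=1`:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Illustrative data: 8 rows, 4 per class.
M = np.repeat(np.array([[1, 1, 1], [1, 2, 3]]), 4, axis=0)
labels = np.array([1, 0] * 4)

# A single fold that reserves a test set and keeps the class
# balance on both sides (unlike NoCV, which reserves nothing).
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(M, labels))

print(np.unique(labels[train_idx]).tolist())  # [0, 1]
print(np.unique(labels[test_idx]).tolist())   # [0, 1]
```

Because the split is stratified, both classes survive into the test set, which is what `roc_auc_score` and `predict_proba(...)[:, 1]` both need.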
There are two things we need to do to resolve this:
These are solved by diogenes commit b0e2f8c.