Open bjschoenfeld opened 5 years ago
Running on LL0_488_colleges_aaup dataset
Traceback (most recent call last):
  File "venv/lib/python3.6/site-packages/metalearn/metafeatures/metafeatures.py", line 113, in compute
    n_folds, verbose
  File "venv/lib/python3.6/site-packages/metalearn/metafeatures/metafeatures.py", line 234, in _validate_compute_arguments
    n_folds, verbose
  File "venv/lib/python3.6/site-packages/metalearn/metafeatures/metafeatures.py", line 348, in _validate_n_folds
    f"{group.shape[0]}."
ValueError: The minimum number of instances in each class of Y is n_folds=2. Class VIIB has 1.
Can we compare with OpenML on this?
Similar to this, datasets with fewer than 4 instances per class fail. Should we handle something like this?
import pandas as pd
import numpy as np
from metalearn import Metafeatures

# 6 rows so that x matches the 6 labels in y (3 instances per class)
x = pd.DataFrame(np.random.rand(6, 2))
y = pd.Series(['a', 'a', 'a', 'b', 'b', 'b'])
Metafeatures().compute(x, y)
Traceback (most recent call last):
File "
datasets with fewer than 4 instances per class fail
I believe you, but why is it 4, not 2? We only do 2-fold cv.
I think it's because with 2-fold cv the training set has half as many instances, so it needs at least 4.
I would think that if there were only two instances and two folds, one instance would go to each fold. The folds would take turns being the train and test sets...
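To make the two-instances, two-folds case concrete, here is a minimal pure-Python sketch of a stratified 2-fold split. The helpers `stratified_folds` and `train_test_splits` are hypothetical names written for this illustration; they are not metalearn or sklearn functions, and the round-robin assignment is only an assumption about how a stratified splitter behaves.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=2):
    """Deal each class's instances round-robin into n_folds folds (hypothetical helper)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return [sorted(fold) for fold in folds]


def train_test_splits(labels, n_folds=2):
    """Let each fold take a turn as the test set; the remaining folds form the training set."""
    folds = stratified_folds(labels, n_folds)
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, test))
    return splits


labels = ['a', 'a', 'b', 'b']  # two instances per class
splits = train_test_splits(labels)
for train, test in splits:
    print("train:", [labels[i] for i in train], "test:", [labels[i] for i in test])
```

With two instances per class, each fold gets one instance of every class and the folds take turns as train and test, so the outer split itself works. Note that each training set then holds only one instance per class. If some step ran a second stratified 2-fold split inside the training portion, it would hit the same minimum again, which might explain a threshold of 4, but that is an unverified guess.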
Our landmarkers perform cross validation with 2 folds. Some datasets have only 1 instance of a particular target class. In that case, the validation in sklearn's cross validation throws an error, because it requires at least n_folds (2 in our case) instances of each class. Surfacing that raw error to the user is not ideal. How should we handle this?
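One possible way to handle this, sketched below in pure Python, is to detect the under-represented classes up front and then decide what to do: skip the landmarkers, drop the rare classes, or raise a clearer message. The helpers `classes_below_min` and `drop_rare_classes` are hypothetical and not part of metalearn. Only the class name `VIIB` comes from the traceback above; the other labels are made up for illustration.

```python
from collections import Counter


def classes_below_min(y, n_folds=2):
    """Return {class_label: count} for classes with fewer than n_folds instances."""
    counts = Counter(y)
    return {label: n for label, n in counts.items() if n < n_folds}


def drop_rare_classes(rows, y, n_folds=2):
    """Drop instances whose class is too small for n_folds stratified folds.

    Hypothetical policy: an alternative would be to skip the landmarkers entirely.
    """
    rare = classes_below_min(y, n_folds)
    kept = [(row, label) for row, label in zip(rows, y) if label not in rare]
    kept_rows = [row for row, _ in kept]
    kept_y = [label for _, label in kept]
    return kept_rows, kept_y


# Illustrative data: 'VIIB' appears once, as in the reported error.
y = ['I', 'I', 'VIIB', 'IIA', 'IIA']
rows = [[0.1], [0.2], [0.3], [0.4], [0.5]]

rare = classes_below_min(y, n_folds=2)
print("classes with too few instances:", rare)

kept_rows, kept_y = drop_rare_classes(rows, y, n_folds=2)
print("remaining labels:", kept_y)
```

Dropping rare classes changes the dataset the metafeatures describe, so reporting the affected landmarkers as not computable may be the less surprising choice. The sketch only shows how the check could be done.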