gitter-lab / ml4bio

A graphical interface for sklearn classification to introduce machine learning to biologists
MIT License

Catch Error when training on unlabeled data #15

Closed: agitter closed this issue 3 years ago

agitter commented 5 years ago

If a user loads an unlabeled dataset, such as toy_data_1_unlabeled.csv, and tries to proceed to Step 2, the software crashes with the following error:

Traceback (most recent call last):
  File "conda\envs\ml4bio\lib\site-packages\ml4bio\ml4bio.py", line 830, in set
    self.data.split(test_size, stratify)
  File "conda\envs\ml4bio\lib\site-packages\ml4bio\data.py", line 192, in split
    model_selection.train_test_split(integer_encoded_X, y, test_size=test_size, stratify=s, random_state=0)
  File "conda\envs\ml4bio\lib\site-packages\sklearn\model_selection\_split.py", line 2056, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "conda\envs\ml4bio\lib\site-packages\sklearn\model_selection\_split.py", line 1204, in split
    for train, test in self._iter_indices(X, y, groups):
  File "conda\envs\ml4bio\lib\site-packages\sklearn\model_selection\_split.py", line 1546, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
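The failure can be reproduced outside ml4bio with sklearn alone. The sketch below uses made-up toy data in which one class has a single member. Calling `train_test_split` with `stratify=y` then raises the same `ValueError` as in the traceback. Whether ml4bio's unlabeled files produce exactly this kind of label vector is an assumption here, not something the traceback shows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption for illustration): 5 samples, and class "c" appears only once
X = np.arange(10).reshape(5, 2)
y = np.array(["a", "a", "b", "b", "c"])

caught = None
try:
    # Stratifying on y requires at least 2 samples in every class
    train_test_split(X, y, test_size=0.4, stratify=y, random_state=0)
except ValueError as err:
    caught = err

print(caught)
```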

ml4bio should catch this error and explain the problem to the user instead of crashing.
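One possible approach, sketched here under assumptions, is to check the labels before calling `train_test_split`. Both the missing-label case and the too-small-class case can then be turned into a readable message, with `try`/`except ValueError` as a fallback for anything else sklearn rejects. The helpers `stratification_problem` and `split_or_explain` and the message wording are hypothetical and are not part of ml4bio. How the message would be shown in the GUI is left out.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical message text, not taken from ml4bio
NO_LABELS_MSG = "This dataset has no labels, so it cannot be used to train a classifier."


def stratification_problem(y):
    """Return a user-facing message if y cannot be stratified, else None (hypothetical helper)."""
    counts = Counter(y)
    rare = sorted(str(label) for label, n in counts.items() if n < 2)
    if rare:
        return ("Stratified splitting needs at least 2 examples of every class, but these "
                "classes have only 1 example: " + ", ".join(rare) +
                ". Check that the dataset is labeled, or split without stratification.")
    return None


def split_or_explain(X, y, test_size, stratify):
    """Return (splits, None) on success or (None, message) instead of raising (hypothetical)."""
    # An unlabeled dataset cannot be used for classification at all
    if y is None or len(y) == 0:
        return None, NO_LABELS_MSG
    if stratify:
        problem = stratification_problem(y)
        if problem is not None:
            return None, problem
    try:
        splits = train_test_split(X, y, test_size=test_size,
                                  stratify=y if stratify else None, random_state=0)
    except ValueError as err:
        # Fall back to sklearn's own message for any other invalid split
        return None, "Could not split the data: " + str(err)
    return splits, None
```

Checking the class counts up front gives a message that names the offending classes. The `except` branch still prevents a crash if sklearn rejects the split for another reason, such as a test set too small to hold one sample per class.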