awslabs / dgl-lifesci

Python package for graph neural networks in chemistry and biology
Apache License 2.0

ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required. #218

Closed: seyonechithrananda closed this issue 1 year ago

seyonechithrananda commented 1 year ago

Hi all,

I'm currently using the CSV data configuration example (classification_train.py) to do multi-label binary classification on my own dataset. The dataset has many tasks and NaNs, but it featurizes properly. However, when the script computes ROC-AUC on the test set after one epoch, it raises a ValueError while checking the y_true array. I've checked that none of my columns/tasks consist only of NaNs, so y_true should contain some values for every task. Still, y_true has torch.Size([0]) for one specific column in my dataframe, which triggers the ValueError. The stack trace is below:

Dataframe task index: 498
Printout of y_true.shape: torch.Size([0])

Traceback (most recent call last):
  File "classification_train.py", line 219, in <module>
    main(args, exp_config, train_set, val_set, test_set)
  File "classification_train.py", line 94, in main
    val_score = run_an_eval_epoch(args, model, val_loader)
  File "classification_train.py", line 56, in run_an_eval_epoch
    return np.mean(eval_meter.compute_metric(args['metric']))
  File "/anaconda/envs/dgllife/lib/python3.8/site-packages/dgllife/utils/eval.py", line 342, in compute_metric
    return self.roc_auc_score(reduction)
  File "/anaconda/envs/dgllife/lib/python3.8/site-packages/dgllife/utils/eval.py", line 277, in roc_auc_score
    return self.multilabel_score(score, reduction)
  File "/anaconda/envs/dgllife/lib/python3.8/site-packages/dgllife/utils/eval.py", line 183, in multilabel_score
    task_score = score_func(task_y_true, task_y_pred)
  File "/anaconda/envs/dgllife/lib/python3.8/site-packages/dgllife/utils/eval.py", line 276, in score
    return roc_auc_score(y_true.long().numpy(), torch.sigmoid(y_pred).numpy())
  File "/anaconda/envs/dgllife/lib/python3.8/site-packages/sklearn/metrics/_ranking.py", line 550, in roc_auc_score
    y_true = check_array(y_true, ensure_2d=False, dtype=None)
  File "/anaconda/envs/dgllife/lib/python3.8/site-packages/sklearn/utils/validation.py", line 931, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

Thanks for your help, Seyone
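For readers following along: an empty y_true like the one printed above is what you would expect if the labels of a task are filtered by a missing-value mask before scoring and every label of that task is missing in the evaluated subset. Here is a minimal NumPy sketch of that effect. It assumes a mask in which 0 marks a NaN label; the function name `masked_task_labels` and the toy arrays are hypothetical and are not taken from dgllife's code.

```python
import numpy as np

def masked_task_labels(y_true, mask, task):
    """Return the labels of one task, keeping only entries whose mask is non-zero."""
    keep = mask[:, task] != 0
    return y_true[:, task][keep]

# Toy subset with 3 molecules and 2 tasks; task 1 has no labels at all here.
labels = np.array([[1.0, np.nan],
                   [0.0, np.nan],
                   [1.0, np.nan]])
# Assumed convention: 1 where a label exists, 0 where it is NaN.
mask = (~np.isnan(labels)).astype(np.float32)

print(masked_task_labels(labels, mask, 0).shape)  # (3,)
print(masked_task_labels(labels, mask, 1).shape)  # (0,)
```

An array of shape (0,) is exactly what the ValueError in the trace complains about, so the column is worth checking per subset rather than only on the full dataframe.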

mufeili commented 1 year ago

Hi Seyone,

It's possible that after the data split, some columns in a subset are all NaNs, which would leave no labels for that task in that subset.
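One way to check this is to look for all-NaN task columns in each subset separately. The snippet below is only a sketch: it assumes the splits are available as pandas DataFrames, and the helper `empty_tasks`, the column names, and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd

def empty_tasks(subset_df, task_columns):
    """List the task columns that contain only NaNs in this subset."""
    return [col for col in task_columns if subset_df[col].isna().all()]

# Toy data: task_a has labels overall, but only in the first two rows.
df = pd.DataFrame({
    "smiles": ["CCO", "CCN", "CCC", "CCCl"],
    "task_a": [1.0, 0.0, np.nan, np.nan],
    "task_b": [0.0, 1.0, 1.0, 0.0],
})
tasks = ["task_a", "task_b"]

# Hypothetical split: first half for training, second half for validation.
train_df, val_df = df.iloc[:2], df.iloc[2:]

print(empty_tasks(df, tasks))        # []
print(empty_tasks(train_df, tasks))  # []
print(empty_tasks(val_df, tasks))    # ['task_a']
```

In this toy case the full dataframe passes the check while the validation subset does not. That matches the situation where every column has some labels overall but one ends up empty after splitting.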

seyonechithrananda commented 1 year ago

Understood, thanks for your help! Closing this.