dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License
2.55k stars, 470 forks

ValueError: y_true contains only one label (0). Please provide the true labels explicitly through the labels argument. #451

Closed: SamruddhiMhatre closed this issue 1 year ago

SamruddhiMhatre commented 1 year ago

I was trying to run the Kaggle notebook linked as the example for TabNetMultiTaskClassifier in this repository. However, when fitting the model with exactly the same code as in the notebook, I get the following ValueError:


    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    in
          3     X_train=X_train, y_train=y_train,
          4     eval_set=[(X_valid, y_valid)],
    ----> 5     max_epochs=max_epochs
          6 )
          7

    /opt/conda/lib/python3.7/site-packages/pytorch_tabnet/abstract_model.py in fit(self, X_train, y_train, eval_set, eval_name, eval_metric, loss_fn, weights, max_epochs, patience, batch_size, virtual_batch_size, num_workers, drop_last, callbacks, pin_memory, from_unsupervised, warm_start, augmentations)
        254             # Apply predict epoch to all eval sets
        255             for eval_name, valid_dataloader in zip(eval_names, valid_dataloaders):
    --> 256                 self._predict_epoch(eval_name, valid_dataloader)
        257
        258             # Call method on_epoch_end for all callbacks

    /opt/conda/lib/python3.7/site-packages/pytorch_tabnet/abstract_model.py in _predict_epoch(self, name, loader)
        544         y_true, scores = self.stack_batches(list_y_true, list_y_score)
        545
    --> 546         metrics_logs = self._metric_container_dict[name](y_true, scores)
        547         self.network.train()
        548         self.history.epoch_metrics.update(metrics_logs)

    /opt/conda/lib/python3.7/site-packages/pytorch_tabnet/metrics.py in __call__(self, y_true, y_pred)
        159             if isinstance(y_pred, list):
        160                 res = np.mean(
    --> 161                     [metric(y_true[:, i], y_pred[i]) for i in range(len(y_pred))]
        162                 )
        163             else:

    /opt/conda/lib/python3.7/site-packages/pytorch_tabnet/metrics.py in (.0)
        159             if isinstance(y_pred, list):
        160                 res = np.mean(
    --> 161                     [metric(y_true[:, i], y_pred[i]) for i in range(len(y_pred))]
        162                 )
        163             else:

    /opt/conda/lib/python3.7/site-packages/pytorch_tabnet/metrics.py in __call__(self, y_true, y_score)
        312             LogLoss of predictions vs targets.
        313         """
    --> 314         return log_loss(y_true, y_score)
        315
        316

    /opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
         70                           FutureWarning)
         71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
    ---> 72         return f(**kwargs)
         73     return inner_f
         74

    /opt/conda/lib/python3.7/site-packages/sklearn/metrics/_classification.py in log_loss(y_true, y_pred, eps, normalize, sample_weight, labels)
       2198             raise ValueError('y_true contains only one label ({0}). Please '
       2199                              'provide the true labels explicitly through the '
    -> 2200                              'labels argument.'.format(lb.classes_[0]))
       2201         else:
       2202             raise ValueError('The labels array needs to contain at least two '

    ValueError: y_true contains only one label (0). Please provide the true labels explicitly through the labels argument.

What does it mean to explicitly provide the labels through the labels argument? And is it possible to update the Kaggle notebook with a script that works with the current version of TabNetMultiTaskClassifier?

Best,
Samruddhi

Optimox commented 1 year ago

The error message seems quite clear to me. One of your tasks has only 1s or only 0s (in the training or validation set), so the current metric, logloss, can't be computed. Since monitoring a binary score on a single class is of little use, I would advise you to try a split in which every task has both classes.
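
To find out which task is affected, a check along these lines might help. This is only a sketch: `single_class_tasks` is a hypothetical helper (not part of pytorch_tabnet), and the toy arrays stand in for your own `y_train` and `y_valid`. It assumes the targets are 2D arrays with one column per task, as in the `y_true[:, i]` indexing shown in the traceback.

```python
import numpy as np


def single_class_tasks(y):
    """Return {task_index: only_label} for each column of y holding one class.

    Hypothetical helper for illustration; not a pytorch_tabnet function.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        # Treat a single-task target vector as one column.
        y = y.reshape(-1, 1)
    degenerate = {}
    for task in range(y.shape[1]):
        labels = np.unique(y[:, task])
        if labels.size == 1:
            degenerate[task] = labels[0].item()
    return degenerate


# Toy multi-task targets (assumed data): 3 tasks, one column per task.
y_train = np.array([[0, 1, 0], [1, 0, 1], [0, 0, 1], [1, 1, 0]])
# In this validation split, task 1 only ever takes the value 0.
y_valid = np.array([[0, 0, 1], [1, 0, 0]])

print("train:", single_class_tasks(y_train))
print("valid:", single_class_tasks(y_valid))
```

Any task reported for a split is one for which logloss cannot be computed on that split. As for the `labels` argument: scikit-learn's `log_loss` accepts `labels=[0, 1]` so that it knows the full set of classes even when `y_true` contains only one of them, but the metric call shown in the traceback is `log_loss(y_true, y_score)` without it, so changing the split is the more direct fix.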

jadonzhou commented 1 year ago

Same issue here.

How was this solved? Thanks!

SamruddhiMhatre commented 1 year ago

> The error message seems quite clear to me. One of your tasks has only 1s or only 0s (in the training or validation set), so the current metric, logloss, can't be computed. Since monitoring a binary score on a single class is of little use, I would advise you to try a split in which every task has both classes.

Is it possible to oversample the data for multiple target variables? I was wondering whether there is an efficient way to do so in order to prepare the data for TabNetMultiTaskClassifier.