dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License

Incompatibility of current round() method with PyTorch tensors when performing early stopping #538

Closed EXJUSTICE closed 1 month ago

EXJUSTICE commented 3 months ago

Describe the bug

TypeError: type Tensor doesn't define __round__ method occurs when using torch.nn.KLDivLoss() with tensors.

I am currently evaluating the use of PyTorch on a Kaggle challenge. I wanted to use KL divergence as a loss metric, so I decided to use nn.KLDivLoss(). The loss works as intended, but when paired with early stopping, we hit a critical bug related to the use of round().

What is the current behavior?

The notebook raises an error and is unable to execute early stopping. This is most likely due to the use of the built-in round() function on tensors instead of torch.round().

If the current behavior is a bug, please provide the steps to reproduce.

  1. Set up a TabNetClassifier.
  2. Add a non-zero patience value (say 5).
  3. Add a custom eval metric based on nn.KLDivLoss().
  4. Run .fit(). To observe a successful run, set patience to 0.

Expected behavior

Rounding is used to judge when early stopping should occur, so early stopping should complete successfully.

Screenshots

Stack trace as follows:

Cell In[90], line 52
     42 print(y_valid.shape)
     43 """
     44 Note that model.fit has a natural predict_prob within it, that returns output (n_samples,n_features).
     45 Since our custom KL function (or possibly pytorch as well), takes that output and then the true values
   (...)
     49 Some examples : https://www.geeksforgeeks.org/how-to-convert-an-array-of-indices-to-one-hot-encoded-numpy-array/
     50 """
---> 52 model.fit(X_train,y_train,
     53       eval_set=[(X_valid, y_valid)],
     54       patience=15, max_epochs=5,
     55       eval_metric=['av_kl'] )
     58 model.save_model(f'TabNet_v{Version}_f{i}')
     59 # Check what predict proba returns

File /opt/conda/lib/python3.10/site-packages/pytorch_tabnet/abstract_model.py:273, in TabModel.fit(self, X_train, y_train, eval_set, eval_name, eval_metric, loss_fn, weights, max_epochs, patience, batch_size, virtual_batch_size, num_workers, drop_last, callbacks, pin_memory, from_unsupervised, warm_start, augmentations, compute_importance)
    270         break
    272 # Call method on_train_end for all callbacks
--> 273 self._callback_container.on_train_end()
    274 self.network.eval()
    276 if self.compute_importance:
    277     # compute feature importance once the best model is defined

File /opt/conda/lib/python3.10/site-packages/pytorch_tabnet/callbacks.py:92, in CallbackContainer.on_train_end(self, logs)
     90 logs = logs or {}
     91 for callback in self.callbacks:
---> 92     callback.on_train_end(logs)

File /opt/conda/lib/python3.10/site-packages/pytorch_tabnet/callbacks.py:168, in EarlyStopping.on_train_end(self, logs)
    163     print(msg)
    164 else:
    165     msg = (
    166         f"Stop training because you reached max_epochs = {self.trainer.max_epochs}"
    167         + f" with best_epoch = {self.best_epoch} and "
--> 168         + f"best_{self.early_stopping_metric} = {round(self.best_loss, 5)}"
    169     )
    170     print(msg)
    171 wrn_msg = "Best weights from best epoch are automatically used!"

TypeError: type Tensor doesn't define __round__ method

Optimox commented 3 months ago

Could you share your custom eval metric and custom loss? I think the only problem is that you are returning a tensor instead of a float; it should be easy to fix on your side.
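
For readers hitting the same error, here is a minimal sketch of what "returning a float instead of a tensor" could look like. It is not the reporter's metric, which was never posted. TensorLike, to_python_float and kl_metric are hypothetical names, and TensorLike is only a stand-in for a 0-d torch tensor so the example runs with the standard library alone. The idea is simply to unwrap the value before the metric returns it, so that the built-in round() used in the early stopping message receives a plain Python float.

```python
import math


class TensorLike:
    """Hypothetical stand-in for a 0-d tensor: wraps one number, defines no __round__."""

    def __init__(self, value):
        self._value = float(value)

    def item(self):
        # Same idea as torch.Tensor.item(): hand back a plain Python number.
        return self._value


def to_python_float(value):
    """Return a plain float from a number or from an object exposing .item()."""
    if hasattr(value, "item"):
        return float(value.item())
    return float(value)


def kl_metric(y_true, y_score, eps=1e-15):
    """Toy mean KL divergence over rows of probabilities (hypothetical metric body)."""
    total = 0.0
    for p_row, q_row in zip(y_true, y_score):
        for p, q in zip(p_row, q_row):
            if p > 0:
                total += p * math.log(p / max(q, eps))
    # Simulate a loss that comes back wrapped like a tensor, then unwrap it.
    as_tensor = TensorLike(total / len(y_true))
    return to_python_float(as_tensor)


score = kl_metric([[1.0, 0.0]], [[0.5, 0.5]])
print(round(score, 5))  # ln(2) rounded to 5 places: 0.69315
```

With a real torch tensor, calling .item() on the scalar loss inside the custom metric before returning it should play the same role as to_python_float here.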