Describe the bug
I set up a small dataset to run a test, but the training process behaves really weirdly: the loss shows 0.0 and the weights have been unchanged since epoch 1.
What is the current behavior?
epoch 85 | loss: 0.0 | 0:00:00s
epoch 86 | loss: 0.0 | 0:00:00s
epoch 87 | loss: 0.0 | 0:00:00s
epoch 88 | loss: 0.0 | 0:00:00s
epoch 89 | loss: 0.0 | 0:00:00s
epoch 90 | loss: 0.0 | 0:00:00s
epoch 91 | loss: 0.0 | 0:00:00s
epoch 92 | loss: 0.0 | 0:00:00s
epoch 93 | loss: 0.0 | 0:00:00s
epoch 94 | loss: 0.0 | 0:00:00s
epoch 95 | loss: 0.0 | 0:00:00s
epoch 96 | loss: 0.0 | 0:00:00s
epoch 97 | loss: 0.0 | 0:00:00s
epoch 98 | loss: 0.0 | 0:00:00s
epoch 99 | loss: 0.0 | 0:00:00s
Additional context
After diving into your source code, I found this happens because my dataset is smaller than the default batch_size (1024). drop_last is also set to True by default, which resulted in the only batch I had being dropped. No error was shown when I encountered this issue. Setting it to False by default may be more intuitive? 👍
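As a minimal standalone PyTorch sketch of what happens (the data shapes here are made up for illustration, this is not code from the repository): when the dataset is smaller than batch_size and drop_last=True, the DataLoader yields zero batches, so a training loop iterating over it never computes a loss or updates any weights.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical tiny dataset: 100 samples, 8 features (sizes chosen arbitrarily).
X = torch.randn(100, 8)
y = torch.randint(0, 2, (100,))
dataset = TensorDataset(X, y)

# With batch_size larger than the dataset and drop_last=True,
# the single incomplete batch is silently dropped.
loader = DataLoader(dataset, batch_size=1024, drop_last=True)
print(len(loader))  # 0 -> the training loop body never runs

# Keeping the last partial batch makes the loop actually see the data.
loader = DataLoader(dataset, batch_size=1024, drop_last=False)
print(len(loader))  # 1
```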
This is debatable indeed; here is the rationale behind this choice:
- training a model on a dataset smaller than your batch size is not really recommended anyway, so I don't see your case as a real concern
- without drop_last=True, training on a dataset where N % batch_size == 1 would fail, because batch normalization raises an error when given a batch of size one (see the sketch below)
That second behavior, however, does seem like a legitimate concern: why would the code run with a dataset of size 10240 but not 10241?
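For illustration, here is a small standalone PyTorch sketch of that batch-normalization failure mode (the feature count and batch sizes are made up, this is not code from the repository): in training mode, BatchNorm1d cannot compute batch statistics from a single sample.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(8)  # 8 features, chosen arbitrarily for the example
bn.train()              # training mode: statistics are computed per batch

bn(torch.randn(4, 8))   # a batch of 4 samples works fine

# A batch of a single sample, e.g. the leftover batch when N % batch_size == 1,
# raises "Expected more than 1 value per channel when training ...".
try:
    bn(torch.randn(1, 8))
except ValueError as err:
    print(err)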