Describe the bug
I set up a small dataset to run a test, but the training process behaves really weirdly: the loss shows 0.0 and the weights have been unchanged since epoch 1.
What is the current behavior?
epoch 85 | loss: 0.0 | 0:00:00s
epoch 86 | loss: 0.0 | 0:00:00s
epoch 87 | loss: 0.0 | 0:00:00s
epoch 88 | loss: 0.0 | 0:00:00s
epoch 89 | loss: 0.0 | 0:00:00s
epoch 90 | loss: 0.0 | 0:00:00s
epoch 91 | loss: 0.0 | 0:00:00s
epoch 92 | loss: 0.0 | 0:00:00s
epoch 93 | loss: 0.0 | 0:00:00s
epoch 94 | loss: 0.0 | 0:00:00s
epoch 95 | loss: 0.0 | 0:00:00s
epoch 96 | loss: 0.0 | 0:00:00s
epoch 97 | loss: 0.0 | 0:00:00s
epoch 98 | loss: 0.0 | 0:00:00s
epoch 99 | loss: 0.0 | 0:00:00s
Additional context
After diving into your source code, I found this happens because my dataset is smaller than the default batch_size (1024). drop_last is also set to True by default, which resulted in the only batch I had being dropped. No error was shown when I encountered this issue. Setting it to False by default may be more intuitive? 👍
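As a minimal standalone PyTorch sketch of what happens (the data shapes here are made up for illustration, this is not code from the repository): when the dataset is smaller than batch_size and drop_last=True, the DataLoader yields zero batches, so a training loop iterating over it never computes a loss or updates any weights.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical tiny dataset: 100 samples, 8 features (sizes chosen arbitrarily).
X = torch.randn(100, 8)
y = torch.randint(0, 2, (100,))
dataset = TensorDataset(X, y)

# With batch_size larger than the dataset and drop_last=True,
# the single incomplete batch is silently dropped.
loader = DataLoader(dataset, batch_size=1024, drop_last=True)
print(len(loader))  # 0 -> the training loop body never runs

# Keeping the last partial batch makes the loop actually see the data.
loader = DataLoader(dataset, batch_size=1024, drop_last=False)
print(len(loader))  # 1
```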
This is debatable indeed; here is the rationale behind this choice:
- training a model on a dataset smaller than your batch size is not really recommended anyway, so I don't see your case as a real concern
- without drop_last=True, training on a dataset where N % batch_size == 1 would fail, because batch normalization raises an error when given a batch of size one (see the sketch below)
That second behavior, however, does seem like a legitimate concern: why would the code run with a dataset of size 10240 but not 10241?
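For illustration, here is a small standalone PyTorch sketch of that batch-normalization failure mode (the feature count and batch sizes are made up, this is not code from the repository): in training mode, BatchNorm1d cannot compute batch statistics from a single sample.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(8)  # 8 features, chosen arbitrarily for the example
bn.train()              # training mode: statistics are computed per batch

bn(torch.randn(4, 8))   # a batch of 4 samples works fine

# A batch of a single sample, e.g. the leftover batch when N % batch_size == 1,
# raises "Expected more than 1 value per channel when training ...".
try:
    bn(torch.randn(1, 8))
except ValueError as err:
    print(err)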