yfeng24816 opened this issue 2 years ago
I guess it should be `batches * num_epochs`, but why would it be `* ab_size`?
Is `ab_size` something like `num_epochs`? It becomes `self.trainer.max_epochs` when `accumulate_grad_batches` is 1.
Okay, yes... I didn't see `max_epochs` there. It should be something like `total = (total / accumulation_factor) * max_epochs`.
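For concreteness, here is a minimal sketch of the corrected calculation inside a LightningModule `setup` hook, assuming the notebook's variable names (`tb_size` for the effective batch size) and a datamodule that exposes the train dataloader; `trainer.num_devices` is used here in place of the notebook's older `trainer.gpus`, and the class name is illustrative:

```python
import pytorch_lightning as pl


class Model(pl.LightningModule):  # hypothetical module, for illustration only
    def setup(self, stage=None):
        if stage != "fit":
            return
        train_loader = self.trainer.datamodule.train_dataloader()

        # Effective batch size across devices (`tb_size` in the notebook).
        tb_size = self.hparams.train_batch_size * max(1, self.trainer.num_devices)

        # Optimizer steps per epoch: batches per epoch, divided by the
        # gradient-accumulation factor (each optimizer step consumes
        # `accumulate_grad_batches` batches)...
        steps_per_epoch = (
            len(train_loader.dataset) // tb_size
        ) // self.trainer.accumulate_grad_batches

        # ...then multiplied by the number of epochs, i.e.
        # total = (total / accumulation_factor) * max_epochs.
        self.total_steps = steps_per_epoch * self.trainer.max_epochs
```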
So do you also think there is an error in the documentation? I am not sure myself.
@stancld mind having a look, please? :chipmunk:
@Borda Yes, I will have a look on Friday (tmrw). Can you assign the issue to me, please? :]
It looks like no such example is present in the tutorial anymore 🤔
https://github.com/Lightning-AI/tutorials/tree/main/lightning_examples/text-transformers
Oh sorry, I have forked the `tutorials` repo, but it's the `pytorch` one :D
@Borda I checked the notebook, and it looks like calculating the total number of training steps is now the responsibility of the Lightning `Trainer`. The reported error is therefore no longer relevant for this example, and I believe the issue can be closed/marked as done.
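For reference, a sketch of what the current notebook can rely on instead: Lightning >= 1.6 exposes `trainer.estimated_stepping_batches`, which already accounts for `accumulate_grad_batches` and `max_epochs`, so the scheduler no longer needs a hand-computed `total_steps`. The hyperparameter names below are illustrative:

```python
import torch
import pytorch_lightning as pl
from transformers import get_linear_schedule_with_warmup


class Model(pl.LightningModule):  # hypothetical module, for illustration only
    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=self.hparams.learning_rate)
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.hparams.warmup_steps,
            # The Trainer computes the total optimizer-step count itself,
            # including gradient accumulation and max_epochs.
            num_training_steps=self.trainer.estimated_stepping_batches,
        )
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]
```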
When calculating the total steps, shouldn't we use (number of batches) * (number of epochs)? In this case, it would be `self.total_steps = (len(train_loader.dataset) // tb_size) * ab_size` instead of `self.total_steps = (len(train_loader.dataset) // tb_size) // ab_size`. Please correct me if I am wrong anywhere.
https://pytorchlightning.github.io/lightning-tutorials/notebooks/lightning_examples/text-transformers.html
cc @borda @rohitgr7