Lightning-AI / tutorials

Collection of PyTorch Lightning tutorials in the form of rich scripts automatically transformed into IPython notebooks.
https://lightning-ai.github.io/tutorials
Apache License 2.0
289 stars, 82 forks

Finetune Transformers Models with PyTorch Lightning: documentation error? #139

Open yfeng24816 opened 2 years ago

yfeng24816 commented 2 years ago

When calculating the total steps, shouldn't we use the number of batches * the number of epochs? In this case, it would be self.total_steps = (len(train_loader.dataset) // tb_size) * ab_size instead of self.total_steps = (len(train_loader.dataset) // tb_size) // ab_size.

Please correct me if I am wrong anywhere.
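To make the discrepancy concrete, here is a small sketch that evaluates both expressions from the post. It assumes that ab_size = accumulate_grad_batches * max_epochs. This is inferred from the follow-up comment that ab_size becomes self.trainer.max_epochs when accumulate_grad_batches is 1, not from the tutorial source. The helper names and the dataset and batch sizes are hypothetical.

```python
def steps_as_written(dataset_len: int, tb_size: int,
                     accumulate_grad_batches: int, max_epochs: int) -> int:
    """Expression currently in the tutorial: floor-divides by ab_size."""
    # Assumption: ab_size combines gradient accumulation and the epoch count.
    ab_size = accumulate_grad_batches * max_epochs
    return (dataset_len // tb_size) // ab_size


def steps_as_proposed(dataset_len: int, tb_size: int,
                      accumulate_grad_batches: int, max_epochs: int) -> int:
    """Expression proposed in this issue: multiplies by ab_size."""
    ab_size = accumulate_grad_batches * max_epochs
    return (dataset_len // tb_size) * ab_size


# Hypothetical setup: 10,000 samples, 32 samples per step, no accumulation, 3 epochs.
print(steps_as_written(10_000, 32, 1, 3))   # 312 // 3 = 104
print(steps_as_proposed(10_000, 32, 1, 3))  # 312 * 3 = 936
```

With no accumulation, 312 full batches per epoch over 3 epochs gives 936 optimizer steps. The proposed expression matches that here, while the written one is 9 times (3**2) smaller. Once accumulate_grad_batches is above 1, multiplying by it overcounts instead, which is the objection raised in the first reply.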


https://pytorchlightning.github.io/lightning-tutorials/notebooks/lightning_examples/text-transformers.html

cc @borda @rohitgr7

rohitgr7 commented 2 years ago

I guess it should be batches * num_epochs, but why would it be * ab_size?

yfeng24816 commented 2 years ago

Is ab_size something like num_epochs? It becomes self.trainer.max_epochs when accumulate_grad_batches is 1.

rohitgr7 commented 2 years ago

Okay, yes... I didn't see max_epochs there. It should be something like

total = (total / accumulation_factor) * max_epochs
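A minimal sketch of the correction suggested above follows. It divides the per-epoch batch count by the accumulation factor, because the optimizer steps once per accumulation_factor batches, and then multiplies by max_epochs. Rounding up per epoch is an assumption: it treats a final, partially filled accumulation window at the end of an epoch as still triggering an optimizer step. Use floor division instead if your training loop drops that window. The function and argument names are illustrative.

```python
def total_optimizer_steps(batches_per_epoch: int, accumulation_factor: int,
                          max_epochs: int) -> int:
    """total = (batches / accumulation_factor) * max_epochs, rounded up per epoch."""
    if accumulation_factor < 1 or max_epochs < 1:
        raise ValueError("accumulation_factor and max_epochs must be >= 1")
    # Ceiling division: a leftover partial window counts as one step (assumption).
    steps_per_epoch = -(-batches_per_epoch // accumulation_factor)
    return steps_per_epoch * max_epochs


print(total_optimizer_steps(312, 1, 3))  # 312 * 3 = 936
print(total_optimizer_steps(312, 4, 3))  # 78 * 3 = 234
print(total_optimizer_steps(313, 4, 3))  # 79 * 3 = 237
```

Keeping the epoch count as a multiplier, rather than folding it into a divisor like ab_size, means the total grows with max_epochs and shrinks with accumulation, as expected.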

yfeng24816 commented 2 years ago

So do you also think there is an error in the documentation? I am not sure myself.

Borda commented 4 months ago

@stancld would you mind having a look, please? :chipmunk:

stancld commented 4 months ago

@Borda Yes, I will have a look on Friday (tomorrow). Can you assign the issue to me, please? :]

stancld commented 4 months ago

It looks like no such example is present in the tutorials anymore 🤔

Borda commented 4 months ago


https://github.com/Lightning-AI/tutorials/tree/main/lightning_examples/text-transformers

stancld commented 4 months ago

Oh sorry, I had forked the tutorials repo, but it was the PyTorch one :D

stancld commented 4 months ago

@Borda I checked the notebook, and it looks like calculating the total number of training steps is now the responsibility of the Lightning Trainer. The reported error is therefore no longer relevant for this example, and I believe the issue can be closed/marked as done.