Closed: mRcSchwering closed this issue 4 years ago.
@mRcSchwering can you try 0.8.0?
Just tried on 0.8.1 (hope that's as good). Issue remains. E.g.
Epoch: 43 Step: 85 Batch size: 1171
Epoch: 43 Step: 85 Batch size: 1171
Epoch: 43 Step: 85 Batch size: 1171
Step: 85 LR: 1.4851e-04
Epoch: 43 Step: 86 Batch size: 1171
Epoch: 43 Step: 86 Batch size: 1171
Epoch: 43 Step: 86 Batch size: 1171
Epoch: 43 Step: 86 Batch size: 93
Epoch: 44 Step: 86 Batch size: 1171
Epoch: 44 Step: 86 Batch size: 1171
Epoch: 44 Step: 86 Batch size: 1171
Epoch: 44 Step: 86 Batch size: 1171
Epoch: 44 Step: 86 Batch size: 1171
@mRcSchwering mind checking it with our latest master?
I guess this is solved by https://github.com/PyTorchLightning/pytorch-lightning/pull/2853, and I could reproduce the expected behavior. Can you please confirm this, @mRcSchwering? Running script, adapted to be compatible with current master:
import os
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl

pl.seed_everything(666)


class MyModule(pl.LightningModule):

    def __init__(self, hparams: dict):
        super().__init__()
        self.hparams = hparams
        self.model = nn.Linear(28 * 28, 10)

    def training_step_end(self, outputs: dict):
        print(f'Epoch: {self.current_epoch} Step: {self.global_step} Batch size: {len(outputs["logits"])}')
        return outputs

    def on_before_zero_grad(self, optimizer: torch.optim.Optimizer):
        current_lr = [d['lr'] for d in optimizer.param_groups][0]
        print(f'Step: {self.global_step} LR: {current_lr:.4e}')

    def train_dataloader(self):
        return DataLoader(
            MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()),
            batch_size=1171, shuffle=False)

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx: int) -> dict:
        inputs, targets = batch
        logits = self.forward(inputs.view(inputs.size(0), -1))
        loss = F.cross_entropy(logits, targets)
        return {'loss': loss, 'logits': logits}

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=3e-4)

    def optimizer_step(self, epoch, batch_idx, optimizer, opt_idx,
                       lambda_closure, using_native_amp, using_lbfgs):
        # modify learning rate...
        optimizer.step()
        self.on_before_zero_grad(optimizer)
        optimizer.zero_grad()


trainer = pl.Trainer(
    max_steps=100,
    max_epochs=int(1e6),
    gpus=-1,
    num_sanity_val_steps=0,
    progress_bar_refresh_rate=0,
    accumulate_grad_batches=7,
    early_stop_callback=False)
model = MyModule({})
trainer.fit(model)
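As an aside, the `# modify learning rate...` placeholder could be filled with any per-step schedule. Here is one hypothetical way to do it, a linear warm-up sketch that drops into MyModule in place of the optimizer_step above; the warmup_steps and base_lr values are my own illustration, not part of the original script:

    def optimizer_step(self, epoch, batch_idx, optimizer, opt_idx,
                       lambda_closure, using_native_amp, using_lbfgs):
        # Hypothetical linear warm-up over the first 10 optimizer steps;
        # warmup_steps and base_lr are assumed values for illustration.
        warmup_steps, base_lr = 10, 3e-4
        if self.global_step < warmup_steps:
            scale = (self.global_step + 1) / warmup_steps
            for param_group in optimizer.param_groups:
                param_group['lr'] = base_lr * scale
        optimizer.step()
        self.on_before_zero_grad(optimizer)
        optimizer.zero_grad()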
Output of the script: the LR is printed every 7 accumulation steps and also on the last batch of the epoch. current_epoch and global_step are incremented correctly, too.
Epoch: 1 Step: 8 Batch size: 1171
Epoch: 1 Step: 8 Batch size: 1171
Epoch: 1 Step: 8 Batch size: 1171
Epoch: 1 Step: 8 Batch size: 1171
Epoch: 1 Step: 8 Batch size: 1171
Epoch: 1 Step: 8 Batch size: 1171
Epoch: 1 Step: 8 Batch size: 1171
Step: 8 LR: 3.0000e-04
Step: 8 LR: 3.0000e-04
Epoch: 1 Step: 9 Batch size: 1171
Epoch: 1 Step: 9 Batch size: 1171
Epoch: 1 Step: 9 Batch size: 1171
Epoch: 1 Step: 9 Batch size: 1171
Epoch: 1 Step: 9 Batch size: 1171
Epoch: 1 Step: 9 Batch size: 1171
Epoch: 1 Step: 9 Batch size: 1171
Step: 9 LR: 3.0000e-04
Step: 9 LR: 3.0000e-04
Epoch: 1 Step: 10 Batch size: 1171
Epoch: 1 Step: 10 Batch size: 1171
Epoch: 1 Step: 10 Batch size: 1171
Epoch: 1 Step: 10 Batch size: 1171
Epoch: 1 Step: 10 Batch size: 1171
Epoch: 1 Step: 10 Batch size: 1171
Epoch: 1 Step: 10 Batch size: 1171
Step: 10 LR: 3.0000e-04
Step: 10 LR: 3.0000e-04
Epoch: 1 Step: 11 Batch size: 1171
Epoch: 1 Step: 11 Batch size: 1171
Epoch: 1 Step: 11 Batch size: 1171
Epoch: 1 Step: 11 Batch size: 1171
Epoch: 1 Step: 11 Batch size: 1171
Epoch: 1 Step: 11 Batch size: 1171
Epoch: 1 Step: 11 Batch size: 1171
Step: 11 LR: 3.0000e-04
Step: 11 LR: 3.0000e-04
Epoch: 1 Step: 12 Batch size: 1171
Epoch: 1 Step: 12 Batch size: 1171
Epoch: 1 Step: 12 Batch size: 1171
Epoch: 1 Step: 12 Batch size: 1171
Epoch: 1 Step: 12 Batch size: 1171
Epoch: 1 Step: 12 Batch size: 1171
Epoch: 1 Step: 12 Batch size: 1171
Step: 12 LR: 3.0000e-04
Step: 12 LR: 3.0000e-04
Epoch: 1 Step: 13 Batch size: 1171
Epoch: 1 Step: 13 Batch size: 1171
Epoch: 1 Step: 13 Batch size: 1171
Epoch: 1 Step: 13 Batch size: 1171
Epoch: 1 Step: 13 Batch size: 1171
Epoch: 1 Step: 13 Batch size: 1171
Epoch: 1 Step: 13 Batch size: 1171
Step: 13 LR: 3.0000e-04
Step: 13 LR: 3.0000e-04
Epoch: 1 Step: 14 Batch size: 1171
Epoch: 1 Step: 14 Batch size: 1171
Epoch: 1 Step: 14 Batch size: 1171
Epoch: 1 Step: 14 Batch size: 1171
Epoch: 1 Step: 14 Batch size: 1171
Epoch: 1 Step: 14 Batch size: 1171
Epoch: 1 Step: 14 Batch size: 1171
Step: 14 LR: 3.0000e-04
Step: 14 LR: 3.0000e-04
Epoch: 1 Step: 15 Batch size: 1171
Epoch: 1 Step: 15 Batch size: 1171
Epoch: 1 Step: 15 Batch size: 279
Step: 15 LR: 3.0000e-04
Step: 15 LR: 3.0000e-04
Epoch: 2 Step: 16 Batch size: 1171
Epoch: 2 Step: 16 Batch size: 1171
Epoch: 2 Step: 16 Batch size: 1171
Epoch: 2 Step: 16 Batch size: 1171
Epoch: 2 Step: 16 Batch size: 1171
Epoch: 2 Step: 16 Batch size: 1171
Epoch: 2 Step: 16 Batch size: 1171
Step: 16 LR: 3.0000e-04
Step: 16 LR: 3.0000e-04
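For reference, the pattern in this output follows from what gradient accumulation does conceptually: gradients from several batches are summed before a single optimizer step, and the final (possibly partial) group of an epoch still gets its step. Below is a rough plain-PyTorch sketch with synthetic data, my own illustration rather than Lightning's actual implementation:

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the model, optimizer, and data in the script above.
model = nn.Linear(28 * 28, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
data = TensorDataset(torch.randn(100, 28 * 28), torch.randint(0, 10, (100,)))
loader = DataLoader(data, batch_size=10)

accumulate, global_step = 7, 0
optimizer.zero_grad()
for i, (inputs, targets) in enumerate(loader):
    loss = F.cross_entropy(model(inputs), targets)
    (loss / accumulate).backward()  # scale so accumulated grads average out
    # step on every 7th batch AND on the final, possibly partial, group --
    # the step on the last batch is exactly what the fixed output shows
    if (i + 1) % accumulate == 0 or (i + 1) == len(loader):
        optimizer.step()
        global_step += 1  # one global step per optimizer step, not per batch
        optimizer.zero_grad()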
Cool, thx. And I learned something new (seed_everything), so I guess this could be closed.
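For anyone else seeing seed_everything for the first time here: pl.seed_everything(seed) roughly amounts to the common seeding recipe below. This is a sketch of that recipe, not necessarily PL's exact implementation:

import os
import random
import numpy as np
import torch

def seed_everything_sketch(seed: int) -> None:
    # Seed every RNG a typical training script touches.
    random.seed(seed)                          # Python stdlib RNG
    np.random.seed(seed)                       # NumPy RNG
    torch.manual_seed(seed)                    # torch CPU RNG
    torch.cuda.manual_seed_all(seed)           # all CUDA device RNGs
    os.environ["PYTHONHASHSEED"] = str(seed)   # hash randomization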
🐛 Bug

global_step and current_epoch do not match up anymore after more than 1 epoch when gradient accumulation is set greater than 1. I think at the end of each epoch, optimizer_step (and on_before_zero_grad) is not called in that case.

To Reproduce

1. Create a pl.LightningModule that logs current_epoch and global_step in every training_step_end.
2. Set accumulate_grad_batches=7 in the trainer.

Expected behavior

- current_epoch gets incremented
- global_step gets incremented as well

Actual behavior

- global_step increments with every batch, but not when current_epoch gets incremented
- global_step is basically missing every 3rd increment

Code sample

Below is basically what I have. I am adjusting the learning rate with every global step. The learning rate adjustment and each training_step_end call get printed.

Below is some sample output. You can see the end of the epoch where the last 93 samples are processed. Then, current_epoch increases, but global_step does not. Additionally, the learning rate print is missing, so on_before_zero_grad was not called.

Environment