Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Training metrics not reported in log file #7899

Closed: Borda closed this issue 3 years ago

Borda commented 3 years ago

🐛 Bug

Forwarding an issue found during a Kaggle competition: it seems that all training metrics are missing from the CSVLogger output.

Please reproduce using the BoringModel

To Reproduce

https://github.com/Borda/kaggle_plant-pathology/issues/9

Expected behavior

Complete logging: training metrics should be written to the CSVLogger log file just like all other logged metrics.
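The expected behavior can be phrased as a concrete check on the metrics.csv file that CSVLogger writes: every metric logged during training should appear as a column with at least one value. Below is a minimal standard-library sketch. The helper name `missing_metrics` is hypothetical and not part of Lightning. The expected names `train_acc`, `train_prec` and `train_f1` are borrowed from the reproduction later in this thread. The sketch assumes the CSV layout shown in that output: a header of metric names plus `epoch` and `step`, then one row per logging event.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_metrics(csv_file: TextIO, expected: Iterable[str]) -> List[str]:
    """Return the expected metric names that never received a value.

    Hypothetical helper: assumes a header row of metric names followed by
    one row per logging event. A column can exist while all of its cells
    are empty, so non-empty values are checked, not just the header.
    """
    reader = csv.DictReader(csv_file)
    seen = set()
    for row in reader:
        for name, value in row.items():
            if value not in (None, ""):
                seen.add(name)
    return [name for name in expected if name not in seen]


# Usage with a file shaped like the output reported in this thread.
example = io.StringIO(
    "train_acc,train_prec,train_f1,epoch,step\n"
    "1.0,1.0,1.0,0,49\n"
)
print(missing_metrics(example, ["train_acc", "train_prec", "train_f1"]))
```

An empty list means every expected training metric was written at least once. Any names returned are the metrics that went missing.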

Environment

Additional context

edgarriba commented 3 years ago

@Borda I created a minimal example to reproduce the issue, but I cannot find a configuration in which the metrics are not logged:

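import pandas as pd
import pytorch_lightning as pl
import torch
import torchmetrics as tm
from pytorch_lightning.demos.boring_classes import BoringModel  # location in recent Lightning releases
from pytorch_lightning.loggers import CSVLogger


class CustomModel(BoringModel):
    def __init__(self):
        super().__init__()
        # Register the metric as a submodule so it moves with the model;
        # torchmetrics >= 0.11 requires an explicit `task`
        self.acc = tm.Accuracy(task="binary")

    def training_step(self, batch, batch_idx):
        val = torch.tensor(1., device=self.device)
        self.log("train_acc", self.acc(val[None], val.long()[None]), prog_bar=False)
        self.log("train_prec", 1., prog_bar=False)
        self.log("train_f1", 1., prog_bar=True)
        return super().training_step(batch, batch_idx)


def test_integration(tmpdir):  # `tmpdir` is the pytest fixture
    trainer = pl.Trainer(
        logger=CSVLogger(tmpdir),
        max_epochs=1,
    )
    model = CustomModel()
    trainer.fit(model)

    # CSVLogger writes every logged scalar to <log_dir>/metrics.csv
    metrics = pd.read_csv(f'{trainer.logger.log_dir}/metrics.csv')
    print(metrics)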

which produces the following output:

metrics
   train_acc  train_prec  train_f1  epoch  step
0        1.0         1.0       1.0      0    49
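The single row at step 49 is consistent with step-level metrics being written only every log_every_n_steps steps (the Trainer default is 50). In training_step, self.log defaults to on_step=True and on_epoch=False. So if an epoch has fewer training batches than the logging interval, no training row may ever be written. This is a hypothesis, not a confirmed diagnosis of the reported Kaggle cases.

The sketch below assumes the rule `(step + 1) % log_every_n_steps == 0`. That rule matches this output, but the sketch ignores version-specific details such as extra flushes at the end of training. The helper `logged_steps` is hypothetical.

```python
from typing import List


def logged_steps(batches_per_epoch: int, max_epochs: int = 1,
                 log_every_n_steps: int = 50) -> List[int]:
    """Global steps at which step-level metrics would be written.

    Hypothetical model of the logging interval: a step is logged when
    (step + 1) is a multiple of log_every_n_steps.
    """
    total_steps = batches_per_epoch * max_epochs
    return [step for step in range(total_steps)
            if (step + 1) % log_every_n_steps == 0]


# With 64 batches per epoch, only step 49 is logged.
print(logged_steps(64))                # [49]
# A small dataset with 30 batches per epoch logs nothing in one epoch...
print(logged_steps(30))                # []
# ...and only once a later epoch reaches global step 49.
print(logged_steps(30, max_epochs=2))  # [49]
```

If this is the cause, either of two existing Lightning options should make the training metrics appear: lowering the interval (for example Trainer(log_every_n_steps=1)) or logging with on_epoch=True. Whether they fix the linked projects has not been verified here.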

Borda commented 3 years ago

I have another very simple model in which all training metrics are still missing: https://github.com/Borda/kaggle_brain-tumor-3D/blob/main/kaggle_brain3d/models.py

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!