Closed olineumann closed 4 years ago
Hi! Thanks for your contribution, great first issue!
I looked over the code and found the issue. Perhaps an earlier version of the wandb Python API did not accept `step` as a parameter of the `log` function. So `step` was added to the metrics dict instead, and since the dict was not copied, the added key affected other loggers as well.
Also I think that empty metric dicts could be skipped in the base logger. You can see my fix in the following commit: https://github.com/PyTorchLightning/pytorch-lightning/compare/master...olineumann:issue/wandb_global_step
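The dict-mutation problem described above can be sketched without any Lightning or wandb code. This is a minimal toy example (the function names are hypothetical, not the actual logger API): injecting `global_step` into the caller's dict leaks it to every other logger, while copying first keeps the key local.

```python
# Toy sketch of the bug: mutating the shared metrics dict vs. copying it.

def log_metrics_buggy(metrics, step):
    # Mutates the caller's dict, so other loggers see the added key.
    metrics["global_step"] = step
    return metrics

def log_metrics_fixed(metrics, step):
    # Copy first, so the injected key stays local to this logger.
    return dict(metrics, global_step=step)

shared = {"loss": 0.5}
log_metrics_buggy(shared, step=7)
print("global_step" in shared)   # True: leaked to the other loggers

shared = {"loss": 0.5}
log_metrics_fixed(shared, step=7)
print("global_step" in shared)   # False: original dict untouched
```

The one-line copy is essentially what the linked branch does to stop the key from reaching the other loggers.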
@olineumann Nice, mind sending a PR? It seems it could be a one-click fix :]
@Borda Now wandb is giving warnings (`wandb: WARNING Adding to old History rows isn't currently supported. Step 25 < 38`) and not logging when I use the WandbLogger with k-fold cross-validation, because I use the same `wandb_logger` instance but call `trainer.fit` multiple times with different `train_dl` and `valid_dl`. Since the step repeats in each fold, nothing is logged after the first fold completes, even though the log keys are completely different. For now I have to create a separate logger for each fold, but is there another way to make this work with a single instance?
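The k-fold symptom above comes from wandb's history only accepting monotonically increasing steps. The sketch below is a toy model of that behavior (not the wandb API) showing why one reused logger drops every fold after the first, while one logger per fold keeps all rows:

```python
# Toy model of a step-monotonic history, like wandb's.

class ToyHistory:
    def __init__(self):
        self.rows = []
        self.last_step = -1

    def log(self, metrics, step):
        if step <= self.last_step:
            # wandb warns "Adding to old History rows isn't currently
            # supported. Step X < Y" and drops the row.
            return False
        self.rows.append((step, metrics))
        self.last_step = step
        return True

shared = ToyHistory()            # one logger reused across folds
for fold in range(2):
    for step in range(3):        # steps restart at 0 in each fold
        shared.log({f"fold{fold}/loss": 0.1}, step)
print(len(shared.rows))          # 3: the whole second fold was dropped

per_fold = [ToyHistory() for _ in range(2)]   # one logger per fold
for fold in range(2):
    for step in range(3):
        per_fold[fold].log({f"fold{fold}/loss": 0.1}, step)
print(sum(len(h.rows) for h in per_fold))     # 6: nothing lost
```

This is why the per-fold-logger workaround works: each logger gets its own step counter (and, in wandb, its own run).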
I also noticed that after upgrading to the current version, 'epoch' appears in the metric dict without me logging any 'epoch'. It comes from trainer.logging: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/logging.py#L69
Because of this, `step` equals `global_step`, and when I log something in my LightningModule with `step=epoch` it crashes for the same reason. The only way I found to solve it is to pass a log dict in `training_step` and `validation_step` that contains `'step': epoch`.
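The `'step': epoch` workaround relies on the trainer treating a `'step'` key in the returned log dict as the x-axis value instead of `global_step`. Here is a toy sketch of that convention (the function name `resolve_step` is hypothetical, not the Lightning API):

```python
# Toy model of the trainer-side convention: a 'step' key in the log
# dict overrides global_step as the logging x-axis.

def resolve_step(metrics, global_step):
    # Pop 'step' so it is used as the x-axis, not logged as a metric.
    return metrics.pop("step", global_step), metrics

step, metrics = resolve_step({"val_loss": 0.3, "step": 4}, global_step=120)
print(step)      # 4: the epoch passed via the log dict wins
print(metrics)   # {'val_loss': 0.3}

step, metrics = resolve_step({"val_loss": 0.3}, global_step=120)
print(step)      # 120: falls back to global_step
```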
What would be the best solution? I think:
Does that sound useful/logical? I think this should work, but I'm also tired. I can create a PR in the next few days (Fri/Sat) if wanted.
@olineumann I would prefer the second option, as we do not want to affect the other loggers.
I just wanted to fix it and pulled the current master from GitHub, but it seems to be fixed already: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/loggers/wandb.py#L131
Is this issue solved? I am experiencing similar chart issues in Wandb:
🐛 Bug
The wandb logger adds a 'global_step' key to the metrics dict, which then appears in all other loggers (e.g. TensorBoard). Only the wandb logger adds 'global_step' to the metrics, and I think it is not necessary. Another side effect is that 'global_step' is also added to empty dicts, which are then logged and result in strange graphs like this:
or this
I also wrote a simple logger class to print out metrics. I got this output:
Also notice: I set max_epochs to 10, so I expected 10 measurements, but the last one is missing. This could be handled in another issue, though.
To Reproduce
Steps to reproduce the behavior:
Code sample
Important LightningModule Methods:
Training:
Expected behavior
Is 'global_step' needed in the wandb logger? If so, it should not affect the other loggers. Also, if there is nothing to log (e.g. in training_step), the logger should log nothing.
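The "log nothing when there is nothing to log" behavior could be a simple guard in the base logger. A minimal sketch, assuming a hypothetical `agg_and_log_metrics`-style entry point (names are illustrative, not the actual Lightning API):

```python
# Sketch of skipping empty metric dicts before any logger runs.

def agg_and_log_metrics(metrics, step, log_fn):
    # Without this guard, 'global_step' gets injected into an empty
    # payload and produces the strange flat graphs shown above.
    if not metrics:
        return False
    log_fn(metrics, step)
    return True

calls = []
agg_and_log_metrics({}, 5, lambda m, s: calls.append((m, s)))
agg_and_log_metrics({"loss": 0.2}, 5, lambda m, s: calls.append((m, s)))
print(calls)   # [({'loss': 0.2}, 5)]
```

Placing the check in the base class means every concrete logger (wandb, TensorBoard, custom print loggers) benefits at once.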
Environment
- OS: Arch Linux
- Python: 3.8.2
- PyTorch: 1.4.0
- PyTorch Lightning: 0.7.3