Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

[RFC] Remove `{running,accumulated}_loss` #9372

Closed: carmocca closed this issue 1 year ago

carmocca commented 3 years ago

Proposed refactoring or deprecation

Remove the following code: https://github.com/PyTorchLightning/pytorch-lightning/commit/a979944ea0e0b023e5a0a89b5695c259e61bd0cb

Motivation

The running loss is a running window of loss values returned by the training_step. It has been present since the very beginning of Lightning and has become legacy code.
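For anyone unfamiliar with the mechanism, the behaviour in question is roughly equivalent to the sketch below. The class name and window size are illustrative only, not the actual internal implementation.

```python
from collections import deque

import torch


class RunningWindowLoss:
    """Illustrative sketch of a fixed-size running window over training_step losses."""

    def __init__(self, window_size: int = 20):
        # only the last `window_size` values are kept
        self.window = deque(maxlen=window_size)

    def append(self, loss: torch.Tensor) -> None:
        # detach so the autograd graph is not kept alive
        self.window.append(loss.detach())

    def mean(self) -> torch.Tensor:
        # the value shown in the progress bar: average over the kept steps
        return torch.stack(list(self.window)).mean()
```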

Problems:

Alternative:

Pitch

Remove the code. I don't think there's anything to deprecate here.

cc @awaelchli @ananthsub



SkafteNicki commented 3 years ago

@carmocca nothing yet, but just created PR https://github.com/PyTorchLightning/metrics/pull/506 in torchmetrics that implements simple aggregation metrics (sum, mean, max, min, cat) :]
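For context, once that PR lands, the running window could be replaced by a plain aggregation metric. A minimal sketch, assuming the MeanMetric proposed in that PR:

```python
import torch
from torchmetrics import MeanMetric

# accumulate per-step losses and read back a single average value
mean_loss = MeanMetric()
mean_loss.update(torch.tensor(0.5))
mean_loss.update(torch.tensor(0.3))
print(mean_loss.compute())  # tensor(0.4000)
mean_loss.reset()  # start a fresh accumulation, e.g. per epoch
```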

ananthsub commented 3 years ago

+1 I'm in favor of this! Good find @carmocca!

awaelchli commented 3 years ago

I agree about removing the window accumulation for the regular loss; it's not really needed, and the window size isn't configurable anyway. However, I don't think it's good to ask users to log the loss manually just to have it appear in the progress bar. Since automatic optimization is the default and the loss needs to be returned, Lightning should show it in the progress bar. Please keep this. The progress bar callback could take care of that.

What will we do with the loss accumulation for gradient accumulation phases? Will you also remove that?

tchaton commented 3 years ago

Yes, sounds like a great idea!

carmocca commented 3 years ago

I don't think its good to ask users to log the loss manually just to appear in the progress bar.

Almost all users (if not all) already log the loss explicitly to include it in the loggers.

Since automatic optimization is the default and the loss needs to be returned, Lightning should show it in the progress bar

The progress bar and the concept of automatic optimization don't need to be linked like this. It also raises the question: "what about manual optimization? Do I need to return the loss there too?" On the other hand, there's a clear relation between logging and showing values in the progress bar, given the presence of log(prog_bar=bool).
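For illustration, the explicit-logging path referred to above is the standard self.log call inside training_step. A minimal sketch of a user's module; compute_loss is a placeholder for the user's own loss computation:

```python
from pytorch_lightning import LightningModule


class MyModel(LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # placeholder for the user's loss computation
        # prog_bar=True makes the value appear in the progress bar;
        # it is also sent to the loggers by default
        self.log("train_loss", loss, prog_bar=True)
        return loss
```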

This goes with the theme of avoiding multiple ways to do the same thing.

The progress bar callback could take care of that.

It does take care of it already by getting values from progress_bar_metrics which are managed by the LoggerConnector.

What will we do with the loss accumulation for gradient accumulation phases? Will you also remove that?

Yes, the point is to show in the progress bar the same values that users will see when they open TensorBoard, whatever those are.

jinyoung-lim commented 3 years ago

I can take a look at this issue if no one has started yet.

carmocca commented 3 years ago

After some offline discussion, we decided to split this into separate PRs:

  1. (Agreed) Remove the running accumulation mechanism. The difference in how this average is computed becomes confusing for users when they compare it to the values they logged; we should probably use an average metric instead.
  2. (Up for debate): Remove the automatic inclusion of the loss in the progress bar by returning it from training_step.

The main arguments for (2) are:

Now, if (2) is approved, there are things we could do to improve the experience:

  a. if the user logs something with "loss" in the key and prog_bar=True, then we disable the return mechanism
  b. if the user logs something with "loss" in the key and prog_bar=False, we print an info message (only once) suggesting to set prog_bar=True
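To make (a) concrete, here is a purely hypothetical sketch of the check; the helper name and the shape of the logged-metadata dict are assumptions, not existing Lightning internals:

```python
def user_logs_loss_to_prog_bar(logged: dict) -> bool:
    """Hypothetical helper: return True if any logged key containing "loss"
    was logged with prog_bar=True, in which case the return-based progress bar
    mechanism could be disabled."""
    return any("loss" in key and opts.get("prog_bar", False) for key, opts in logged.items())


# example: user_logs_loss_to_prog_bar({"train_loss": {"prog_bar": True}}) -> True
```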

rohitgr7 commented 3 years ago

b. if the user logs something with "loss" in the key and prog_bar=False, we print an info message (only once) suggesting to set prog_bar=True

I think an info message is not required. If the user chooses not to show it in the prog_bar, it seems reasonable to assume they don't want it there.