Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Proper support for Pytorch SequentialLR Scheduler #10759

Open marcm-ml opened 2 years ago

marcm-ml commented 2 years ago

🐛 Bug

Currently, there is a bug when a ReduceLROnPlateau scheduler is used inside SequentialLR, because Trainer._configure_schedulers has no proper support for this combination. An exception is raised because the monitor metric is not passed through to the ReduceLROnPlateau scheduler in TrainingEpochLoop._update_learning_rates.

Note: SequentialLR currently has a separate bug where the optimizer attribute is missing, see https://github.com/pytorch/pytorch/pull/67406 and https://github.com/PyTorchLightning/pytorch-lightning/issues/10278, but that should not interfere here.

To Reproduce

Run any Lightning model with a Trainer, using a scheduler setup like:

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=0.01)
    s1 = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
    s2 = torch.optim.lr_scheduler.ConstantLR(optimizer)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer,
        schedulers=[s1, s2],
        milestones=[2],
    )
    scheduler.optimizer = optimizer  # dirty fix for the missing optimizer attribute in SequentialLR
    return {
        "optimizer": optimizer,
        "lr_scheduler": scheduler,
        "monitor": "loss",
    }

Expected behavior

The monitor value should be passed to the underlying ReduceLROnPlateau scheduler.

This is definitely tricky to achieve, since the current implementation assumes a fixed scheduler setup for the entire training run: it allows multiple schedulers, but if a scheduler changes mid-training, this only works as long as the new scheduler is not ReduceLROnPlateau.


cc @tchaton

marcm-ml commented 2 years ago

Related to this, but a separate issue: afaik there is no support for a custom LRScheduler unless it inherits from _LRScheduler AND requires no extra arguments when .step() is called.

In vanilla PyTorch this is "solved" because the user calls .step() directly.
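
For contrast, a rough sketch of the vanilla PyTorch pattern (train_one_epoch and validate are hypothetical helpers):

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)
    val_loss = validate(model)
    # the user owns this call, so a custom scheduler can accept any arguments
    scheduler.step(val_loss)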

Perhaps the newly introduced customizable loops are a solution for this?

awaelchli commented 2 years ago

Perhaps the newly introduced customizable loops are a solution for this?

I would rather go with "Manual Optimization" before going for custom loops.

In manual optimization mode, the user calls the scheduler's step manually, so you can use a custom scheduler and pass arguments to its step method. How does that sound as a (temporary?) workaround?
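
For reference, a minimal sketch of what that could look like (compute_loss is a placeholder for your own loss computation):

import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # enable manual optimization

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        loss = self.compute_loss(batch)
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        self.log("loss", loss)

    def on_train_epoch_end(self):
        sch = self.lr_schedulers()
        # you own this call, so any scheduler and any step arguments work
        sch.step(self.trainer.callback_metrics["loss"])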

marcm-ml commented 2 years ago

Great idea, I forgot about that. I guess that solves the custom scheduler issue. But I still think support for SequentialLR is necessary. I believe there is a debate on the PyTorch side about whether the scheduler class needs a rewrite, so perhaps we should wait for that?

rohitgr7 commented 2 years ago

For the main issue regarding ReduceLROnPlateau: SequentialLR doesn't support an additional argument inside .step, so we really can't do anything from our side. https://github.com/pytorch/pytorch/blob/5fdcc20d8d96a6b42387f57c2ce331516ad94228/torch/optim/lr_scheduler.py#L628
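
In other words, at the linked version SequentialLR.step() accepts no metric argument, so the monitor value cannot be forwarded to the nested ReduceLROnPlateau:

scheduler.step()          # runs, but the inner ReduceLROnPlateau never receives the metric
scheduler.step(val_loss)  # raises TypeError, since step() accepts no extra arguments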

Related to this, but a separate issue: afaik there is no support for a custom LRScheduler unless it inherits from _LRScheduler AND requires no extra arguments when .step() is called.

Do you have any scheduler that inherits from _LRScheduler and requires an additional argument inside the .step method? I have a PR for it (https://github.com/PyTorchLightning/pytorch-lightning/pull/10249) that would solve the issue, but it seems to be blocked on the decision of whether to support external schedulers that do not inherit from PyTorch's _LRScheduler.

marcm-ml commented 2 years ago

Well, I wanted to use WarmStart (a custom scheduler inherited from _LRScheduler) together with ReduceLROnPlateau. My first try was SequentialLR, but as outlined above it is impossible to use with ReduceLROnPlateau. So I simply inherited from ReduceLROnPlateau and modified its step function to include the logic from WarmStart. See here for the code. That works with PyTorch Lightning now, since it is simply a ReduceLROnPlateau in disguise. It would be cleaner to use SequentialLR, but that is now a PyTorch issue rather than a PL one. Maybe a separate issue can be opened there so that SequentialLR can accept arbitrary arguments in step?

rohitgr7 commented 2 years ago

yeah, I'd suggest opening an issue in PyTorch and linking it here. We won't close this issue. Once they support it, we will make adjustments here, if required, to make it compatible.

abbas695 commented 5 months ago

@marcm-ml hello, would you please share the code again, as the link is now dead? I am trying my hardest to incorporate warm-up with ReduceLROnPlateau but with no success so far.

marcm-ml commented 5 months ago

Sure, but no guarantees that it works with any of the recent PyTorch or PyTorch Lightning versions. I haven't touched that in years.


import math

import numpy as np
from torch.optim.lr_scheduler import ReduceLROnPlateau

class WarmStartReduceOnPlateau(ReduceLROnPlateau):
    def __init__(self,
                 optimizer,
                 warm_start: float,
                 warm_stop: float,
                 warm_patience: int = 0,
                 warm_duration: int = 25,
                 warm_type: str = "linear",
                 mode: str = "min",
                 patience: int = 10,
                 cooldown=0,
                 factor=0.1,
                 threshold=1e-4,
                 threshold_mode='rel',
                 min_lr=0,
                 eps=1e-8,
                 verbose=False):
        """
        Workaround class, since SequentialLR with ReduceLROnPlateau does not currently work in PyTorch Lightning.
        Otherwise, simply use the WarmStart class together with any of the other PyTorch schedulers.

        See Also
        https://github.com/PyTorchLightning/pytorch-lightning/issues/10759
        """
        assert warm_type in ("linear", "smooth")
        assert warm_duration > 0
        assert warm_patience >= 0

        self.warm_start = warm_start
        self.warm_stop = warm_stop
        self.warm_patience = warm_patience
        self.warm_duration = warm_duration
        self.warm_type = warm_type
        self.warm_ended = False
        self._last_lr = [warm_start]  # keep a list, matching the format set at the end of step()

        super().__init__(
            optimizer,
            mode=mode,
            factor=factor,
            patience=patience,
            threshold=threshold,
            threshold_mode=threshold_mode,
            cooldown=cooldown,
            min_lr=min_lr,
            eps=eps,
            verbose=verbose
        )

    def step(self, metrics, epoch=None):
        current = float(metrics)
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch

        # Past the warm-up patience period and warm-up not yet ended: apply the warm-up LR
        if self.last_epoch > self.warm_patience and not self.warm_ended:
            self._warm_lr(self.last_epoch)

        # Past the warm-up phase entirely: run the ReduceLROnPlateau logic
        if self.last_epoch > self.warm_patience + self.warm_duration:
            if self.is_better(current, self.best):
                self.best = current
                self.num_bad_epochs = 0
            else:
                self.num_bad_epochs += 1

            if self.in_cooldown:
                self.cooldown_counter -= 1
                self.num_bad_epochs = 0  # ignore any bad epochs in cooldown

            if self.num_bad_epochs > self.patience:
                # Indicate to warm up that LRReduce should happen; prevent LR override
                if self.verbose and not self.warm_ended:
                    print(f"Ending warm-up phase after {epoch} epochs. "
                          f"Switching over to ReduceLROnPlateau")
                self.warm_ended = True

                self._reduce_lr(epoch)
                self.cooldown_counter = self.cooldown
                self.num_bad_epochs = 0

        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]

    def _warm_lr(self, epoch):
        for i, param_group in enumerate(self.optimizer.param_groups):
            old_lr = float(param_group['lr'])
            slope = (self.warm_stop - self.warm_start)
            x = (epoch - self.warm_patience) / self.warm_duration
            lower_bound = min(self.warm_start, self.warm_stop)
            upper_bound = max(self.warm_start, self.warm_stop)
            if self.warm_type == "linear":
                new_lr = slope * x + self.warm_start
            else:
                new_lr = slope * math.tanh(x) + self.warm_start
            param_group['lr'] = np.clip(new_lr, lower_bound, upper_bound)
            if self.verbose and not np.isclose(old_lr, new_lr):
                print('Epoch {:5d}: warming-up learning rate'
                      ' of group {} to {:.4e}.'.format(epoch, i, new_lr))
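
A minimal usage sketch inside configure_optimizers (the hyperparameter values are illustrative only):

def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=0.01)
    scheduler = WarmStartReduceOnPlateau(optimizer, warm_start=1e-5, warm_stop=1e-2)
    return {"optimizer": optimizer, "lr_scheduler": scheduler, "monitor": "loss"}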

abbas695 commented 5 months ago

@marcm-ml it's working fantastically. Thank you so much.