catalyst-team / catalyst

Accelerated deep learning R&D
https://catalyst-team.com

Unscale gradients before gradient clipping #1319

Closed · ivan-chai closed this 2 years ago

ivan-chai commented 3 years ago

🐛 Bug Report

OptimizerCallback doesn't handle AMP correctly: gradients need to be unscaled before clipping, otherwise the clipping threshold is applied to the scaled gradients rather than the true gradient magnitudes.

The relevant PyTorch documentation is here:

https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping

Catalyst 21.09 code:

    def on_batch_end(self, runner: "IRunner"):
        """Event handler."""
        if runner.is_train_loader:
            self._accumulation_counter += 1
            need_gradient_step = self._accumulation_counter % self.accumulation_steps == 0

            loss = runner.batch_metrics[self.metric_key]
            runner.engine.backward_loss(loss, self.model, self.optimizer)

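            # Note: with AMP the gradients are still scaled at this point;
            # they need to be unscaled (GradScaler.unscale_) before clipping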
            if self.grad_clip_fn is not None:
                self.grad_clip_fn(self.model.parameters())

            if need_gradient_step:
                runner.engine.optimizer_step(loss, self.model, self.optimizer)
                runner.engine.zero_grad(loss, self.model, self.optimizer)

        runner.batch_metrics.update(self._get_lr_momentum_stats())
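
For comparison, the gradient clipping pattern from the linked PyTorch docs is sketched below; `model`, `optimizer`, `loss_fn`, `loader`, and `max_norm` are placeholders, not Catalyst names:

    import torch

    scaler = torch.cuda.amp.GradScaler()

    for inputs, targets in loader:
        optimizer.zero_grad()

        # Forward pass under autocast, scale the loss before backward
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(inputs), targets)
        scaler.scale(loss).backward()

        # Unscale the gradients of the optimizer's assigned params in-place ...
        scaler.unscale_(optimizer)

        # ... so that clipping operates on the true gradient magnitudes
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        # scaler.step() skips optimizer.step() if the gradients contain inf/NaN,
        # and does not unscale again because unscale_ was already called
        scaler.step(optimizer)
        scaler.update()

In OptimizerCallback this would correspond to unscaling between `backward_loss` and the `grad_clip_fn` call, assuming the engine exposes its GradScaler there.
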
Scitator commented 3 years ago

That's interesting 👍

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Scitator commented 2 years ago

Working on it.