catalyst-team / catalyst

Accelerated deep learning R&D
https://catalyst-team.com

Unscale gradients before gradient clipping #1319

Closed · ivan-chai closed this 2 years ago

ivan-chai commented 3 years ago

🐛 Bug Report

OptimizerCallback doesn't handle AMP correctly: gradients need to be unscaled before clipping, otherwise the clipping threshold is applied to the scaled gradients rather than the true gradient magnitudes.

The relevant PyTorch documentation is here:

https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping

Catalyst 21.09 code:

    def on_batch_end(self, runner: "IRunner"):
        """Event handler."""
        if runner.is_train_loader:
            self._accumulation_counter += 1
            need_gradient_step = self._accumulation_counter % self.accumulation_steps == 0

            loss = runner.batch_metrics[self.metric_key]
            runner.engine.backward_loss(loss, self.model, self.optimizer)

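            # Note: with AMP the gradients are still scaled at this point;
            # they need to be unscaled (GradScaler.unscale_) before clipping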
            if self.grad_clip_fn is not None:
                self.grad_clip_fn(self.model.parameters())

            if need_gradient_step:
                runner.engine.optimizer_step(loss, self.model, self.optimizer)
                runner.engine.zero_grad(loss, self.model, self.optimizer)

        runner.batch_metrics.update(self._get_lr_momentum_stats())
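
For comparison, the gradient clipping pattern from the linked PyTorch docs is sketched below; `model`, `optimizer`, `loss_fn`, `loader`, and `max_norm` are placeholders, not Catalyst names:

    import torch

    scaler = torch.cuda.amp.GradScaler()

    for inputs, targets in loader:
        optimizer.zero_grad()

        # Forward pass under autocast, scale the loss before backward
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(inputs), targets)
        scaler.scale(loss).backward()

        # Unscale the gradients of the optimizer's assigned params in-place ...
        scaler.unscale_(optimizer)

        # ... so that clipping operates on the true gradient magnitudes
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

        # scaler.step() skips optimizer.step() if the gradients contain inf/NaN,
        # and does not unscale again because unscale_ was already called
        scaler.step(optimizer)
        scaler.update()

In OptimizerCallback this would correspond to unscaling between `backward_loss` and the `grad_clip_fn` call, assuming the engine exposes its GradScaler there.
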
Scitator commented 3 years ago

That's interesting 👍

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Scitator commented 2 years ago

Working on it.