catalyst-team / catalyst

Accelerated deep learning R&D
https://catalyst-team.com
Apache License 2.0
3.3k stars 388 forks source link

Bug in catalyst/callbacks/backward.py if the grad_clip_fn value is set. #1444

Closed AleksandrMinin closed 1 year ago

AleksandrMinin commented 1 year ago

πŸ› Bug Report

Bug in catalyst/callbacks/backward.py if the grad_clip_fn value is set.

How To Reproduce

Steps to reproduce the behavior:

  1. Create a callback with a BackwardCallback in which grad_clip_fn is not empty.
  2. Launch runner.train with this callback.
  3. The output will be an error:
    
    /python_envs/kaggle-env/lib/python3.8/site-packages/catalyst/callbacks/backward.py:55                                                                                                 
    52 β”‚   β”‚   β”‚   
    53 β”‚   β”‚   β”‚   if self.grad_clip_fn is not None:
    54 β”‚   β”‚   β”‚   β”‚   runner.engine.unscale_gradients()
    -->55 β”‚   β”‚   β”‚   β”‚   norm = self.grad_clip_fn(self.model.parameters())
    56 β”‚   β”‚   β”‚   β”‚   if self._log_gradient:
    57 β”‚   β”‚   β”‚   β”‚   β”‚   runner.batch_metrics[f"{self._prefix_gradient}/norm"] = norm
    58                                                                                             

AttributeError: 'BackwardCallback' object has no attribute 'model'


#### Code sample
```python
import torch
from torch.nn.utils import clip_grad_norm_
from catalyst import dl
from catalyst.core.callback import Callback
from catalyst.engines.torch import CPUEngine, GPUEngine

from src.config import config
from src.base_config import Config
from src.tools import set_global_seed, get_code
from src.dataset import get_loaders
from src.crnn import CRNN
from src.runners import SupervisedOCRRunner

callbacks= [     
    dl.CriterionCallback(
        input_key=dict(output="log_probs", output_size="input_lengths"),
        target_key=dict(target="targets", target_len="target_lengths"),     
        metric_key="loss",
        criterion_key="ctc_loss_fn",
    ),
    dl.BackwardCallback(
        metric_key="loss",
        grad_clip_fn=clip_grad_norm_,
        grad_clip_params={"max_norm": 0.5,
                          "norm_type": 2},   
    ),
]

loaders, infer_loader = get_loaders(config)  
model = CRNN(**config.model_kwargs)

optimizer = config.optimizer(params=model.parameters(), **config.optimizer_kwargs)
scheduler = config.scheduler(optimizer=optimizer, **config.scheduler_kwargs)

if torch.cuda.is_available():
    engine = GPUEngine()
else:
    engine = CPUEngine()

runner = SupervisedOCRRunner(
    input_key="image", 
    target_key="target", 
    output_key="output",
)

criterion = {"ctc_loss_fn": config.ctc_loss}

runner.train(
    model=model,
    engine=engine,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=callbacks,
    num_epochs=config.n_epochs,
    valid_loader="valid",
    valid_metric=config.valid_metric,
    minimize_valid_metric=config.minimize_metric,
    seed=config.seed,
    verbose=True,
    load_best_on_end=True,
)

Expected behavior

You need to replace

norm = self.grad_clip_fn(self.model.parameters()) 

with

norm = self.grad_clip_fn(runner.model.parameters())

in catalyst/callbacks/backward.py line 55.

Then there will be no mistake and the training will be successful.

Environment

Catalyst version: 22.04
PyTorch version: 1.13.0+cu117
Is debug build: No
CUDA used to build PyTorch: 11.7
TensorFlow version: N/A
TensorBoard version: 2.9.1

OS: Ubuntu 20.04.3 LTS
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
CMake version: version 3.10.3

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA GeForce GTX 1080
GPU 1: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.82.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] catalyst==22.4
[pip3] efficientnet-pytorch==0.7.1
[pip3] numpy==1.23.5
[pip3] pytorch-ignite==0.4.11
[pip3] segmentation-models-pytorch==0.3.2
[pip3] tensorboard==2.9.1
[pip3] tensorboard-data-server==0.6.1
[pip3] tensorboard-plugin-wit==1.8.1
[pip3] tensorboardX==2.5.1
[pip3] torch==1.13.0
[pip3] torchvision==0.14.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.21.5           py39h6c91a56_3  
[conda] numpy-base                1.21.5           py39ha15fc14_3  
[conda] numpydoc                  1.4.0            py39h06a4308_0

Checklist

FAQ

Please review the FAQ before submitting an issue:

github-actions[bot] commented 1 year ago

Hi! Thank you for your contribution! Please re-check all issue template checklists - unfilled issues would be closed automatically. And do not forget to join our slack for collaboration.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.