catalyst-team / catalyst

Accelerated deep learning R&D
https://catalyst-team.com
Apache License 2.0

Bug in catalyst/callbacks/backward.py if the grad_clip_fn value is set. #1445

Closed. AleksandrMinin closed this issue 1 year ago.

AleksandrMinin commented 1 year ago

πŸ› Bug Report

BackwardCallback in catalyst/callbacks/backward.py raises an AttributeError when grad_clip_fn is set.

How To Reproduce

Steps to reproduce the behavior:

  1. Create a callbacks list that contains a BackwardCallback with grad_clip_fn set (i.e. not None).
  2. Launch runner.train with these callbacks.
  3. Training fails with the following error:

    /python_envs/kaggle-env/lib/python3.8/site-packages/catalyst/callbacks/backward.py:55

       52 │   │   │
       53 │   │   │   if self.grad_clip_fn is not None:
       54 │   │   │   │   runner.engine.unscale_gradients()
    -->55 │   │   │   │   norm = self.grad_clip_fn(self.model.parameters())
       56 │   │   │   │   if self._log_gradient:
       57 │   │   │   │   │   runner.batch_metrics[f"{self._prefix_gradient}/norm"] = norm
       58

    AttributeError: 'BackwardCallback' object has no attribute 'model'


#### Code sample
```python
import torch
from torch.nn.utils import clip_grad_norm_
from catalyst import dl
from catalyst.engines.torch import CPUEngine, GPUEngine

# Project-specific modules from the reporter's repository (not part of Catalyst).
from src.config import config
from src.dataset import get_loaders
from src.crnn import CRNN
from src.runners import SupervisedOCRRunner

callbacks = [
    dl.CriterionCallback(
        input_key=dict(output="log_probs", output_size="input_lengths"),
        target_key=dict(target="targets", target_len="target_lengths"),
        metric_key="loss",
        criterion_key="ctc_loss_fn",
    ),
    # Setting grad_clip_fn is what triggers the AttributeError.
    dl.BackwardCallback(
        metric_key="loss",
        grad_clip_fn=clip_grad_norm_,
        grad_clip_params={"max_norm": 0.5, "norm_type": 2},
    ),
]

loaders, infer_loader = get_loaders(config)  
model = CRNN(**config.model_kwargs)

optimizer = config.optimizer(params=model.parameters(), **config.optimizer_kwargs)
scheduler = config.scheduler(optimizer=optimizer, **config.scheduler_kwargs)

if torch.cuda.is_available():
    engine = GPUEngine()
else:
    engine = CPUEngine()

runner = SupervisedOCRRunner(
    input_key="image", 
    target_key="target", 
    output_key="output",
)

criterion = {"ctc_loss_fn": config.ctc_loss}

runner.train(
    model=model,
    engine=engine,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=callbacks,
    num_epochs=config.n_epochs,
    valid_loader="valid",
    valid_metric=config.valid_metric,
    minimize_valid_metric=config.minimize_metric,
    seed=config.seed,
    verbose=True,
    load_best_on_end=True,
)
```
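For reference, the `grad_clip_params` above ask `clip_grad_norm_` to rescale all gradients so that their combined L2 norm is at most 0.5. The pure-Python sketch below only illustrates that computation on plain lists; `clip_grad_norm_sketch` is a hypothetical helper, not part of PyTorch or Catalyst. The value it returns (the norm before clipping) corresponds to the `norm` that `BackwardCallback` logs in the traceback.

```python
import math
from typing import List


def clip_grad_norm_sketch(grads: List[List[float]], max_norm: float, norm_type: float = 2.0) -> float:
    """Pure-Python illustration of the computation done by torch.nn.utils.clip_grad_norm_.

    Each inner list stands in for one parameter's flattened .grad tensor.
    The lists are rescaled in place; the norm measured before clipping is returned.
    """
    flat = [abs(g) for grad in grads for g in grad]
    # Treat all gradients as one concatenated vector and take its p-norm.
    if math.isinf(norm_type):
        total_norm = max(flat, default=0.0)
    else:
        total_norm = sum(g ** norm_type for g in flat) ** (1.0 / norm_type)
    # A small epsilon guards against division by zero, as in PyTorch.
    clip_coef = max_norm / (total_norm + 1e-6)
    # Gradients are only ever scaled down, never up.
    if clip_coef < 1.0:
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[3.0], [4.0]]  # combined L2 norm is 5.0
norm = clip_grad_norm_sketch(grads, max_norm=0.5, norm_type=2)
```

After the call, `norm` is 5.0 and the gradients have been scaled so that their combined norm is roughly 0.5; gradients whose norm is already below `max_norm` are left untouched.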

Expected behavior

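Training should run with gradient clipping enabled. BackwardCallback has no `model` attribute (it does not keep its own reference to the model); the model is available through the runner that is passed to the callback. To fix this, replace

`norm = self.grad_clip_fn(self.model.parameters())`

with

`norm = self.grad_clip_fn(runner.model.parameters())`

on line 55 of catalyst/callbacks/backward.py.

With this change, the error disappears and training completes successfully.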
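The failure mode and the proposed fix can be illustrated without Catalyst or PyTorch installed. The sketch below is a hypothetical pure-Python mock: `FakeModel`, `FakeRunner`, `l2_norm`, `ToyBackwardCallback`, the `use_runner_model` switch and the `"grad/norm"` metric key are all invented for illustration and only mirror the lines shown in the traceback, not Catalyst's actual implementation. It also shows why the bug appears only when `grad_clip_fn` is set: the line that reads the model sits inside the clipping branch.

```python
from functools import partial
from typing import Callable, Dict, Iterable, List, Optional


class FakeModel:
    """Stand-in for torch.nn.Module; parameters() yields plain floats."""

    def __init__(self, params: List[float]) -> None:
        self._params = params

    def parameters(self) -> Iterable[float]:
        return iter(self._params)


class FakeRunner:
    """Stand-in for a Catalyst runner, which owns the model and batch metrics."""

    def __init__(self, model: FakeModel) -> None:
        self.model = model
        self.batch_metrics: Dict[str, float] = {}


def l2_norm(params: Iterable[float], norm_type: float = 2.0) -> float:
    """Toy grad_clip_fn: returns the p-norm of the values without clipping."""
    return sum(abs(p) ** norm_type for p in params) ** (1.0 / norm_type)


class ToyBackwardCallback:
    """Mimics only the clipping branch shown in the traceback (hypothetical)."""

    def __init__(
        self,
        grad_clip_fn: Optional[Callable] = None,
        grad_clip_params: Optional[dict] = None,
        use_runner_model: bool = True,
    ) -> None:
        # Assumption: extra params are bound to grad_clip_fn as keyword arguments.
        self.grad_clip_fn = (
            partial(grad_clip_fn, **(grad_clip_params or {})) if grad_clip_fn else None
        )
        self.use_runner_model = use_runner_model

    def on_batch_end(self, runner: FakeRunner) -> None:
        # Without grad_clip_fn this branch is skipped, so the bug stays hidden.
        if self.grad_clip_fn is not None:
            # Reported line 55 reads self.model, which is never set on the callback;
            # the proposed fix reads the model owned by the runner instead.
            model = runner.model if self.use_runner_model else self.model
            runner.batch_metrics["grad/norm"] = self.grad_clip_fn(model.parameters())


runner = FakeRunner(FakeModel([3.0, 4.0]))
ToyBackwardCallback(l2_norm, {"norm_type": 2}).on_batch_end(runner)
print(runner.batch_metrics)  # {'grad/norm': 5.0}
```

Constructing the same callback with `use_runner_model=False` makes `on_batch_end` raise `AttributeError: 'ToyBackwardCallback' object has no attribute 'model'`, which matches the reported error.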

Environment

Catalyst version: 22.04
PyTorch version: 1.13.0+cu117
Is debug build: No
CUDA used to build PyTorch: 11.7
TensorFlow version: N/A
TensorBoard version: 2.9.1

OS: Ubuntu 20.04.3 LTS
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
CMake version: version 3.10.3

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA GeForce GTX 1080
GPU 1: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.82.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] catalyst==22.4
[pip3] efficientnet-pytorch==0.7.1
[pip3] numpy==1.23.5
[pip3] pytorch-ignite==0.4.11
[pip3] segmentation-models-pytorch==0.3.2
[pip3] tensorboard==2.9.1
[pip3] tensorboard-data-server==0.6.1
[pip3] tensorboard-plugin-wit==1.8.1
[pip3] tensorboardX==2.5.1
[pip3] torch==1.13.0
[pip3] torchvision==0.14.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.21.5           py39h6c91a56_3  
[conda] numpy-base                1.21.5           py39ha15fc14_3  
[conda] numpydoc                  1.4.0            py39h06a4308_0


bagxi commented 1 year ago

Duplicate of https://github.com/catalyst-team/catalyst/issues/1444