bahjat-kawar / ddrm

[NeurIPS 2022] Denoising Diffusion Restoration Models -- Official Code Repository
MIT License
583 stars 53 forks

RuntimeError: CUDA error: out of memory #15

Open Shaosifan opened 1 year ago

Shaosifan commented 1 year ago

Hi, I just ran the following command from the README and got "RuntimeError: CUDA error: out of memory". The GPU is an NVIDIA Quadro RTX 8000 with 48 GB.

python main.py --ni --config imagenet_256.yml --doc imagenet --timesteps 20 --eta 0.85 --etaB 1 --deg sr4 --sigma_0 0.05

ERROR - main.py - 2023-01-04 15:05:00,845 - Traceback (most recent call last):
  File "E:/leisen-workspace/codelife/super-resolution/ddrm-master/main.py", line 164, in main
    runner.sample()
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\runners\diffusion.py", line 163, in sample
    self.sample_sequence(model, cls_fn)
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\runners\diffusion.py", line 310, in sample_sequence
    x, _ = self.sample_image(x, model, H_funcs, y_0, sigma_0, last=False, cls_fn=cls_fn, classes=classes)
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\runners\diffusion.py", line 338, in sample_image
    etaB=self.args.etaB, etaA=self.args.eta, etaC=self.args.eta, cls_fn=cls_fn, classes=classes)
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\functions\denoising.py", line 53, in efficient_generalized_steps
    et = model(xt, t)
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\parallel\data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\guided_diffusion\unet.py", line 657, in forward
    h = module(h, emb)
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\guided_diffusion\unet.py", line 75, in forward
    x = layer(x, emb)
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\guided_diffusion\unet.py", line 233, in forward
    self._forward, (x, emb), self.parameters(), self.use_checkpoint
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\guided_diffusion\nn.py", line 139, in checkpoint
    return func(*inputs)
  File "E:\leisen-workspace\codelife\super-resolution\ddrm-master\guided_diffusion\unet.py", line 242, in _forward
    h = in_conv(h)
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\modules\conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\ProgramData\Anaconda3\envs\transenet\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Do you know why this error happens?
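Since the failure happens on the very first convolution, one thing worth ruling out is that the card already has memory held by another process (or by the Windows display driver). A minimal, generic check using standard torch.cuda calls, not code from this repo; torch.cuda.mem_get_info needs a reasonably recent PyTorch:

```python
import torch

# Report the device name and total memory of the first visible GPU.
device = torch.device("cuda:0")
props = torch.cuda.get_device_properties(device)
print(f"Device: {props.name}, total memory: {props.total_memory / 1e9:.1f} GB")

# mem_get_info wraps cudaMemGetInfo and reports device-wide free/total memory,
# so it also reflects memory held by other processes on the same card.
free, total = torch.cuda.mem_get_info(device)
print(f"Free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
```

If the free figure is far below 48 GB before sampling even starts, the out-of-memory error is coming from outside this script.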

lshaw8317 commented 1 year ago

My CUDA out of memory error was solved by going into the config yaml file and reducing the batch_size, although mine was a slightly different error. Running a 68GB GPU.
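For anyone landing here with the same error: the batch size referred to above lives in the config file passed via --config (e.g. configs/imagenet_256.yml). An illustrative excerpt only, assuming the DDIM-style config layout this repo uses; the exact keys and default values may differ in your copy of the file:

```yaml
# configs/imagenet_256.yml (illustrative excerpt -- check the actual file)
sampling:
    batch_size: 1   # lower this to reduce peak GPU memory during sampling
```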