IanYeung / MGLD-VSR

Code for ECCV 2024 Paper "Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution"

torch.cuda.OutOfMemoryError #11

Open littlecar5 opened 6 months ago

littlecar5 commented 6 months ago

How much video memory does this need? Is 24GB of video memory not enough? How can I reduce the video memory requirement? Thank you!

IanYeung commented 6 months ago

> How much video memory does this need? Is 24GB of video memory not enough? How can I reduce the video memory requirement? Thank you!

You can try reducing memory usage by using a smaller tile size, but 24GB may still not be enough, because the encoding/decoding and skip connections require a large amount of memory. Using the simple decoder may solve the problem, but it will affect performance.
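For context, tiled processing keeps peak memory bounded by decoding one spatial patch at a time instead of the whole frame. Below is a minimal, illustrative sketch of that idea in PyTorch; `decoder`, the tile/overlap sizes, and the averaging-based blending are assumptions for illustration, not the exact MGLD-VSR tiling code:

```python
import torch

@torch.no_grad()
def decode_in_tiles(decoder, latent, tile=64, overlap=8, scale=4):
    """Decode a latent (B, C, H, W) tile by tile to cap peak GPU memory.

    `decoder` is any callable mapping a latent patch to an RGB patch that is
    `scale` times larger; all names and defaults here are hypothetical.
    """
    b, c, h, w = latent.shape
    out = torch.zeros(b, 3, h * scale, w * scale, device=latent.device)
    weight = torch.zeros_like(out)
    stride = max(tile - overlap, 1)
    for y in range(0, h, stride):
        for x in range(0, w, stride):
            # Clamp the tile so it never runs past the latent borders.
            y0 = min(y, max(h - tile, 0))
            x0 = min(x, max(w - tile, 0))
            patch = latent[:, :, y0:y0 + tile, x0:x0 + tile]
            dec = decoder(patch)  # decode one tile at a time
            ys, xs = y0 * scale, x0 * scale
            out[:, :, ys:ys + dec.shape[-2], xs:xs + dec.shape[-1]] += dec
            weight[:, :, ys:ys + dec.shape[-2], xs:xs + dec.shape[-1]] += 1
    # Average overlapping regions to hide seams between tiles.
    return out / weight.clamp(min=1)
```

The smaller the tile, the lower the peak memory, at the cost of more decoder calls and potential seam artifacts that the overlap and averaging are meant to suppress.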

littlecar5 commented 6 months ago

Can I ask how much memory is needed for training, and how much for testing? Thank you! Looking forward to your reply!

IanYeung commented 6 months ago

> Can I ask how much memory is needed for training, and how much for testing? Thank you! Looking forward to your reply!

Hi, we use an A100 (80GB) for both training and testing.

DDDaxing commented 3 days ago

> Can I ask how much memory is needed for training, and how much for testing? Thank you! Looking forward to your reply!
>
> Hi, we use an A100 (80GB) for both training and testing.

@IanYeung Hi, may I ask what the output size after SR is? My input size is 270x480 and I have already set upscale=1, but it still runs out of memory. My GPU has 96GB. Are there any solutions for this?

Error messages:

Segment shape:  torch.Size([5, 3, 270, 480])
Segment shape:  torch.Size([5, 3, 270, 480])
Segment shape:  torch.Size([5, 3, 270, 480])
Segment shape:  torch.Size([5, 3, 270, 480])
Segment shape:  torch.Size([5, 3, 270, 480])
Segment shape:  torch.Size([5, 3, 270, 480])
Sampling:   0%|                                                                                                                              | 0/30 [00:00<?, ?it/s]>>>>>>>>>>>>>>>>>>>>>>>
seq:  /data3/frames/555 seg:  0 size:  torch.Size([5, 3, 270, 480])
Sampling:   0%|                                                                                                                              | 0/30 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data3/project/MGLD-VSR/scripts/vsr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py", line 561, in <module>
    main()
  File "/data3/project/MGLD-VSR/scripts/vsr_val_ddpm_text_T_vqganfin_oldcanvas_tile.py", line 393, in main
    im_lq_bs = F.pad(im_lq_bs, pad=(0, pad_w, 0, pad_h), mode='reflect')
  File "/data3/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/functional.py", line 4552, in pad
    return torch._C._nn.pad(input, pad, mode, value)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
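Note that the failure here occurs already in the reflection padding of the low-quality input, before any network runs, which usually means GPU memory is nearly exhausted (or fragmented) by that point. One low-risk thing to try is doing the padding on the CPU and only then moving the padded segment to the GPU; setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` can also help with fragmentation. A rough sketch that reuses the script's variable names (`im_lq_bs`, `pad_w`, `pad_h`) but is untested here:

```python
import torch
import torch.nn.functional as F

# Illustrative workaround, not a tested patch: run the reflect padding on a
# CPU copy so the padded tensor is not allocated on an already-full GPU,
# then move the padded segment back to the device.
im_lq_bs = F.pad(im_lq_bs.cpu(), pad=(0, pad_w, 0, pad_h), mode='reflect')
im_lq_bs = im_lq_bs.cuda(non_blocking=True)
```

If the OOM then moves into the sampling or decoding stage instead, reducing the tile size, the number of frames per segment, or using the simple decoder (as discussed above) are the remaining knobs.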