NJU-PCALab / AddSR


Out of Memory Issue with Tile-Based Inference on Large Images #8

Closed zxwxz closed 1 month ago

zxwxz commented 2 months ago

First and foremost, I would like to express my gratitude for the excellent work you have done.

I am currently facing an issue with tile-based inference. When I attempt to run inference on a large image, I hit a CUDA Out of Memory (OOM) error. It appears that `vae.encoder` is not actually using the tiled settings.

Could you provide guidance on how to enable tile-based inference to avoid this issue? Any assistance or suggestions would be greatly appreciated.

Thank you for your time and support.
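For reference, this is the kind of manual tiling I was imagining (a sketch only; `tile_boxes` is a hypothetical helper, not part of AddSR, and the tile/overlap sizes are guesses):

```python
def tile_boxes(height, width, tile=1024, overlap=64):
    """Yield (top, left, bottom, right) crops that cover an image,
    with neighbouring tiles overlapping so seams can be blended."""
    stride = tile - overlap
    for top in range(0, max(height - overlap, 1), stride):
        for left in range(0, max(width - overlap, 1), stride):
            yield (top, left, min(top + tile, height), min(left + tile, width))

# Each crop could then be encoded separately, e.g.:
# for t, l, b, r in tile_boxes(4320, 7680):
#     latent = vae.encode(image[..., t:b, l:r] * 2 - 1).latent_dist.sample()
```

Alternatively, diffusers' `AutoencoderKL` exposes `vae.enable_tiling()`, which might be the simpler fix if it is compatible with the custom vaehook.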

CSRuiXie commented 2 months ago

Thank you for your attention! Could you please provide more detailed error information, as well as the amount of CUDA memory available on your computer?

zxwxz commented 2 months ago

Due to privacy concerns, I have redacted personal information with "xxxxxx" in my error log. Additionally, I noticed that the following code snippet has been marked: https://github.com/NJU-PCALab/AddSR/blob/f7f4023d9e62253ced60f52456c34b5c8b1f0fb6/pipelines/pipeline_addsr.py#L215

```
input size: 4320x7680
Traceback (most recent call last):
  File "xxxxxx/AddSR-main/test_addsr.py", line 268, in <module>
    main(args)
  File "xxxxxx/AddSR-main/test_addsr.py", line 217, in main
    image = pipeline(
  File "xxxxxx/AddSR-main/utils/vaehook.py", line 444, in wrapper
    ret = fn(*args, **kwargs)
  File "xxxxxx/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "xxxxxx/AddSR-main/pipelines/pipeline_addsr.py", line 1005, in __call__
    latents_condition_image = self.vae.encode(image*2-1).latent_dist.sample()
  File "xxxxxx/lib/python3.9/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "xxxxxx/lib/python3.9/site-packages/diffusers/models/autoencoder_kl.py", line 258, in encode
    h = self.encoder(x)
  File "xxxxxx/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxxxxx/lib/python3.9/site-packages/diffusers/models/vae.py", line 141, in forward
    sample = down_block(sample)
  File "xxxxxx/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxxxxx/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 1247, in forward
    hidden_states = resnet(hidden_states, temb=None, scale=scale)
  File "xxxxxx/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxxxxx/lib/python3.9/site-packages/diffusers/models/resnet.py", line 606, in forward
    hidden_states = self.norm1(hidden_states)
  File "xxxxxx/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxxxxx/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "xxxxxx/lib/python3.9/site-packages/torch/nn/functional.py", line 2528, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 15.82 GiB (GPU 0; 47.51 GiB total capacity; 31.72 GiB already allocated; 14.11 GiB free; 31.79 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

CSRuiXie commented 2 months ago

I think the out-of-memory error is caused by the high resolution of your input. When we perform 4x upscaling on 128x128 images, it typically requires around 11 GB of CUDA memory. Your input is 4320x7680, which is vastly larger.
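As a rough back-of-envelope comparison (assuming encoder activation memory grows roughly linearly with pixel count, which is only an approximation):

```python
# Reference case: 128x128 input needs ~11 GB of CUDA memory for 4x upscaling.
# The reported input is 4320x7680 -- compare the raw pixel counts.
reference_pixels = 128 * 128
reported_pixels = 4320 * 7680
ratio = reported_pixels / reference_pixels
print(ratio)  # 2025.0 -- roughly three orders of magnitude more pixels
```

So encoding the full image in a single pass cannot fit on any current GPU; the image would need to be tiled (or downscaled) before the VAE encoder.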