IceClear / CLIP-IQA

[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images

RuntimeError: CUDA out of memory for large image #14

Open XianchaoZhang opened 1 year ago

XianchaoZhang commented 1 year ago

Hi IceClear, thank you for your impressive work! I encounter an OOM issue when handling large images (e.g., 8M/12M pictures) with clipiqa_single_image_demo.py. I want to know whether there is a memory limit for the CLIP-IQA model, or whether my runtime environment has a configuration issue.

Here are the detailed output logs:

```
(clipiqa) PS D:\code\CLIP-IQA> python .\demo\clipiqa_single_image_demo.py --file_path=../dataset/IMG_20230316_105935577.jpg
Traceback (most recent call last):
  File ".\demo\clipiqa_single_image_demo.py", line 62, in <module>
    main()
  File ".\demo\clipiqa_single_image_demo.py", line 38, in main
    output, attributes = restoration_inference(model, os.path.join(args.file_path), return_attributes=True)
  File "d:\code\clip-iqa\mmedit\apis\restoration_inference.py", line 79, in restoration_inference
    result = model(test_mode=True, **data)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\mmcv\runner\fp16_utils.py", line 110, in new_func
    return old_func(*args, **kwargs)
  File "d:\code\clip-iqa\mmedit\models\restorers\basic_restorer.py", line 79, in forward
    return self.forward_test(lq, gt, **kwargs)
  File "d:\code\clip-iqa\mmedit\models\restorers\clipiqa.py", line 182, in forward_test
    output, attribute_prob = self.generator(lq)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\code\clip-iqa\mmedit\models\backbones\sr_backbones\coopclipiqa.py", line 314, in forward
    logits_per_image, logits_per_text = self.clip_model(image, self.tokenized_prompts[i].to(image.device), self.pos_embedding)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\code\clip-iqa\mmedit\models\components\clip\model.py", line 373, in forward
    image_features = self.encode_image(image, pos_embedding)
  File "d:\code\clip-iqa\mmedit\models\components\clip\model.py", line 355, in encode_image
    return self.visual(image.type(self.dtype), pos_embedding=pos_embedding)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\code\clip-iqa\mmedit\models\components\clip\model.py", line 158, in forward
    x = self.attnpool(x, return_token, pos_embedding)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\code\clip-iqa\mmedit\models\components\clip\model.py", line 72, in forward
    x, _ = F.multi_head_attention_forward(
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\functional.py", line 5101, in multi_head_attention_forward
    attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\functional.py", line 4847, in _scaled_dot_product_attention
    attn = softmax(attn, dim=-1)
  File "C:\Users\A.conda\envs\clipiqa\lib\site-packages\torch\nn\functional.py", line 1680, in softmax
    ret = input.softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 8.86 GiB (GPU 0; 12.00 GiB total capacity; 9.59 GiB already allocated; 0 bytes free; 9.70 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
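
For reference, the error message itself suggests tuning the allocator via PYTORCH_CUDA_ALLOC_CONF. A minimal sketch of that setting (the 128 MB split size is only an example value, not from this repo; it mitigates fragmentation but cannot help when a single ~9 GiB tensor simply does not fit):

```python
# Must be set before torch initializes its CUDA allocator, e.g. at the very
# top of the demo script or in the shell environment.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")  # example value

import torch  # import torch only after the variable is set
```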

Could you please give me some advice?

IceClear commented 1 year ago

Hi. I think resizing the image is sometimes necessary when the input resolution is too large. Overly large inputs may make the IQA process meaningless.
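
A minimal sketch of such a resize step before running the demo (assuming Pillow is installed; the helper name downscale_for_iqa and the 1024-pixel long side are arbitrary choices for illustration, not values from this repo):

```python
# Downscale the photo before passing it to clipiqa_single_image_demo.py.
from PIL import Image

def downscale_for_iqa(src_path, dst_path, max_side=1024):
    """Shrink the image so that its longer side is at most `max_side` pixels."""
    img = Image.open(src_path)
    w, h = img.size
    scale = max_side / max(w, h)
    if scale < 1.0:  # only shrink, never enlarge
        img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    img.save(dst_path)

downscale_for_iqa("../dataset/IMG_20230316_105935577.jpg", "resized.jpg")
# then: python .\demo\clipiqa_single_image_demo.py --file_path=resized.jpg
```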

XianchaoZhang commented 1 year ago

Hi IceClear, thanks for your reply! And why is it meaningless? Does this mean that CLIP is only trained on 224x224 images, so it is limited by the pretrained model?
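
For reference, a rough estimate of where the memory goes in the attnpool frame of the traceback above (a sketch assuming the RN50 CLIP visual backbone with total stride 32, 32 attention heads, fp16 activations, and attention over all spatial tokens; these assumptions are not confirmed by the repo):

```python
# Back-of-envelope size of the attention-weight tensor in CLIP's attention
# pooling layer; it grows quadratically with the number of spatial tokens,
# i.e. roughly with the fourth power of the image side length.
def attnpool_attn_bytes(width, height, stride=32, heads=32, bytes_per_el=2):
    tokens = (width // stride) * (height // stride) + 1  # +1 pooled/global token
    return heads * tokens * tokens * bytes_per_el        # (heads, tokens, tokens)

# A ~12 MP photo (e.g. 4032x3024) needs roughly 8-9 GiB for this single
# tensor, consistent with "Tried to allocate 8.86 GiB" in the log above,
# whereas a 224x224 input needs only a fraction of a megabyte.
print(attnpool_attn_bytes(4032, 3024) / 2**30)  # ~8.4 GiB
print(attnpool_attn_bytes(224, 224) / 2**30)    # ~0.00015 GiB
```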