caiyuanhao1998 / Retinexformer

"Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement" (ICCV 2023) & (NTIRE 2024 Challenge)
https://arxiv.org/abs/2303.06705
MIT License

Testing is not working on multiple GPUs #98

Closed: SaadAhmad376 closed this issue 1 month ago

SaadAhmad376 commented 1 month ago

I have two 16GB RTX 4070 Super GPUs. I am running

python3 Enhancement/test_from_dataset.py --opt Options/RetinexFormer_NTIRE.yml --weights pretrained_weights/NTIRE.pth --dataset NTIRE --self_ensemble

but the model fails to load because of low memory. The model loads on only one GPU even when I try to use both. I checked CUDA_VISIBLE_DEVICES (as mentioned in the testing code) and the output is:

export CUDA_VISIBLE_DEVICES=0,1

The complete error is:

RuntimeError: CUDA out of memory. Tried to allocate 3.57 GiB (GPU 0; 15.60 GiB total capacity; 8.24 GiB already allocated; 3.36 GiB free; 8.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
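Two notes on that message. First, reserved memory (8.27 GiB) is only marginally above allocated memory (8.24 GiB), so fragmentation is probably not the main culprit here; the forward pass on a full-resolution image simply needs more than the ~15.6 GiB the card offers, and --self_ensemble typically adds several augmented forward passes on top. Second, if you still want to try the allocator hint from the message, the setting has to be in place before the first CUDA allocation. A minimal sketch (the 128 MiB split size is an illustrative value, not a tuned recommendation):

```python
import os

# The caching-allocator config is read when CUDA is first initialized,
# so set it before any tensor touches the GPU (safest: before importing
# torch). 128 is an illustrative split size in MiB, not a tuned value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the env var on purpose

x = torch.zeros(1, device="cuda")  # first allocation uses the new config
```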

Hope you can guide me here, thanks.
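An aside on the multi-GPU expectation: export CUDA_VISIBLE_DEVICES=0,1 only makes both devices visible to PyTorch; nothing splits a model across GPUs automatically. The standard wrapper is torch.nn.DataParallel, sketched below with a placeholder network (whether Retinexformer's test script exposes such an option is not confirmed here):

```python
import torch
import torch.nn as nn

# Hypothetical placeholder standing in for the enhancement network.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

if torch.cuda.device_count() > 1:
    # DataParallel replicates the whole model on every visible GPU and
    # splits the *batch* dimension across them; it does not shard the
    # model, so one full-resolution image must still fit on a single card.
    model = nn.DataParallel(model)
model = model.cuda()

x = torch.randn(2, 3, 256, 256, device="cuda")  # batch of 2 -> 1 per GPU
y = model(x)
print(y.shape)
```

Since testing usually runs one image at a time, a batch-splitting wrapper does not reduce per-image memory; cropping the input into tiles and stitching the outputs is generally the more direct fix for OOM at test time.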

caiyuanhao1998 commented 1 month ago

Hi, thanks for your interest. It seems that you are using two occupied GPUs. Please free the GPUs first or use a GPU with more available memory. If you find our repo useful, please help us star it, thanks!
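For what it's worth, the "occupied GPU" hypothesis can be checked from inside the same Python environment (a minimal check, not part of the repo):

```python
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # returns (free, total) in bytes
    print(f"GPU {i}: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")
```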

SaadAhmad376 commented 1 month ago

I checked the GPUs; both are fully available and idle. I also checked the number of available devices, and the result was 2. I ran a dummy script to verify that PyTorch in my environment can use both GPUs, and they were good to go. The problem is that when I run the testing script, it loads the model on only one GPU even though both are available.
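For reference, a dummy check along those lines might look like this (a sketch; the original script is not shown in the thread):

```python
import torch

print("visible devices:", torch.cuda.device_count())  # expected: 2
for i in range(torch.cuda.device_count()):
    t = torch.ones(1024, 1024, device=f"cuda:{i}")  # small test allocation
    print(f"cuda:{i}: {torch.cuda.get_device_name(i)}, sum = {t.sum().item()}")
```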