MedICL-VU / PRISM

[MICCAI 2024 Spotlight, Early Acceptance] PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts

CUDA out of memory with 24G GPU #1

Open · ax130885 opened this issue 1 month ago

ax130885 commented 1 month ago

Hi, sorry for the late reply.

In the ProMISe issue, it was mentioned that this project requires around 24 GB of VRAM. However, I still encounter an out-of-memory error on an RTX 3090 (24 GB) while running the following command:

python src/train.py --data colon --data_dir "dataset/Task10_Colon" --save_name "my_train" --multiple_outputs --dynamic --use_box --refine

[11:35:04.209] Namespace(data='colon', save_dir='./implementation/colon/my_train', data_dir='dataset/Task10_Colon', num_workers=2, split='train', use_small_dataset=False, model_type='vit_b_ori', lr=4e-05, lr_scheduler='linear', warm_up=False, device='cuda:0', max_epoch=200, image_size=128, batch_size=1, checkpoint='best', checkpoint_sam='./checkpoint_sam/sam_vit_b_01ec64.pth', num_classes=2, tolerance=5, boundary_kernel_size=5, use_pretrain=False, pretrain_path='', resume=False, resume_best=False, ddp=False, gpu_ids=[0, 1], accumulation_steps=20, iter_nums=11, num_clicks=50, num_clicks_validation=10, use_box=True, dynamic_box=False, use_scribble=False, num_multiple_outputs=3, multiple_outputs=True, refine=True, no_detach=False, refine_test=False, dynamic=True, efficient_scribble=False, use_sam3d_turbo=False, save_predictions=False, save_csv=False, save_test_dir='./', save_name='my_train')
Unet_encoder features: (32, 32, 64, 128, 384, 32).
Unet_decoder features: (32, 32, 64, 128, 384, 32).
dataloaders are created, models are loaded, and others are set, spent 3.8 for rank -1
num_clicks 50 points_length: 69189 dynamic_size: 13
First batch:   fn: 1.0000, fp: 0.0000, label 0: tensor(0), label 1: tensor(13)
--- ===================================== ---
--- above before model, below after model ---
--- ===================================== ---
dice before refine 0.07599078863859177 and after 0.037667643278837204
num_clicks 50 points_length: 356190 dynamic_size: 11
First batch:   fn: 0.8992, fp: 4.2488, label 0: tensor(11), label 1: tensor(0)
--- ===================================== ---
...
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 23.69 GiB total capacity; 21.36 GiB already allocated; 42.62 MiB free; 21.59 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
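
The message also suggests setting max_split_size_mb to avoid fragmentation. If I understand the allocator option correctly, that would look something like this (128 is just an example value):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python src/train.py --data colon --data_dir "dataset/Task10_Colon" --save_name "my_train" --multiple_outputs --dynamic --use_box --refine

but as far as I can tell this only mitigates fragmentation rather than reducing total memory usage.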

How can I resolve this problem?

HaoLi12345 commented 4 weeks ago

Hi,

Reducing the iteration number would lower GPU memory usage, but also the performance. I have tried 9 and 11, and 9 is slightly worse.
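
For example (assuming the command-line flag matches the iter_nums attribute shown in your Namespace dump), a sketch of the adjusted run:

python src/train.py --data colon --data_dir "dataset/Task10_Colon" --save_name "my_train" --multiple_outputs --dynamic --use_box --refine --iter_nums 9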

Modifying the network or the input size could be another option if you are not using a pretrained model.
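
For instance, shrinking the input volume from the current image_size=128 reduces activation memory roughly cubically. A sketch, where --image_size is the assumed flag and 96 is just an illustrative value (as noted above, this only applies when not using the pretrained model):

python src/train.py --data colon --data_dir "dataset/Task10_Colon" --save_name "my_train" --multiple_outputs --dynamic --use_box --refine --image_size 96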

ax130885 commented 4 weeks ago

Thank you. After changing iter_nums to 9, the training runs successfully.

HaoLi12345 commented 4 weeks ago

Glad to hear that!