LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0

OOM for Semantic Segmentation Finetuning #197

Open ShenZheng2000 opened 1 week ago

ShenZheng2000 commented 1 week ago

I followed the instructions provided here to fine-tune semantic segmentation on custom images. Despite using an RTX 4090 with 24 GB of VRAM, reducing the crop_size to 128x128, and using a batch_size of 1, I still encounter the OutOfMemoryError below.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.00 GiB (GPU 0; 23.64 GiB total capacity; 21.35 GiB already allocated; 556.81 MiB free; 22.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a simple way to further reduce memory usage? Alternatively, could the author provide semantic segmentation checkpoints for smaller models, such as Ours-S or Ours-B, instead of the larger Ours-L model?
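One generic way to cut activation memory during fine-tuning, independent of this repo's training scripts, is mixed-precision training with torch.cuda.amp. The sketch below uses placeholder names (model, loader, criterion, optimizer) rather than anything from the repository, so it would need adapting to the actual fine-tuning code.

```python
# Hypothetical sketch: mixed-precision training loop with torch.cuda.amp.
# model, loader, criterion, optimizer are placeholders, not from this repo.
import torch

def train_one_epoch(model, loader, criterion, optimizer, device="cuda"):
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad(set_to_none=True)
        # Autocast runs the forward pass in float16 where safe,
        # which roughly halves activation memory.
        with torch.cuda.amp.autocast():
            logits = model(images)
            loss = criterion(logits, labels)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```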

LiheYoung commented 1 week ago

According to the error message "If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation", you may try changing max_split_size_mb.
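For reference, max_split_size_mb is read from the PYTORCH_CUDA_ALLOC_CONF environment variable, which needs to be in place before the first CUDA allocation. The snippet below is a minimal sketch (the value 128 is arbitrary) for setting it in-script and checking whether fragmentation is actually the problem.

```python
# Minimal sketch: set the allocator option before torch makes any CUDA
# allocations, e.g. at the very top of the training script.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # arbitrary value

import torch

# After reproducing the OOM, compare reserved vs. allocated memory:
# a large gap suggests fragmentation, which max_split_size_mb can mitigate;
# a small gap means the model simply does not fit at this crop/batch size.
print(torch.cuda.memory_summary(device=0, abbreviated=True))
```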

ShenZheng2000 commented 1 week ago

I changed max_split_size_mb to 32 using export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32. However, the issue persists, even with a reduced crop_size of (128, 128).

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.00 GiB (GPU 0; 23.64 GiB total capacity; 21.76 GiB already allocated; 796.81 MiB free; 22.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Could you please let me know which GPU (and how much memory) you used for fine-tuning the model on the Cityscapes dataset? My 24 GB GPU might be insufficient to reproduce the experiment, so I might need a smaller pre-trained model for fine-tuning.