HRNet / HRNet-Semantic-Segmentation

The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation with HRNet: https://arxiv.org/abs/1908.07919

RuntimeError: CUDA out of memory. Tried to allocate 900.00 MiB (GPU 0; 10.92 GiB total capacity; 7.83 GiB already allocated; 711.50 MiB free; 9.66 GiB reserved in total by PyTorch) #184

Open EricHuiK opened 4 years ago

GutlapalliNikhil commented 3 years ago

Has anyone found a solution for this? Do we need to change any params to solve it?

hieunm1821 commented 3 years ago

I think you should reduce the batch size.

NikhilChowdary-MCW commented 3 years ago

@hieunm1821 Yeah, reducing either the batch size or the training resolution will work.

Mps24-7uk commented 3 years ago

@EricHuiK Did you find a solution?

A-Kerim commented 2 years ago

Go to the `.yaml` file in `experiments/[dataset name]/..yaml` and update `BATCH_SIZE_PER_GPU` to a lower value. Then run it as `python -m torch.distributed.launch --nproc_per_node=1 tools/train.py ...`
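For reference, a config fragment along these lines is what gets edited. This is an illustrative sketch, not a copy of any specific experiment file: the key names follow the repo's config convention, but the surrounding keys and exact values vary per `.yaml` file.

```yaml
# Illustrative fragment of an experiments/<dataset>/*.yaml config (values are examples).
TRAIN:
  IMAGE_SIZE:           # lowering the crop resolution also reduces GPU memory
  - 1024
  - 512
  BATCH_SIZE_PER_GPU: 2 # lower this until the OOM error goes away
```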

Arshadoid commented 1 year ago

> Go to the .yaml in experiments/[dataset name]/..yaml file and update "BATCH_SIZE_PER_GPU" to a lower value. Then, run it as python -m torch.distributed.launch --nproc_per_node=1 tools/train.py ...

Hi @A-Kerim, I'm trying to run training on a single GPU on Windows 11 (just to see if it runs) and getting OutOfMemoryError. I already reduced the batch size to 2 and I'm still getting the same error. Would you have any suggestions on how to solve this? Thanks!

GutlapalliNikhil commented 1 year ago

@Arshadoid Give it a try with batch size 1.

Arshadoid commented 1 year ago

> @Arshadoid Give it a try with batch size 1.

Hi @GutlapalliNikhil, thanks for the suggestion. When I try a batch size of 1, I get an error about BatchNorm instead.
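The BatchNorm error with batch size 1 is expected: batch statistics are degenerate for a single sample. A minimal pure-Python sketch of why (the `batch_norm` function below is a toy stand-in, not the repo's code):

```python
# Toy stand-in for batch normalization over a single channel (illustrative only).
def batch_norm(values, eps=1e-5):
    """Normalize a batch of scalars by its own batch mean and variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

# With two or more samples the batch statistics are meaningful:
print(batch_norm([1.0, 3.0]))  # roughly [-1.0, 1.0]

# With a single sample the variance is 0, so every output collapses to 0;
# PyTorch refuses to compute batch statistics in this situation rather
# than silently producing degenerate activations.
print(batch_norm([3.0]))       # [0.0]
```

In practice that means keeping at least 2 samples per GPU when BatchNorm layers are in training mode, and reducing the crop resolution instead if memory is still too tight.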