We used P100 to train our models.
What is your cfg file?
My cfg is w32_512_adam_lr1e-3.yaml, with IMAGES_PER_GPU modified to 1. The P100 has 16 GB of memory while the 1080Ti has only 11 GB. I'm trying to cut down the channels to reduce the memory needed. Thanks for your reply.
Sorry, I made a mistake. Although I set CUDA_VISIBLE_DEVICES=3, MULTIPROCESSING_DISTRIBUTED is true, so the job still ran on the first GPU, which was already occupied by others.
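For reference, CUDA_VISIBLE_DEVICES only takes effect if it is set in the environment before the process initializes CUDA, and a distributed launcher that picks GPU ids from the config (for example a cfg.GPUS list, an assumption here) can bypass it. A minimal sketch of pinning the process to physical GPU 3:

```python
# Minimal sketch: restrict this process to physical GPU 3 before CUDA is
# initialized. Inside the process that GPU then appears as cuda:0, so code
# that hard-codes device 0 still lands on GPU 3.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"   # set before any torch.cuda call

import torch

print(torch.cuda.device_count())           # -> 1 (only GPU 3 is visible)
print(torch.cuda.get_device_name(0))       # physical GPU 3, addressed as cuda:0
```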
@jevonswang Hi, how did you fix it? I get the same error. My environment is 2x 1080Ti. Even with train_batch=1 and num_worker=2, it still runs out of memory. Update: the problem may be caused by an invalid modification of the cfg. I modified TRAIN.IMAGES_PER_GPU, but images_per_batch in make_dataloader (build.py) stays the same as before.
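A self-contained sketch of the pitfall described above: the DataLoader's batch size is fixed from the cfg when the loader is built, typically as images_per_gpu times the number of GPUs, so editing TRAIN.IMAGES_PER_GPU only helps if make_dataloader actually reads that key. The function body below is an illustration, not the repo's build.py:

```python
# Illustration only: a typical make_dataloader pattern, with assumed names.
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_dataloader(images_per_gpu: int, num_gpus: int) -> DataLoader:
    dataset = TensorDataset(torch.zeros(8, 3, 64, 64))    # dummy data
    images_per_batch = images_per_gpu * num_gpus           # effective batch size
    return DataLoader(dataset, batch_size=images_per_batch, shuffle=True)

loader = make_dataloader(images_per_gpu=1, num_gpus=2)
print(loader.batch_size)   # 2 -- verify this matches the value set in the yaml
```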
Hi, what kind of GPU do you use? I trained with a 1080Ti but got a "CUDA out of memory" error, even with the batch size set to 1 and FP16 enabled. What should I do next to train my own model?
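Since the causes found in this thread were a GPU already occupied by another job and a cfg edit that never reached the dataloader, a quick check of the card's memory from inside the process can rule out the first. A sketch using standard PyTorch calls:

```python
# Sketch: report total memory per visible GPU and what this process has
# allocated; a card already occupied by another job will OOM even at batch
# size 1. Memory held by other processes is not visible here -- check
# nvidia-smi for that.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    total_gb = props.total_memory / 1024 ** 3
    allocated_gb = torch.cuda.memory_allocated(i) / 1024 ** 3   # this process only
    print(f"cuda:{i} {props.name}: {total_gb:.1f} GB total, "
          f"{allocated_gb:.2f} GB allocated by this process")
```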