RuntimeError: CUDA out of memory. just on documented ape training of Linemod

ethnhe / PVN3D

Code for "PVN3D: A Deep Point-wise 3D Keypoints Hough Voting Network for 6DoF Pose Estimation", CVPR 2020

MIT License

488 stars, 105 forks

Issue #35

Closed by pbeeson 3 years ago

pbeeson commented 4 years ago

As a newcomer to the PyTorch/6D pose estimation world, I finally got PVN3D set up after some issues. When I run the documented training command, python3 -m train.train_linemod_pvn3d --cls ape, it very quickly uses up all of my GPU memory. I've tried setting the batch size to 1 in linemod/dataset_config/models_info.yml and in common.py, but I always get exactly the same behavior. It's as if the actual batch size is computed elsewhere and I have no control over it. Any help you can provide is greatly appreciated.

ethnhe commented 4 years ago

First, use nvidia-smi to view the memory usage of each GPU. You can then set CUDA_VISIBLE_DEVICES to specify which GPU(s) to use. For example, running CUDA_VISIBLE_DEVICES=1,2 python3 -m train.train_linemod_pvn3d --cls ape will use GPUs 1 and 2 for training.
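The check described above can also be scripted. The following is a minimal sketch, not part of PVN3D: it asks nvidia-smi for per-GPU memory in CSV form and picks the GPU with the most free memory. The helper names (parse_gpu_memory, pick_freest_gpu, query_nvidia_smi) are hypothetical, and the sketch assumes nvidia-smi is on the PATH and reports memory in MiB.

```python
import os
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        try:
            index, used, total = (int(f.strip()) for f in line.split(","))
        except ValueError:
            # Skip rows that are malformed or report "[N/A]" for a field.
            continue
        gpus.append({"index": index, "used_mib": used,
                     "total_mib": total, "free_mib": total - used})
    return gpus


def pick_freest_gpu(gpus):
    """Return the index of the GPU with the most free memory."""
    return max(gpus, key=lambda g: g["free_mib"])["index"]


def query_nvidia_smi():
    """Return nvidia-smi CSV output, or None if the tool is unavailable."""
    cmd = ["nvidia-smi",
           "--query-gpu=index,memory.used,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout


if __name__ == "__main__":
    output = query_nvidia_smi()
    gpus = parse_gpu_memory(output) if output else []
    if gpus:
        # Must be set before CUDA is initialized in this process,
        # so do this before importing torch.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(pick_freest_gpu(gpus))
        print("Using GPU", os.environ["CUDA_VISIBLE_DEVICES"])
    else:
        print("No GPU information available from nvidia-smi.")
```

On a single-GPU machine this only confirms how much memory is already in use, for example by a desktop session, before training starts.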

pbeeson commented 4 years ago

Yeah, I only have 1 GPU. So ultimately, it looks like PVN3D is set up in a way that uses up my whole 4GB of GPU memory just loading the data, no matter what batch size I set. I realize that 4GB isn't ideal for training, but at the same time, I feel like there should be a tradeoff between time and batch size/memory usage that is tunable somewhere.
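One common form of the time-versus-memory tradeoff mentioned above is gradient accumulation: process a batch as several smaller micro-batches and sum their gradients before updating the weights. The sketch below is a toy illustration in plain Python, not PVN3D code. The model (y = w * x), the data, and the function names are all made up for the example. It shows only that the accumulated gradient matches the full-batch gradient.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Same gradient, computed one micro-batch at a time."""
    total = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch.
        total += grad_mse(w, micro) * len(micro) / len(batch)
    return total


# Hypothetical (x, y) pairs, for illustration only.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]

full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, micro_batch_size=1)
print(abs(full - accum) < 1e-9)
```

In PyTorch the analogous pattern is to call backward() on each micro-batch loss, scaled by its share of the batch, and to call the optimizer's step() and zero_grad() only once per full batch. Two caveats apply: layers such as batch normalization see micro-batch statistics, so results are not exactly identical, and accumulation cannot help when a single sample already exceeds GPU memory, which may be the situation described here since a batch size of 1 still ran out of memory.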

WW-0 commented 3 years ago

Have you solved this problem?

WW-0 commented 3 years ago

Could the author please explain how to solve this problem?

Ray0089 commented 3 years ago

"Could the author please explain how to solve this problem?"

Spend more money and get a GPU with more memory (#^.^#)

pbeeson commented 3 years ago

I think this can be closed. My guess is that the problem is simply due to having only a 4GB GPU.