We used P100 to train our models.
What is your cfg file?
My cfg is w32_512_adam_lr1e-3.yaml, with IMAGES_PER_GPU modified to 1. The P100 has 16 GB of memory while the 1080Ti has only 11 GB. I'm trying to cut down the channels to reduce the memory needed. Thanks for your reply.
Sorry, I made a mistake. Although I set CUDA_VISIBLE_DEVICES=3, MULTIPROCESSING_DISTRIBUTED is true, so the job still ran on the first GPU, which was already occupied by others.
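For reference, CUDA_VISIBLE_DEVICES only takes effect if it is set in the environment before the process initializes CUDA, and a distributed launcher that picks GPU ids from the config (for example a cfg.GPUS list, an assumption here) can bypass it. A minimal sketch of pinning the process to physical GPU 3:

```python
# Minimal sketch: restrict this process to physical GPU 3 before CUDA is
# initialized. Inside the process that GPU then appears as cuda:0, so code
# that hard-codes device 0 still lands on GPU 3.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"   # set before any torch.cuda call

import torch

print(torch.cuda.device_count())           # -> 1 (only GPU 3 is visible)
print(torch.cuda.get_device_name(0))       # physical GPU 3, addressed as cuda:0
```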
@jevonswang Hi, how did you fix it? I get the same error. My environment is 2x 1080Ti. Even with train_batch=1 and num_worker=2, it still runs out of memory. Update: the problem may be caused by an invalid modification of the cfg. I modified TRAIN.IMAGES_PER_GPU, but images_per_batch in make_dataloader (build.py) stays the same as before.
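A self-contained sketch of the pitfall described above: the DataLoader's batch size is fixed from the cfg when the loader is built, typically as images_per_gpu times the number of GPUs, so editing TRAIN.IMAGES_PER_GPU only helps if make_dataloader actually reads that key. The function body below is an illustration, not the repo's build.py:

```python
# Illustration only: a typical make_dataloader pattern, with assumed names.
import torch
from torch.utils.data import DataLoader, TensorDataset

def make_dataloader(images_per_gpu: int, num_gpus: int) -> DataLoader:
    dataset = TensorDataset(torch.zeros(8, 3, 64, 64))    # dummy data
    images_per_batch = images_per_gpu * num_gpus           # effective batch size
    return DataLoader(dataset, batch_size=images_per_batch, shuffle=True)

loader = make_dataloader(images_per_gpu=1, num_gpus=2)
print(loader.batch_size)   # 2 -- verify this matches the value set in the yaml
```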
Hi, what kind of GPU do you use? I trained with a 1080Ti but got a "CUDA out of memory" error, even with the batch size set to 1 and FP16 enabled. What should I do next to train my own model?
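Since the causes found in this thread were a GPU already occupied by another job and a cfg edit that never reached the dataloader, a quick check of the card's memory from inside the process can rule out the first. A sketch using standard PyTorch calls:

```python
# Sketch: report total memory per visible GPU and what this process has
# allocated; a card already occupied by another job will OOM even at batch
# size 1. Memory held by other processes is not visible here -- check
# nvidia-smi for that.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    total_gb = props.total_memory / 1024 ** 3
    allocated_gb = torch.cuda.memory_allocated(i) / 1024 ** 3   # this process only
    print(f"cuda:{i} {props.name}: {total_gb:.1f} GB total, "
          f"{allocated_gb:.2f} GB allocated by this process")
```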