Open srcn9595 opened 4 years ago
I got the same error. Did you solve the problem?
I have the same issue when I was trying to evaluate the fcn model with resnet50 backbone. Weird thing is that 'local_rank' is not an argument for running, but when trying to load the model it just appears. Hoping to solve this issue.
Debugging the program does not reveal any runtime errors, which is strange.
It seems that this error happens with the resnet backbone. The weird thing is that the local_rank parameter holds a value that points to the current GPU device. I still don't know why this error happens.
I get this error when running demo.py. I can run train.py without this error.
There is no problem when training. I found this issue when running eval.py.
Found 1449 images in the folder /home/all/datasets/VOC/VOCdevkit/VOC2012
Traceback (most recent call last):
File "eval.py", line 116, in
Instead of kwargs['local_rank'] in eval.py or demo.py, substitute 0 or 1 according to whether it's cpu or cuda. So that specific line becomes device = torch.device(0) or device = torch.device(1). Please close this issue if this works for you. It worked for me.
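A note on the suggestion above: as far as I know, an integer passed to torch.device is read as a GPU index rather than a cpu/cuda switch, so naming the device with a string is less ambiguous. The helper below is only a sketch; pick_device_name is a hypothetical name that does not exist in the repository, and it is kept free of torch so it can be run anywhere.

```python
def pick_device_name(cuda_available: bool, local_rank: int = 0) -> str:
    """Return a device string such as "cuda:0" or "cpu".

    Hypothetical helper for illustration; not part of the repository.
    """
    if cuda_available:
        # Name the GPU explicitly by its index.
        return f"cuda:{local_rank}"
    return "cpu"


# In eval.py or demo.py this would be used roughly as (assumption):
#   device = torch.device(pick_device_name(torch.cuda.is_available(), 0))
print(pick_device_name(True, 0))
print(pick_device_name(False))
```

The resulting string can be handed to torch.device, which accepts both "cpu" and "cuda:N".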
I got the same error. I modified the file like this and it works. I hope this helps you. https://github.com/xyry/awesome-semantic-segmentation-pytorch/pull/1/commits/05b7de785dd15e618ce82418619a358bf472ca01
Same error in demo.py, I modified demo.py like this.
model = get_model(args.model, pretrained=True, root=args.save_folder, local_rank=device).to(device)
It works for me.
Excuse me. When I set "local_rank = 0", that is, using only GPU 0, I get an error like this: RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 7.79 GiB total capacity; 4.74 GiB already allocated; 1.72 GiB free; 4.87 GiB reserved in total by PyTorch). So what can I do if I want to use two GPUs? Thanks.
My commands are like this:
export NGPUS=2
python -m torch.distributed.launch --nproc_per_node=$NGPUS eval.py --model danet --backbone resnet50 --dataset citys --resume ./torch/models/danet_resnet50_citys_best_model.pth --batch-size 1
and this:
python eval.py --model danet --backbone resnet50 --dataset citys --resume ./torch/models/danet_resnet50_citys_best_model.pth --batch-size 1
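For the two-GPU question: if the device index is hardcoded to 0, every process started by the launcher lands on GPU 0. A minimal sketch of the alternative, assuming the launcher hands each process a --local_rank=N argument (which I believe torch.distributed.launch does unless --use_env is given). The functions parse_args and device_name_for are illustrative and are not the repository's code.

```python
import argparse


def parse_args(argv=None):
    """Parse a tiny, hypothetical subset of eval.py's arguments."""
    parser = argparse.ArgumentParser(description="hypothetical eval entry point")
    # Assumed to be filled in by the launcher; defaults to 0 for a plain
    # `python eval.py ...` run with a single process.
    parser.add_argument("--local_rank", type=int, default=0)
    parser.add_argument("--batch-size", type=int, default=1)
    # Ignore arguments this sketch does not model (--model, --backbone, ...).
    args, _unknown = parser.parse_known_args(argv)
    return args


def device_name_for(args, cuda_available=True):
    """Map this process's local rank to its own GPU instead of a fixed index."""
    return f"cuda:{args.local_rank}" if cuda_available else "cpu"


single = parse_args([])                                   # plain single-process run
second = parse_args(["--local_rank=1", "--model", "danet"])  # second launched process
print(device_name_for(single), device_name_for(second))
```

With this pattern the process with local rank 1 would pick GPU 1, so the two processes no longer share one card. Whether eval.py then splits the dataset across processes is a separate question that this sketch does not cover.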
I also got the same error, then I modified core/models/base_models/resnetv1b.py at line 95 like this: zero_init_residual=False, norm_layer=nn.BatchNorm2d, **kwargs):
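Why adding **kwargs helps, as a sketch: this assumes the underlying error is a TypeError raised because local_rank is forwarded to a constructor that does not declare it (the full traceback is not shown in this thread). The two classes below are toy stand-ins, not the real ResNetV1b.

```python
class StrictBackbone:
    """Toy constructor without **kwargs: unknown keywords raise TypeError."""

    def __init__(self, num_classes=1000, zero_init_residual=False):
        self.num_classes = num_classes


class TolerantBackbone:
    """Toy constructor with **kwargs: unknown keywords are accepted and ignored."""

    def __init__(self, num_classes=1000, zero_init_residual=False, **kwargs):
        self.num_classes = num_classes
        self.ignored = dict(kwargs)  # kept only to show what was swallowed


def build(cls, **kwargs):
    # Stands in for a factory that forwards every keyword it was given.
    return cls(**kwargs)


try:
    build(StrictBackbone, local_rank=0)
except TypeError as exc:
    print("strict:", exc)

model = build(TolerantBackbone, local_rank=0)
print("tolerant ignored:", model.ignored)
```

The trade-off is that a constructor with **kwargs also silently accepts misspelled argument names, so the fix hides the symptom rather than stopping local_rank from being forwarded.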
Excuse me. I changed it as you said, but I got the same error. Did you change any code in eval.py or train.py? And what is your eval command?
The CUDA out of memory error is not related to this change. It means that the GPU does not have enough memory to process your images. Multi-GPU evaluation may help, or you can crop the image to a smaller size.
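To illustrate the cropping suggestion, here is a rough sketch that computes crop boxes covering an image, so that each crop can be evaluated on its own. tile_boxes is a hypothetical helper, not something eval.py provides, and the 1024 tile size is an arbitrary choice for a 2048x1024 Cityscapes image.

```python
def tile_boxes(width, height, tile, stride=None):
    """Return (left, top, right, bottom) boxes of at most tile x tile pixels
    that together cover a width x height image."""
    stride = stride or tile

    def starts(size):
        # Start offsets along one axis; the last tile is shifted back so it
        # ends exactly at the image border.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# A 2048x1024 image split into two 1024x1024 crops.
for box in tile_boxes(2048, 1024, 1024):
    print(box)
```

Stitching the per-crop predictions back into one full-size result is left out here, and results near the crop borders may differ from a full-image pass.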
I am using a multi-GPU evaluation command like this: export NGPUS=2 python -m torch.distributed.launch --nproc_per_node=$NGPUS eval.py --model danet --backbone resnet50 --dataset citys --resume ./torch/models/danet_resnet50_citys_best_model.pth --batch-size 1 but I hit the same error, because it actually uses only one GPU. So I am asking whether you changed any other code so that two GPUs can be used simultaneously. Thanks.
@morememes, it works for me too. Thank you very much!
Thanks, it works!
Traceback (most recent call last):
File "eval.py", line 113, in
Hi, I get an error when running "python3 demo.py --model fcn32s_vgg16_voc --input-pic ./datasets/test.png". Error code:
Why am I getting such an error? Can you help me?