Tramac / awesome-semantic-segmentation-pytorch

Semantic Segmentation on PyTorch (includes FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)
Apache License 2.0

KeyError: 'local_rank' #110

Open srcn9595 opened 4 years ago

srcn9595 commented 4 years ago

Hi, "python3 demo.py --model fcn32s_vgg16_voc --input-pic ./datasets/test.png" running.

Error Code:

Traceback (most recent call last):
  File "demo.py", line 58, in <module>
    demo(args)
  File "demo.py", line 44, in demo
    model = get_model(args.model, pretrained=True, root=args.save_folder).to(device)
  File "/home/sercan/awesome-semantic-segmentation-pytorch/core/models/model_zoo.py", line 90, in get_model
    net = _models[name](**kwargs)
  File "/home/sercan/awesome-semantic-segmentation-pytorch/core/models/fcn.py", line 209, in get_fcn32s_vgg16_voc
    return get_fcn32s('pascal_voc', 'vgg16', **kwargs)
  File "/home/sercan/awesome-semantic-segmentation-pytorch/core/models/fcn.py", line 164, in get_fcn32s
    device = torch.device(kwargs['local_rank'])
KeyError: 'local_rank'

Why am I getting this error? Can you help me?
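
For context: demo.py's get_model call passes only pretrained and root, while get_fcn32s in core/models/fcn.py indexes kwargs['local_rank'] unconditionally. Below is a minimal sketch of the failure with a defensive fallback; the kwargs.get variant is a suggestion, not the repo's code:

```python
import torch

def get_fcn32s(dataset='pascal_voc', backbone='vgg16', **kwargs):
    # demo.py calls get_model(name, pretrained=True, root=args.save_folder),
    # so kwargs never contains 'local_rank', and the repo's original line
    #     device = torch.device(kwargs['local_rank'])
    # raises KeyError. A defensive variant falls back to a sensible default:
    rank = kwargs.get('local_rank', 0)
    device = torch.device(rank if torch.cuda.is_available() else 'cpu')
    return device  # the real function goes on to build and return the model
```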

YLONl commented 4 years ago

I got the same error. Did you solve the problem?

miaodd98 commented 4 years ago

I have the same issue when trying to evaluate the FCN model with a ResNet50 backbone. The weird thing is that 'local_rank' is not a command-line argument, yet it appears when the model is loaded. Hoping to solve this issue.

miaodd98 commented 4 years ago

Debugging the program doesn't turn up any runtime errors, which is weird.

miaodd98 commented 4 years ago

It seems this error happens with the ResNet backbone. The weird thing is that the local_rank parameter does carry a value, which points to the current GPU device. Still don't know why this error happens.

YLONl commented 4 years ago

I get this error when running demo.py; train.py runs without it.

miaodd98 commented 4 years ago

There is no problem when training. I hit this issue when running eval.py.

renmmmmmm commented 4 years ago

Found 1449 images in the folder /home/all/datasets/VOC/VOCdevkit/VOC2012
Traceback (most recent call last):
  File "eval.py", line 116, in <module>
    evaluator = Evaluator(args)
  File "eval.py", line 51, in __init__
    norm_layer=BatchNorm2d).to(self.device)  # args.local_rank,
  File "/home//awesome-semantic-segmentation-pytorch-master/core/models/model_zoo.py", line 124, in get_segmentation_model
    return models[model](**kwargs)
  File "/home//awesome-semantic-segmentation-pytorch-master/core/models/pspnet.py", line 132, in get_psp
    model = PSPNet(datasets[dataset].NUM_CLASS, backbone=backbone, pretrained_base=pretrained_base, **kwargs)
  File "/home//awesome-semantic-segmentation-pytorch-master/core/models/pspnet.py", line 35, in __init__
    super(PSPNet, self).__init__(nclass, aux, backbone, pretrained_base=pretrained_base, **kwargs)
  File "/home//awesome-semantic-segmentation-pytorch-master/core/models/segbase.py", line 26, in __init__
    self.pretrained = resnet50_v1s(pretrained=pretrained_base, dilated=dilated, **kwargs)
  File "/home//awesome-semantic-segmentation-pytorch-master/core/models/base_models/resnetv1b.py", line 236, in resnet50_v1s
    model = ResNetV1b(BottleneckV1b, [3, 4, 6, 3], deep_stem=True, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'local_rank'

sainatarajan commented 4 years ago

Instead of kwargs['local_rank'] in eval.py or demo.py, substitute 0 or 1 according to whether it is CPU or CUDA, so that specific line becomes device = torch.device(0) or device = torch.device(1). Please close this issue if this works for you; it worked for me.
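
One caveat on that substitution: an integer passed to torch.device is always interpreted as a CUDA device index, so the CPU case needs the string form. A minimal sketch:

```python
import torch

# An integer argument to torch.device is always a CUDA device index,
# so torch.device(1) means GPU 1, not CPU; a CPU run needs the string form:
device = torch.device(0) if torch.cuda.is_available() else torch.device('cpu')
```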

xyry commented 4 years ago

I got the same error, then I modified the file as in this commit, and it works. I hope this will help you. https://github.com/xyry/awesome-semantic-segmentation-pytorch/pull/1/commits/05b7de785dd15e618ce82418619a358bf472ca01

morememes commented 4 years ago

Same error in demo.py. I modified demo.py like this, and it works for me:

model = get_model(args.model, pretrained=True, root=args.save_folder, local_rank=device).to(device)
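
Spelled out with its surrounding context (a sketch; args is assumed to be demo.py's parsed argument namespace), this works because torch.device() accepts an existing device object, so the kwargs['local_rank'] lookup inside the model factory resolves:

```python
import torch
from core.models.model_zoo import get_model  # repo-local import, as in demo.py

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Passing the device through as local_rank satisfies the
# torch.device(kwargs['local_rank']) lookup inside get_fcn32s.
model = get_model(args.model, pretrained=True, root=args.save_folder,
                  local_rank=device).to(device)
```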

Yvette1993 commented 4 years ago

Instead of kwargs['local_rank'] in eval.py or demo.py, substitute 0 or 1 according to whether it is CPU or CUDA, so that specific line becomes device = torch.device(0) or device = torch.device(1). Please close this issue if this works for you; it worked for me.

Excuse me. When I set local_rank = 0, that is, using only GPU 0, I get this error: RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 7.79 GiB total capacity; 4.74 GiB already allocated; 1.72 GiB free; 4.87 GiB reserved in total by PyTorch). So what can I do if I want to use two GPUs? Thanks.

My commands are:

export NGPUS=2
python -m torch.distributed.launch --nproc_per_node=$NGPUS eval.py --model danet --backbone resnet50 --dataset citys --resume ./torch/models/danet_resnet50_citys_best_model.pth --batch-size 1

and:

python eval.py --model danet --backbone resnet50 --dataset citys --resume ./torch/models/danet_resnet50_citys_best_model.pth --batch-size 1

yyywxk commented 4 years ago

I also got the same error; I then modified line 95 of core/models/base_models/resnetv1b.py like this: zero_init_residual=False, norm_layer=nn.BatchNorm2d, **kwargs):
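
For illustration, a sketch of that signature change; the neighboring parameters are reconstructed from the tracebacks above and may not match the repo exactly:

```python
import torch.nn as nn

class ResNetV1b(nn.Module):
    def __init__(self, block, layers, num_classes=1000, deep_stem=False,
                 zero_init_residual=False, norm_layer=nn.BatchNorm2d, **kwargs):
        super(ResNetV1b, self).__init__()
        # **kwargs silently absorbs unexpected keywords such as local_rank,
        # so resnet50_v1s(..., **kwargs) no longer raises TypeError.
```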

Yvette1993 commented 4 years ago

I also got the same error; I then modified line 95 of core/models/base_models/resnetv1b.py like this: zero_init_residual=False, norm_layer=nn.BatchNorm2d, **kwargs):

Excuse me. When I change it as you said, I still get the same error. Did you change any code in eval.py or train.py? And what is your eval command?

yyywxk commented 4 years ago

I also got the same error; I then modified line 95 of core/models/base_models/resnetv1b.py like this: zero_init_residual=False, norm_layer=nn.BatchNorm2d, **kwargs):

Excuse me. When I change it as you said, I still get the same error. Did you change any code in eval.py or train.py? And what is your eval command?

The CUDA out-of-memory error is not related to this change. It means the GPU memory is not enough to handle your images. Multi-GPU evaluation may help, or crop the images to a smaller size.
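
A minimal sketch combining those suggestions, plus running under torch.no_grad() (an addition beyond the comment above); the model name, sizes, and paths are illustrative:

```python
import torch
from PIL import Image
from torchvision import transforms
from core.models.model_zoo import get_model  # repo-local import

# local_rank=0 follows the workaround discussed earlier in this thread.
model = get_model('fcn32s_vgg16_voc', pretrained=True,
                  local_rank=0).to('cuda').eval()

image = Image.open('test.png').convert('RGB').resize((1024, 512))  # downscale first
tensor = transforms.ToTensor()(image).unsqueeze(0).to('cuda')

with torch.no_grad():   # skip autograd buffers, a large share of eval-time memory
    output = model(tensor)
```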

Yvette1993 commented 4 years ago

I also got the same error; I then modified line 95 of core/models/base_models/resnetv1b.py like this: zero_init_residual=False, norm_layer=nn.BatchNorm2d, **kwargs):

Excuse me. When I change it as you said, I still get the same error. Did you change any code in eval.py or train.py? And what is your eval command?

The CUDA out-of-memory error is not related to this change. It means the GPU memory is not enough to handle your images. Multi-GPU evaluation may help, or crop the images to a smaller size.

I am using this multi-GPU evaluation command:

export NGPUS=2
python -m torch.distributed.launch --nproc_per_node=$NGPUS eval.py --model danet --backbone resnet50 --dataset citys --resume ./torch/models/danet_resnet50_citys_best_model.pth --batch-size 1

but I meet the same error, because only one GPU is actually used. So I am asking whether you changed any other code to make both GPUs work simultaneously. Thanks.
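
For reference, torch.distributed.launch only engages both GPUs if the launched script consumes the --local_rank it passes to each process. A minimal sketch of that per-process setup (not necessarily what this repo's eval.py does):

```python
import argparse
import torch

# torch.distributed.launch starts one process per GPU and passes each one
# its own --local_rank; the script must parse it and bind to that GPU.
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

torch.distributed.init_process_group(backend='nccl', init_method='env://')
torch.cuda.set_device(args.local_rank)
device = torch.device(args.local_rank)
# Each of the NGPUS processes then evaluates its shard of the dataset,
# so both GPUs do real work.
```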

chang0424 commented 3 years ago

@morememes, it works for me too. Thank you very much!

liuqinglong110 commented 3 years ago

I got the same error, then I modified the file as in this commit, and it works. I hope this will help you. xyry@05b7de7

Thanks, it works!

dawang-11 commented 8 months ago

I got the same error, then I modified the file as in this commit, and it works. I hope this will help you. xyry@05b7de7

Traceback (most recent call last):
  File "eval.py", line 113, in <module>
    evaluator.eval()
  File "eval.py", line 59, in eval
    self.metric.reset()
AttributeError: 'Evaluator' object has no attribute 'metric'