Closed guo-pu closed 1 year ago
Contents of the config file bisenetv2_psv.py:
cfg = dict(
    model_type='bisenetv2',
    n_cats=6,  # n_classes
    num_aux_heads=4,
    lr_start=5e-3,
    weight_decay=5e-4,
    warmup_iters=1000,
    max_iter=150000,
    dataset='psv',
    im_root='./datasets/psv',
    train_im_anns='./datasets/psv/train.txt',
    val_im_anns='./datasets/psv/val.txt',
    scales=[0.25, 2.],
    cropsize=[600, 600],
    eval_crop=[600, 600],
    eval_scales=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
    ims_per_gpu=8,  # batchsize
    eval_ims_per_gpu=8,  # batchsize
    use_fp16=True,
    use_sync_bn=True,
    respth='./res',
)
Output of the dataset check (check_dataset_info.py):
100%|██████████| 2550/2550 [00:30<00:00, 84.90it/s]
100%|██████████| 2550/2550 [00:16<00:00, 155.70it/s]
100%|██████████| 2550/2550 [00:51<00:00, 49.92it/s]
100%|██████████| 2550/2550 [02:14<00:00, 19.00it/s]
there are 2550 lines in ./datasets/psv/train.txt, which means 2550 image/label image pairs
max and min image shapes by area are: (600, 600), (600, 600)
max and min image shapes by height are: (600, 600), (600, 600)
max and min image shapes by width are: (600, 600), (600, 600)
we ignore label value of 255 in label images
label values are within range of [0, 5]
label values that are missing: []
ratios of each label value(from small to big, without ignored):
[0.9748857037037038, 0.01653214705882353, 0.005505824618736384, 0.00045158823529411763, 0.0009825294117647059, 0.0016422069716775598]
pixel mean rgb: [0.4598615424836601, 0.45117859694989104, 0.4178958779956427]
pixel std rgb: [0.22807834146953662, 0.2240104261501351, 0.21754879251373244]
What might be causing the problem above, and how can I fix it?
Please make sure `cropsize` is divisible by 32. 600 is not a good choice; maybe you can use 608.
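For anyone hitting the same error, here is a rough sketch of why 600 fails while 608 works. It is only arithmetic, not code from the repo, and it rests on assumptions: that the detail branch reduces the input by 8 and the semantic branch by 32 through stride-2 stages that each round up (`ceil(n / 2)`), and that the semantic feature is upsampled by 4 before the two are multiplied. The helper names `halve`, `branch_widths` and `round_up_to_multiple` are hypothetical.

```python
import math


def halve(n: int, times: int) -> int:
    """Apply `times` stride-2 stages, each assumed to output ceil(n / 2)."""
    for _ in range(times):
        n = math.ceil(n / 2)
    return n


def branch_widths(size: int) -> tuple:
    """Return (detail_width, upsampled_semantic_width) for a given input size.

    Assumption: detail branch is at 1/8 resolution, semantic branch at 1/32,
    and the semantic feature is upsampled x4 before the element-wise product.
    """
    detail = halve(size, 3)
    semantic_upsampled = halve(size, 5) * 4
    return detail, semantic_upsampled


def round_up_to_multiple(size: int, base: int = 32) -> int:
    """Smallest multiple of `base` that is >= size."""
    return math.ceil(size / base) * base


print(branch_widths(600))         # (75, 76): the mismatch from the error message
print(branch_widths(608))         # (76, 76): shapes agree
print(round_up_to_multiple(600))  # 608
```

Under these assumptions, any crop size that is a multiple of 32 gives matching widths, which is consistent with the 75 vs 76 in the reported RuntimeError.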
I'll give it a try, thanks
Thanks, it works after changing the image resolution.
Good to know that your problem is solved. I am closing this.
Hello, I am training on my own dataset. The command I ran is: torchrun --nproc_per_node=1 tools/train_amp.py --config configs/bisenetv2_psv.py and it fails with the following error: RuntimeError: The size of tensor a (75) must match the size of tensor b (76) at non-singleton dimension 3
The full error output is as follows:
Traceback (most recent call last):
File "tools/train_amp.py", line 210, in <module>
main()
File "tools/train_amp.py", line 206, in main
train()
File "tools/train_amp.py", line 159, in train
logits, logits_aux = net(im)
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/guopu/BiSeNet-master/./lib/models/bisenetv2.py", line 335, in forward
feat_head = self.bga(feat_d, feat_s)
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/guopu/BiSeNet-master/./lib/models/bisenetv2.py", line 277, in forward
left = left1 * torch.sigmoid(right1)
RuntimeError: The size of tensor a (75) must match the size of tensor b (76) at non-singleton dimension 3
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 6516) of binary: /opt/conda/envs/park-net/bin/python
Traceback (most recent call last):
File "/opt/conda/envs/park-net/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/envs/park-net/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
The input image size is 600*600.
What could be causing this, and how can I fix it?