Closed by xsu-yy 3 years ago
I got the same error. Does anybody know how to solve it?
This error seems to be related to batch_size.
Can you double-check your configuration file and make sure that DATALOADER.SIZE_DIVISIBILITY = 32 in https://github.com/amazon-research/siam-mot/blob/main/configs/dla/DLA_34_FPN_EMM.yaml#L38 ?
This error occurs because the feature sizes of different layers in the DLA backbone do not match when the image size is not divisible by 32.
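To illustrate the divisibility point, here is a minimal sketch of the size arithmetic. It is not code from siam-mot. It assumes a simplified model of the backbone: five successive halvings, where the main path uses a 3x3 convolution with stride 2 and padding 1 (which rounds odd sizes up) and the residual path uses a 2x2 max pool with stride 2 (which rounds odd sizes down). The helper names and the example width 1488 are hypothetical; that width is chosen only because it reproduces the 47 vs 46 sizes from the reported error.

```python
def conv3x3_stride2(n):
    """Output length of a 3x3 conv with stride 2 and padding 1 (rounds up)."""
    return (n + 2 * 1 - 3) // 2 + 1


def maxpool2x2_stride2(n):
    """Output length of a 2x2 max pool with stride 2, no padding (rounds down)."""
    return (n - 2) // 2 + 1


def first_mismatch(width, num_halvings=5):
    """Return (stage, conv_size, pool_size) for the first stage where the
    two paths disagree, or None if they agree at every stage.

    Assumption: every stage halves the size along both paths. A real
    backbone may differ in detail, so treat this as an approximation.
    """
    n = width
    for stage in range(1, num_halvings + 1):
        conv_out = conv3x3_stride2(n)
        pool_out = maxpool2x2_stride2(n)
        if conv_out != pool_out:
            return stage, conv_out, pool_out
        n = conv_out
    return None


def pad_to_multiple(n, divisor=32):
    """Round n up to the next multiple of divisor."""
    return -(-n // divisor) * divisor


# Hypothetical width: divisible by 16 but not by 32.
print(first_mismatch(1488))
# The same width padded up to a multiple of 32.
print(pad_to_multiple(1488), first_mismatch(pad_to_multiple(1488)))
```

Under these assumptions, a width that is a multiple of 32 stays even at the input of all five halvings, so both paths always produce the same size. An odd size at any stage makes the two paths differ by one, which is the kind of mismatch that `out += residual` then rejects.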
Thank you. I have checked this parameter; it is 32, and I have not modified it. However, I solved the problem by changing the batch size, and I don't know why this works.
Hello, I want to ask you for help! Where can I change the batch size? I do not see it in https://github.com/amazon-research/siam-mot/blob/main/configs/dla/DLA_34_FPN_EMM_MOT17.yaml
Thanks for the great work. When I train the model on the MOT17 dataset with the following command:
python3 -m torch.distributed.launch --nproc_per_node=2 tools/train_net.py --config-file configs/dla/DLA_34_FPN_EMM_MOT17.yaml --train-dir my_train_results/MOT17_TEST/ --model-suffix pth
I got this error:
Traceback (most recent call last):
File "tools/train_net.py", line 132, in <module>
main()
File "tools/train_net.py", line 128, in main
train(cfg, train_dir, args.local_rank, args.distributed, logger)
File "tools/train_net.py", line 80, in train
logger, tensorboard_writer
File "./siammot/engine/trainer.py", line 51, in do_train
result, loss_dict = model(images, targets)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
**applier(kwargs, input_caster))
File "./siammot/modelling/rcnn.py", line 47, in forward
features = self.backbone(images.tensors)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "./siammot/modelling/backbone/dla.py", line 297, in forward
x5 = self.level5(x4)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "./siammot/modelling/backbone/dla.py", line 231, in forward
x1 = self.tree1(x, residual)
File "/home/sx/Documents/anaconda/anaconda3/envs/pt170/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "./siammot/modelling/backbone/dla.py", line 54, in forward
out += residual
RuntimeError: The size of tensor a (47) must match the size of tensor b (46) at non-singleton dimension 3
Can anybody help me? Thank you!