Hey, I have encountered a similar issue when trying to fine-tune the network on my own dataset. The problem is in the checkpoint, which produces this error after loading. Did you find a solution for that?
Yes. When training on your own dataset, do not use the COCO pre-trained model provided by the authors as-is: the weights cannot be aligned because the number of categories differs. Once you have trained your own model, you can use it as a pre-trained model.
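To make the mismatch concrete, here is a minimal sketch that inspects the classification head stored in the downloaded checkpoint (the filename is an assumption; use whatever .pth you downloaded). The 91-way COCO head is what fails to align with a smaller custom head:

import torch

# Sketch: print the shape of the COCO checkpoint's classification head.
ckpt = torch.load('r50_deformable_detr-checkpoint.pth', map_location='cpu')
print(ckpt['model']['class_embed.0.weight'].shape)  # torch.Size([91, 256]) for the COCO model
print(ckpt['model']['class_embed.0.bias'].shape)    # torch.Size([91])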
Hello, to train from a pre-trained model, I commented out the following section of code:
"""
if not args.eval and 'optimizer' in checkpoint and 'lr_scheduler' in checkpoint and 'epoch' in checkpoint:
import copy
p_groups = copy.deepcopy(optimizer.param_groups)
optimizer.load_state_dict(checkpoint['optimizer'])
for pg, pg_old in zip(optimizer.param_groups, p_groups):
pg['lr'] = pg_old['lr']
pg['initial_lr'] = pg_old['initial_lr']
print(optimizer.param_groups)
lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
args.override_resumed_lr_drop = True
if args.override_resumed_lr_drop:
print('Warning: (hack) args.override_resumed_lr_drop is set to True, so args.lr_drop would override lr_drop in resumed lr_scheduler.')
lr_scheduler.step_size = args.lr_drop
lr_scheduler.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
lr_scheduler.step(lr_scheduler.last_epoch)
args.start_epoch = checkpoint['epoch'] + 1
"""
I have added the following code block:
del checkpoint["model"]["class_embed.0.weight"]
del checkpoint["model"]["class_embed.0.bias"]
del checkpoint["model"]["class_embed.1.weight"]
del checkpoint["model"]["class_embed.1.bias"]
del checkpoint["model"]["class_embed.2.weight"]
del checkpoint["model"]["class_embed.2.bias"]
del checkpoint["model"]["class_embed.3.weight"]
del checkpoint["model"]["class_embed.3.bias"]
del checkpoint["model"]["class_embed.4.weight"]
del checkpoint["model"]["class_embed.4.bias"]
del checkpoint["model"]["class_embed.5.weight"]
del checkpoint["model"]["class_embed.5.bias"]
missing_keys, unexpected_keys = model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
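A more compact variant of the same deletions (a sketch, assuming the Deformable-DETR checkpoint layout shown above): drop every class_embed tensor programmatically, so load_state_dict(strict=False) skips them and the new classification heads stay randomly initialized:

state_dict = checkpoint['model']
# Remove all classification-head tensors (class_embed.0 ... class_embed.5).
for key in list(state_dict.keys()):
    if key.startswith('class_embed'):
        del state_dict[key]
missing_keys, unexpected_keys = model_without_ddp.load_state_dict(state_dict, strict=False)
print('re-initialized:', missing_keys)  # should list only class_embed.* keys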
and also defined the number of classes by replacing the following line:
num_classes = 20 if args.dataset_file != 'coco' else 91
by
num_classes = 1  # or another number
I don't know if this is the right approach. I have tested it with the balloon dataset, and so far training works and the model manages to make predictions; however, I am not sure whether this should be the case.
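As a quick sanity check on the num_classes edit above (a sketch, assuming the standard Deformable-DETR model, where class_embed is a list of Linear heads and there is no extra background logit because sigmoid focal loss is used):

# Each classification head should now output num_classes logits.
assert model_without_ddp.class_embed[0].out_features == num_classes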
I have changed num_classes to 2 for my own dataset in this line: num_classes = 20 if args.dataset_file != 'coco' else 91. I also resized checkpoint["model"]["class_embed.0.weight"] to 3x256 and checkpoint["model"]["class_embed.0.bias"] to 3 (for layers 0 through 5), but I still hit the problem: RuntimeError: The size of tensor a (91) must match the size of tensor b (3) at non-singleton dimension 0
I was able to run it successfully, but I'm not sure if this approach is appropriate. You can skip loading the optimizer checkpoint by commenting out or modifying that section of code in main.py.
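A minimal sketch of that idea (the names follow the main.py snippet quoted above): load only the model weights and deliberately ignore the optimizer and scheduler state, whose per-parameter buffers are still sized for the 91 COCO classes:

checkpoint = torch.load(args.resume, map_location='cpu')
# Load weights only; skip checkpoint['optimizer'] and checkpoint['lr_scheduler'],
# whose state tensors still match the COCO-sized classification head.
model_without_ddp.load_state_dict(checkpoint['model'], strict=False)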
Hello! I met a similar problem. I exported the model to TorchScript and tried to run inference on the exported model, but it only works on the image I used for exporting; for any other image it fails and shows this error message:
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return forward_call(*input, **kwargs)
Traceback (most recent call last):
File "/root/autodl-tmp/project/deploy/export_model.py", line 264, in
return (_0, _1, _2, _3, _4, _5, _6)
File "code/__torch__/adet/modeling/text_spotter.py", line 23, in forward
batched_imgs = torch.unsqueeze_(_7, 0)
x0 = torch.contiguous(batched_imgs)
_8, _9, _10, _11, = (_0).forward(x0, image_size, )
~~~~~~~~~~~ <--- HERE
_12 = torch.softmax(_9, -1)
prob = torch.sigmoid(torch.mean(_8, [-2]))
File "code/__torch__/adet/modeling/model/detection_transformer.py", line 50, in forward
_29 = getattr(self.input_proj, "1")
_30 = getattr(self.input_proj, "0")
_31 = (self.backbone).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_32, _33, _34, _35, _36, _37, _38, _39, _40, _41, _42, _43, _44, _45, _46, _47, _48, _49, _50, _51, _52, _53, _54, _55, _56, _57, = _31
_58 = (_30).forward(_32, )
File "code/__torch__/adet/modeling/text_spotter.py", line 104, in forward
image_size: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
_61 = getattr(self, "1")
_62 = (getattr(self, "0")).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_63, _64, _65, _66, _67, _68, _69, = _62
pos_embed = torch.to((_61).forward(_63, ), 6)
File "code/__torch__/adet/modeling/text_spotter.py", line 143, in forward
_92 = torch.slice(torch.slice(_91, 0, 0, 125), 1, 0, 138)
_93 = torch.view(CONSTANTS.c2, annotate(List[int], []))
_94 = torch.copy_(_92, torch.expand(_93, [125, 138]))
~~~~~~~~~~~ <--- HERE
masks_per_feature_level0 = torch.ones([_85, _86, _87], dtype=11, layout=None, device=torch.device("cpu"), pin_memory=False)
_95 = torch.select(masks_per_feature_level0, 0, 0)
Traceback of TorchScript, original code (most recent call last):
/root/autodl-tmp/project/adet/modeling/text_spotter.py(60): mask_out_padding
/root/autodl-tmp/project/adet/modeling/text_spotter.py(43): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(21): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/model/detection_transformer.py(168): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(220): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(259): <lambda>
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(952): trace_module
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(735): trace
/root/autodl-tmp/project/deploy/export_model.py(125): export_tracing
/root/autodl-tmp/project/deploy/export_model.py(224): <module>
/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py(18): execfile
/root/.pycharm_helpers/pydev/pydevd.py(1496): _exec
/root/.pycharm_helpers/pydev/pydevd.py(1489): run
/root/.pycharm_helpers/pydev/pydevd.py(2177): main
/root/.pycharm_helpers/pydev/pydevd.py(2195): <module>
RuntimeError: The size of tensor a (50) must match the size of tensor b (125) at non-singleton dimension 0
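This looks like a torch.jit.trace limitation rather than a checkpoint problem: tracing records the exact tensor shapes of the example input (here a 125x138 padding mask baked into the graph as a constant), so inference on a differently sized image fails. A hedged workaround sketch is to resize every input to the export-time size before inference (the 800x1088 dimensions below are an assumption; use whatever size your export image had):

import torch
import torch.nn.functional as F

def to_export_size(img: torch.Tensor, h: int = 800, w: int = 1088) -> torch.Tensor:
    # img: (C, H, W) float tensor. Resize to the spatial size used when the
    # model was traced, so the shapes baked into the graph match again.
    return F.interpolate(img.unsqueeze(0), size=(h, w), mode='bilinear',
                         align_corners=False).squeeze(0)

Alternatively, re-exporting with torch.jit.script (or a tracing path that supports dynamic shapes) would avoid the baked-in sizes, if the model's control flow allows it.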
Replying to the comment above (num_classes changed to 2, class_embed.0 through class_embed.5 resized to 3x256 / 3, still RuntimeError: The size of tensor a (91) must match the size of tensor b (3) at non-singleton dimension 0):
the "pth" you download include "model""lr""loss". you should delete "lr""loss" and leave “model”
Traceback (most recent call last):
  File "main.py", line 326, in <module>
    main(args)
  File "main.py", line 276, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "/mnt/e/opencv/Deformable-DETR-main/engine.py", line 68, in train_one_epoch
    optimizer.step()
  File "/home/bugs/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 67, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/bugs/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/bugs/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/optim/adamw.py", line 104, in step
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (91) must match the size of tensor b (7) at non-singleton dimension 0
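The adamw.py frame above points to the root cause: the resumed optimizer state still holds exp_avg buffers shaped for the 91-class COCO head, while the new model has a 7-class head, so the moving-average update fails at optimizer.step(). A minimal sketch of the fix, mirroring the optimizer construction already in main.py:

# Keep the freshly built optimizer (as main.py already creates it) and do NOT
# call optimizer.load_state_dict(checkpoint['optimizer']); the checkpoint's
# exp_avg / exp_avg_sq buffers are still shaped for the 91-class head.
optimizer = torch.optim.AdamW(param_dicts, lr=args.lr, weight_decay=args.weight_decay)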