Hey, I have encountered a similar issue when trying to fine-tune the network on my own dataset. The problem is in the checkpoint, which produces this error after loading. Did you find a solution for that?
Yes. When training on your own dataset, do not use the COCO pre-trained model provided by the authors as-is: the weights cannot be aligned because the number of categories differs. Once you have trained your own model, you can use it as a pre-trained model.
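To make the mismatch concrete, here is a minimal sketch that inspects the classification head stored in the downloaded checkpoint (the filename is an assumption; use whatever .pth you downloaded). The 91-way COCO head is what fails to align with a smaller custom head:

import torch

# Sketch: print the shape of the COCO checkpoint's classification head.
ckpt = torch.load('r50_deformable_detr-checkpoint.pth', map_location='cpu')
print(ckpt['model']['class_embed.0.weight'].shape)  # torch.Size([91, 256]) for the COCO model
print(ckpt['model']['class_embed.0.bias'].shape)    # torch.Size([91])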
Hello, to train from a pre-trained model, I commented out the following section of code:
"""
if not args.eval and 'optimizer' in checkpoint and 'lr_scheduler' in checkpoint and 'epoch' in checkpoint:
import copy
p_groups = copy.deepcopy(optimizer.param_groups)
optimizer.load_state_dict(checkpoint['optimizer'])
for pg, pg_old in zip(optimizer.param_groups, p_groups):
pg['lr'] = pg_old['lr']
pg['initial_lr'] = pg_old['initial_lr']
print(optimizer.param_groups)
lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
args.override_resumed_lr_drop = True
if args.override_resumed_lr_drop:
print('Warning: (hack) args.override_resumed_lr_drop is set to True, so args.lr_drop would override lr_drop in resumed lr_scheduler.')
lr_scheduler.step_size = args.lr_drop
lr_scheduler.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
lr_scheduler.step(lr_scheduler.last_epoch)
args.start_epoch = checkpoint['epoch'] + 1
"""
I have added the following code block:
del checkpoint["model"]["class_embed.0.weight"]
del checkpoint["model"]["class_embed.0.bias"]
del checkpoint["model"]["class_embed.1.weight"]
del checkpoint["model"]["class_embed.1.bias"]
del checkpoint["model"]["class_embed.2.weight"]
del checkpoint["model"]["class_embed.2.bias"]
del checkpoint["model"]["class_embed.3.weight"]
del checkpoint["model"]["class_embed.3.bias"]
del checkpoint["model"]["class_embed.4.weight"]
del checkpoint["model"]["class_embed.4.bias"]
del checkpoint["model"]["class_embed.5.weight"]
del checkpoint["model"]["class_embed.5.bias"]
missing_keys, unexpected_keys = model_without_ddp.load_state_dict(checkpoint['model'], strict=False)
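A more compact variant of the same deletions (a sketch, assuming the Deformable-DETR checkpoint layout shown above): drop every class_embed tensor programmatically, so load_state_dict(strict=False) skips them and the new classification heads stay randomly initialized:

state_dict = checkpoint['model']
# Remove all classification-head tensors (class_embed.0 ... class_embed.5).
for key in list(state_dict.keys()):
    if key.startswith('class_embed'):
        del state_dict[key]
missing_keys, unexpected_keys = model_without_ddp.load_state_dict(state_dict, strict=False)
print('re-initialized:', missing_keys)  # should list only class_embed.* keys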
and also defined the number of classes by replacing the following line:
num_classes = 20 if args.dataset_file != 'coco' else 91
by
num_classes = 1  # or another number
I don't know if this is the right approach. I have tested it with the balloon dataset, and so far training works and the model manages to make predictions; however, I am not sure whether this should be the case.
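As a quick sanity check on the num_classes edit above (a sketch, assuming the standard Deformable-DETR model, where class_embed is a list of Linear heads and there is no extra background logit because sigmoid focal loss is used):

# Each classification head should now output num_classes logits.
assert model_without_ddp.class_embed[0].out_features == num_classes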
I have changed num_classes to 2 for my own dataset in this line: num_classes = 20 if args.dataset_file != 'coco' else 91. I also resized checkpoint["model"]["class_embed.0.weight"] to 3x256 and checkpoint["model"]["class_embed.0.bias"] to 3 (for layers 0 through 5), but I still hit the problem: RuntimeError: The size of tensor a (91) must match the size of tensor b (3) at non-singleton dimension 0
I was able to run it successfully, but I'm not sure if this approach is appropriate. You can skip loading the optimizer checkpoint by commenting out or modifying that section of code in main.py.
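A minimal sketch of that idea (the names follow the main.py snippet quoted above): load only the model weights and deliberately ignore the optimizer and scheduler state, whose per-parameter buffers are still sized for the 91 COCO classes:

checkpoint = torch.load(args.resume, map_location='cpu')
# Load weights only; skip checkpoint['optimizer'] and checkpoint['lr_scheduler'],
# whose state tensors still match the COCO-sized classification head.
model_without_ddp.load_state_dict(checkpoint['model'], strict=False)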
Hello! I met a similar problem. I exported the model to TorchScript and tried to run inference on the exported model, but it only works on the image I used for exporting; for any other image it fails and shows this error message:
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return forward_call(*input, **kwargs)
Traceback (most recent call last):
File "/root/autodl-tmp/project/deploy/export_model.py", line 264, in
return (_0, _1, _2, _3, _4, _5, _6)
File "code/__torch__/adet/modeling/text_spotter.py", line 23, in forward
batched_imgs = torch.unsqueeze_(_7, 0)
x0 = torch.contiguous(batched_imgs)
_8, _9, _10, _11, = (_0).forward(x0, image_size, )
~~~~~~~~~~~ <--- HERE
_12 = torch.softmax(_9, -1)
prob = torch.sigmoid(torch.mean(_8, [-2]))
File "code/__torch__/adet/modeling/model/detection_transformer.py", line 50, in forward
_29 = getattr(self.input_proj, "1")
_30 = getattr(self.input_proj, "0")
_31 = (self.backbone).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_32, _33, _34, _35, _36, _37, _38, _39, _40, _41, _42, _43, _44, _45, _46, _47, _48, _49, _50, _51, _52, _53, _54, _55, _56, _57, = _31
_58 = (_30).forward(_32, )
File "code/__torch__/adet/modeling/text_spotter.py", line 104, in forward
image_size: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
_61 = getattr(self, "1")
_62 = (getattr(self, "0")).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_63, _64, _65, _66, _67, _68, _69, = _62
pos_embed = torch.to((_61).forward(_63, ), 6)
File "code/__torch__/adet/modeling/text_spotter.py", line 143, in forward
_92 = torch.slice(torch.slice(_91, 0, 0, 125), 1, 0, 138)
_93 = torch.view(CONSTANTS.c2, annotate(List[int], []))
_94 = torch.copy_(_92, torch.expand(_93, [125, 138]))
~~~~~~~~~~~ <--- HERE
masks_per_feature_level0 = torch.ones([_85, _86, _87], dtype=11, layout=None, device=torch.device("cpu"), pin_memory=False)
_95 = torch.select(masks_per_feature_level0, 0, 0)
Traceback of TorchScript, original code (most recent call last):
/root/autodl-tmp/project/adet/modeling/text_spotter.py(60): mask_out_padding
/root/autodl-tmp/project/adet/modeling/text_spotter.py(43): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(21): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/model/detection_transformer.py(168): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(220): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(259): <lambda>
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(952): trace_module
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(735): trace
/root/autodl-tmp/project/deploy/export_model.py(125): export_tracing
/root/autodl-tmp/project/deploy/export_model.py(224): <module>
/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py(18): execfile
/root/.pycharm_helpers/pydev/pydevd.py(1496): _exec
/root/.pycharm_helpers/pydev/pydevd.py(1489): run
/root/.pycharm_helpers/pydev/pydevd.py(2177): main
/root/.pycharm_helpers/pydev/pydevd.py(2195): <module>
RuntimeError: The size of tensor a (50) must match the size of tensor b (125) at non-singleton dimension 0
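This looks like a torch.jit.trace limitation rather than a checkpoint problem: tracing records the exact tensor shapes of the example input (here a 125x138 padding mask baked into the graph as a constant), so inference on a differently sized image fails. A hedged workaround sketch is to resize every input to the export-time size before inference (the 800x1088 dimensions below are an assumption; use whatever size your export image had):

import torch
import torch.nn.functional as F

def to_export_size(img: torch.Tensor, h: int = 800, w: int = 1088) -> torch.Tensor:
    # img: (C, H, W) float tensor. Resize to the spatial size used when the
    # model was traced, so the shapes baked into the graph match again.
    return F.interpolate(img.unsqueeze(0), size=(h, w), mode='bilinear',
                         align_corners=False).squeeze(0)

Alternatively, re-exporting with torch.jit.script (or a tracing path that supports dynamic shapes) would avoid the baked-in sizes, if the model's control flow allows it.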
Replying to the comment above (num_classes changed to 2, class_embed.0 through class_embed.5 resized to 3x256 / 3, still RuntimeError: The size of tensor a (91) must match the size of tensor b (3) at non-singleton dimension 0):
the "pth" you download include "model""lr""loss". you should delete "lr""loss" and leave “model”
Traceback (most recent call last):
  File "main.py", line 326, in <module>
    main(args)
  File "main.py", line 276, in main
    model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "/mnt/e/opencv/Deformable-DETR-main/engine.py", line 68, in train_one_epoch
    optimizer.step()
  File "/home/bugs/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 67, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/bugs/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/bugs/anaconda3/envs/deformable_detr/lib/python3.7/site-packages/torch/optim/adamw.py", line 104, in step
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (91) must match the size of tensor b (7) at non-singleton dimension 0
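The adamw.py frame above points to the root cause: the resumed optimizer state still holds exp_avg buffers shaped for the 91-class COCO head, while the new model has a 7-class head, so the moving-average update fails at optimizer.step(). A minimal sketch of the fix, mirroring the optimizer construction already in main.py:

# Keep the freshly built optimizer (as main.py already creates it) and do NOT
# call optimizer.load_state_dict(checkpoint['optimizer']); the checkpoint's
# exp_avg / exp_avg_sq buffers are still shaped for the 91-class head.
optimizer = torch.optim.AdamW(param_dicts, lr=args.lr, weight_decay=args.weight_decay)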