LeonBytes opened this issue 6 months ago
These mismatched weights are the classifier branches. This is normal for the fine-tuning setting and does not affect the performance.
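For anyone who wants to see exactly which parameters these are, here is a minimal torch-only sketch (assuming the checkpoint path used later in this thread) that lists every class-dependent layer stored in the checkpoint:

```python
import torch

# Load only the state dict, on CPU, without building the model.
ckpt = torch.load('./checkpoints/co_deformable_detr_r50_1x_coco.pth',
                  map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# The first dimension of these layers equals the number of classes
# (80 for COCO, or 81 including the RoI head's background class), which is
# why they cannot be copied into a model configured for fewer classes.
for name, param in state_dict.items():
    if any(s in name for s in ('cls_branches', 'fc_cls', 'fc_reg', 'atss_cls')):
        print(f'{name}: {tuple(param.shape)}')
```

Everything else in the checkpoint loads normally, so fine-tuning only has to relearn these output layers.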
Thank you for your answer. I want to train on my own dataset, i.e. transfer learning. My dataset has only two categories, neither of which is among the 80 pre-training categories. During training it also reports a size mismatch. Is this normal as well? (I have modified num_classes and the corresponding class names, and in co_deformable_detr_r50_1x_coco.py I added load_from = './checkpoints/co_deformable_detr_r50_1x_coco.pth' and resume_from = None.)
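For reference, load_from and resume_from do different things in mmdetection-style configs; a minimal sketch using the checkpoint path from this thread:

```python
# load_from initializes the model weights only; training starts from epoch 0
# with a fresh optimizer, which is what you want for transfer learning.
load_from = './checkpoints/co_deformable_detr_r50_1x_coco.pth'

# resume_from would additionally restore the optimizer state and the epoch
# counter, so it is only meant for continuing an interrupted training run.
resume_from = None
```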
I want to use the pre-trained co_deformable_detr_r50_1x_coco.pth to test on my dataset, but it shows "The model and loaded state dict do not match exactly". I have modified coco_classes() in mmdet/core/evaluation/class_names.py to return my classes, modified the classes in CocoDataset(CustomDataset) in mmdet/datasets/coco.py, and also changed everything related to num_classes in projects/configs/co_deformable_detr/co_deformable_detr_r50_1x_coco.py.
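As an aside, editing the mmdet sources is usually unnecessary: in mmdet 2.x the class names can be overridden directly from the config, which leaves class_names.py and coco.py untouched. A minimal sketch with hypothetical class names ('defect', 'scratch'):

```python
# Hypothetical two-class setup; replace with your own category names.
classes = ('defect', 'scratch')
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
```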
The content of co_deformable_detr_r50_1x_coco.py is below:

```python
_base_ = [
    '../_base_/datasets/coco_detection.py',
    '../_base_/default_runtime.py'
]
# model settings
num_dec_layer = 6
lambda_2 = 2.0

model = dict(
    type='CoDETR',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='ChannelMapper',
        in_channels=[512, 1024, 2048],
        kernel_size=1,
        out_channels=256,
        act_cfg=None,
        norm_cfg=dict(type='GN', num_groups=32),
        num_outs=4),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0 * num_dec_layer * lambda_2),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0 * num_dec_layer * lambda_2)),
    query_head=dict(
        type='CoDeformDETRHead',
        num_query=300,
        num_classes=2,
        in_channels=2048,
        sync_cls_avg_factor=True,
        with_box_refine=True,
        as_two_stage=True,
        mixed_selection=True,
        transformer=dict(
            type='CoDeformableDetrTransformer',
            num_co_heads=2,
            encoder=dict(
                type='DetrTransformerEncoder',
                num_layers=6,
                transformerlayers=dict(
                    type='BaseTransformerLayer',
                    attn_cfgs=dict(
                        type='MultiScaleDeformableAttention',
                        embed_dims=256,
                        dropout=0.0),
                    feedforward_channels=2048,
                    ffn_dropout=0.0,
                    operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
            decoder=dict(
                type='CoDeformableDetrTransformerDecoder',
                num_layers=num_dec_layer,
                return_intermediate=True,
                look_forward_twice=True,
                transformerlayers=dict(
                    type='DetrTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.0),
                        dict(
                            type='MultiScaleDeformableAttention',
                            embed_dims=256,
                            dropout=0.0)
                    ],
                    feedforward_channels=2048,
                    ffn_dropout=0.0,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm')))),
        positional_encoding=dict(
            type='SinePositionalEncoding',
            num_feats=128,
            normalize=True,
            offset=-0.5),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=5.0),
        loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
    roi_head=[dict(
        type='CoStandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[8, 16, 32, 64],
            finest_scale=112),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=2,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            reg_decoded_bbox=True,
            loss_cls=dict(
                type='CrossEntropyLoss',
                use_sigmoid=False,
                loss_weight=1.0 * num_dec_layer * lambda_2),
            loss_bbox=dict(
                type='GIoULoss',
                loss_weight=10.0 * num_dec_layer * lambda_2)))],
    bbox_head=[dict(
        type='CoATSSHead',
        num_classes=2,
        in_channels=256,
        stacked_convs=1,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0 * num_dec_layer * lambda_2),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0 * num_dec_layer * lambda_2),
        loss_centerness=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0 * num_dec_layer * lambda_2))],
    # model training and testing settings
    # (the train_cfg and test_cfg entries were cut off in the original post)
)

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='AutoAugment',
        policies=[
            [
                dict(
                    type='Resize',
                    img_scale=[(480, 1333), (512, 1333), (544, 1333),
                               (576, 1333), (608, 1333), (640, 1333),
                               (672, 1333), (704, 1333), (736, 1333),
                               (768, 1333), (800, 1333)],
                    multiscale_mode='value',
                    keep_ratio=True)
            ],
            [
                dict(
                    type='Resize',
                    # The ratio of all images in the train dataset is < 7
                    # (the remaining arguments of this policy, and the rest of
                    # train_pipeline, were cut off in the original post)
                ),
            ],
        ]),
]
# test_pipeline, NOTE the Pad's size_divisor is different from the default
# setting (size_divisor=32). While there is little effect on the performance
# whether we use the default setting or use size_divisor=1.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=1),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=1,
    workers_per_gpu=1,
    train=dict(filter_empty_gt=False, pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

# optimizer
optimizer = dict(
    type='AdamW',
    lr=2e-4,
    weight_decay=1e-4,
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1),
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))

# learning policy
lr_config = dict(policy='step', step=[11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
```

Here is the relevant part of the error output:

```
The model and loaded state dict do not match exactly

size mismatch for query_head.cls_branches.0.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.0.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.1.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.1.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.2.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.2.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.3.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.3.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.4.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.4.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.5.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.5.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for query_head.cls_branches.6.weight: copying a param with shape torch.Size([80, 256]) from checkpoint, the shape in current model is torch.Size([2, 256]).
size mismatch for query_head.cls_branches.6.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for roi_head.0.bbox_head.fc_cls.weight: copying a param with shape torch.Size([81, 1024]) from checkpoint, the shape in current model is torch.Size([3, 1024]).
size mismatch for roi_head.0.bbox_head.fc_cls.bias: copying a param with shape torch.Size([81]) from checkpoint, the shape in current model is torch.Size([3]).
size mismatch for roi_head.0.bbox_head.fc_reg.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([8, 1024]).
size mismatch for roi_head.0.bbox_head.fc_reg.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([8]).
size mismatch for bbox_head.0.atss_cls.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3, 3]).
size mismatch for bbox_head.0.atss_cls.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([2]).
```
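If you would rather not see the warning at all, a common workaround is to delete the class-dependent layers from the checkpoint before pointing load_from at it. A minimal sketch (not part of the Co-DETR repo; the output path is an assumption), keyed on exactly the parameter groups listed above:

```python
import torch

src = './checkpoints/co_deformable_detr_r50_1x_coco.pth'
dst = './checkpoints/co_deformable_detr_r50_1x_coco_no_cls.pth'  # assumed name

ckpt = torch.load(src, map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# Every layer whose output size depends on the number of classes, i.e. the
# exact parameter groups reported in the size-mismatch warning above.
class_dependent = ('query_head.cls_branches', 'roi_head.0.bbox_head.fc_cls',
                   'roi_head.0.bbox_head.fc_reg', 'bbox_head.0.atss_cls')
for key in [k for k in state_dict if k.startswith(class_dependent)]:
    del state_dict[key]

torch.save(ckpt, dst)
```

With these keys removed, the remaining weights load cleanly and the class-dependent layers are freshly initialized, which is effectively what the non-strict loading already does for you.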