Sense-X / Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
MIT License

Co-Deformable-DETR Swin-B 36 Epoch Fine Tune Checkpoint Meta #65

Closed ayanan1020-personal closed 10 months ago

ayanan1020-personal commented 11 months ago

I am trying to fine-tune the Co-Deformable-DETR model with the Swin-B backbone (the 36-epoch checkpoint) on a custom dataset. I have registered the dataset and can successfully run tests against it. However, when I try to train for more epochs, the script errors out with a missing key in the downloaded .pth file.

```
Traceback (most recent call last):
  File "tools/train.py", line 245, in <module>
    main()
  File "tools/train.py", line 241, in main
    meta=meta)
  File "/scratch/xxxx/Experiments/Localization/VinDR_CXR_Co-Detr/Co-DETR/mmdet/apis/train.py", line 242, in train_detector
    runner.resume(cfg.resume_from)
  File "/home/xxxxx/.conda/envs/co-detr/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 363, in resume
    self._epoch = checkpoint['meta']['epoch']
KeyError: 'meta'
```

Is it possible to fine-tune from this checkpoint file? Am I missing something, or is the checkpoint file missing the required meta portion of the dict?
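
One quick sanity check is to inspect the checkpoint directly; a minimal sketch (not part of the original report, the path is just the downloaded checkpoint):

```python
# Minimal sketch (assumption, not a snippet from this thread): inspect the
# downloaded checkpoint to see whether the 'meta' dict expected by
# runner.resume() is present at all.
import torch

ckpt = torch.load(
    'checkpoints/co_deformable_detr_swin_base_3x_vindr_cxr_coco.pth',
    map_location='cpu')
print(list(ckpt.keys()))   # resuming needs a 'meta' entry alongside 'state_dict'
print('meta' in ckpt)      # False would explain the KeyError above
```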

TempleX98 commented 11 months ago

Can you show me your training config?

ayanan1020-personal commented 11 months ago

I'm not sure which file you are referring to specifically, so I'll give a few. My goal is to fine-tune the model for bbox localization on the VinDr-CXR dataset. I converted the dataset to COCO format and stored it in data/coco as though it were the COCO dataset. I registered a custom dataset in mmdet that is a copy of the COCO dataset class, with the number of classes overwritten.
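
The registration is roughly the following (a hedged sketch with placeholder names, not the actual file):

```python
# Hedged sketch of the custom dataset registration described above. The class
# name and label names are placeholders, not the actual VinDr-CXR file.
from mmdet.datasets.builder import DATASETS
from mmdet.datasets.coco import CocoDataset


@DATASETS.register_module()
class VinDrCXRDataset(CocoDataset):
    # 15 placeholder labels, matching num_classes=15 in the heads below
    CLASSES = tuple(f'finding_{i}' for i in range(15))
```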

The training command I am running is:

```bash
python -m torch.distributed.launch --nproc_per_node=1 --master_port=12345 \
    tools/train.py projects/configs/co_deformable_detr/co_deformable_detr_swin_base_3x_vindr_cxr_coco.py \
    --launcher pytorch --work-dir anthony_experiments/Trial01 \
    --resume-from checkpoints/co_deformable_detr_swin_base_3x_vindr_cxr_coco.pth
```

The co_deformable_detr_swin_base_3x_vindr_cxr_coco.py:

```python
_base_ = [
    'co_deformable_detr_r50_1x_vindr_cxr_coco.py'
]
pretrained = 'models/swin_base_patch4_window12_384_22k.pth'

# model settings
model = dict(
    backbone=dict(
        _delete_=True,
        type='SwinTransformerV1',
        embed_dim=128,
        depths=[2, 2, 18, 2],
        num_heads=[4, 8, 16, 32],
        out_indices=(1, 2, 3),
        window_size=12,
        ape=False,
        drop_path_rate=0.4,
        patch_norm=True,
        use_checkpoint=False,
        pretrained=pretrained),
    neck=dict(in_channels=[128*2, 128*4, 128*8]))

# optimizer
optimizer = dict(weight_decay=0.05)
lr_config = dict(policy='step', step=[30])
runner = dict(type='EpochBasedRunner', max_epochs=36)
```

The co_deformable_detr_r50_1x_vindr_cxr_coco.py:

```python
_base_ = [
    '../_base_/datasets/vindr_cxr_coco_detection.py',
    '../_base_/default_runtime.py'
]

# model settings
num_dec_layer = 6
lambda_2 = 2.0

model = dict(
    type='CoDETR',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='ChannelMapper',
        in_channels=[512, 1024, 2048],
        kernel_size=1,
        out_channels=256,
        act_cfg=None,
        norm_cfg=dict(type='GN', num_groups=32),
        num_outs=4),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=4,
            scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0],
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0*num_dec_layer*lambda_2),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0*num_dec_layer*lambda_2)),
    query_head=dict(
        type='CoDeformDETRHead',
        num_query=300,
        num_classes=15,
        in_channels=2048,
        sync_cls_avg_factor=True,
        with_box_refine=True,
        as_two_stage=True,
        mixed_selection=True,
        transformer=dict(
            type='CoDeformableDetrTransformer',
            num_co_heads=2,
            encoder=dict(
                type='DetrTransformerEncoder',
                num_layers=6,
                transformerlayers=dict(
                    type='BaseTransformerLayer',
                    attn_cfgs=dict(
                        type='MultiScaleDeformableAttention',
                        embed_dims=256,
                        dropout=0.0),
                    feedforward_channels=2048,
                    ffn_dropout=0.0,
                    operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
            decoder=dict(
                type='CoDeformableDetrTransformerDecoder',
                num_layers=num_dec_layer,
                return_intermediate=True,
                look_forward_twice=True,
                transformerlayers=dict(
                    type='DetrTransformerDecoderLayer',
                    attn_cfgs=[
                        dict(
                            type='MultiheadAttention',
                            embed_dims=256,
                            num_heads=8,
                            dropout=0.0),
                        dict(
                            type='MultiScaleDeformableAttention',
                            embed_dims=256,
                            dropout=0.0)
                    ],
                    feedforward_channels=2048,
                    ffn_dropout=0.0,
                    operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                     'ffn', 'norm')))),
        positional_encoding=dict(
            type='SinePositionalEncoding',
            num_feats=128,
            normalize=True,
            offset=-0.5),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=2.0),
        loss_bbox=dict(type='L1Loss', loss_weight=5.0),
        loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
    roi_head=[dict(
        type='CoStandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[8, 16, 32, 64],
            finest_scale=112),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            reg_decoded_bbox=True,
            loss_cls=dict(
                type='CrossEntropyLoss',
                use_sigmoid=False,
                loss_weight=1.0*num_dec_layer*lambda_2),
            loss_bbox=dict(
                type='GIoULoss',
                loss_weight=10.0*num_dec_layer*lambda_2)))],
    bbox_head=[dict(
        type='CoATSSHead',
        num_classes=15,
        in_channels=256,
        stacked_convs=1,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0*num_dec_layer*lambda_2),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0*num_dec_layer*lambda_2),
        loss_centerness=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0*num_dec_layer*lambda_2)),],

# model training and testing settings

train_cfg=[
    dict(
        assigner=dict(
            type='HungarianAssigner',
            cls_cost=dict(type='FocalLossCost', weight=2.0),
            reg_cost=dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
            iou_cost=dict(type='IoUCost', iou_mode='giou', weight=2.0))),
    dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=4000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),],
test_cfg=[
    dict(max_per_img=100),
    dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.0,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)),
    dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.0,
        nms=dict(type='nms', iou_threshold=0.6),
        max_per_img=100),
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05)
])

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
    type='AutoAugment',
    policies=[
        [
            dict(
                type='Resize',
                img_scale=[(480, 1333), (512, 1333), (544, 1333),
                           (576, 1333), (608, 1333), (640, 1333),
                           (672, 1333), (704, 1333), (736, 1333),
                           (768, 1333), (800, 1333)],
                multiscale_mode='value',
                keep_ratio=True)
        ],
        [
            dict(
                type='Resize',

                # The ratio of all images in the train dataset is < 7

                # follow the original impl
                img_scale=[(400, 4200), (500, 4200), (600, 4200)],
                multiscale_mode='value',
                keep_ratio=True),
            dict(
                type='RandomCrop',
                crop_type='absolute_range',
                crop_size=(384, 600),
                allow_negative_crop=True),
            dict(
                type='Resize',
                img_scale=[(480, 1333), (512, 1333), (544, 1333),
                           (576, 1333), (608, 1333), (640, 1333),
                           (672, 1333), (704, 1333), (736, 1333),
                           (768, 1333), (800, 1333)],
                multiscale_mode='value',
                override=True,
                keep_ratio=True)
        ]
    ]),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=1),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])

]

# test_pipeline, NOTE the Pad's size_divisor is different from the default
# setting (size_divisor=32). While there is little effect on the performance
# whether we use the default setting or use size_divisor=1.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=1),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(filter_empty_gt=False, pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

# optimizer
optimizer = dict(
    type='AdamW',
    lr=2e-4,
    weight_decay=1e-4,
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1),
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))

# learning policy
lr_config = dict(policy='step', step=[11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
```

Sorry about the formatting, I can also upload the files directly if needed. Thank you for the quick response and please let me know if you need anything else.

TempleX98 commented 11 months ago

Do you use load_from or resume_from to load the pretrained model?

ayanan1020-personal commented 11 months ago

resume_from

TempleX98 commented 11 months ago

Please use load_from to load the weights
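
(resume_from goes through runner.resume(), which reads checkpoint['meta'] to restore the epoch counter, as the traceback above shows; load_from only loads the model weights and starts fine-tuning from epoch 0. A minimal config-level sketch of the fix, using the checkpoint path from the training command above:)

```python
# Config-level sketch of the suggestion above (not a verbatim snippet from
# this thread): load pretrained weights only, without resuming training state.
load_from = 'checkpoints/co_deformable_detr_swin_base_3x_vindr_cxr_coco.pth'
resume_from = None
```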

ayanan1020-personal commented 11 months ago

I will try that, thank you!

ayanan1020-personal commented 11 months ago

This solution seems to be working. I did have to add load_from as an argument to tools/train.py. It's a simple addition, but without it, trying to use load_from with tools/train.py throws an error. Once it is exposed as an available argument, it passes correctly through the config to mmdet's training scripts.
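
The change is roughly the following (a hypothetical sketch rather than the exact patch; the argument name and wiring may differ):

```python
# Hypothetical sketch of the tools/train.py change described above: expose
# load_from on the command line and forward it into the config. Names and
# placement are assumptions, not the exact patch from this thread.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description='Train a detector')
    parser.add_argument('config', help='train config file path')
    parser.add_argument(
        '--load-from',
        help='checkpoint to initialize weights from (no optimizer/epoch state)')
    parser.add_argument(
        '--resume-from',
        help='checkpoint to resume full training state from (needs the meta dict)')
    return parser.parse_args()


# later, after the config object is built in main():
#     if args.load_from is not None:
#         cfg.load_from = args.load_from
```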

Thank you again for the timely and direct solution.