Sense-X / Co-DETR

[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
MIT License

Error when training on my own dataset #64

Open LL-XSJ opened 10 months ago

LL-XSJ commented 10 months ago

self._epoch = checkpoint['meta']['epoch']
KeyError: 'meta'
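(For context: this line is raised on the resume path of the MMCV runner. Below is a rough, paraphrased sketch of that path, not the verbatim mmcv source, showing why a 'meta' key is expected.)

# Paraphrased sketch (not verbatim mmcv code) of the runner's resume path:
def resume(self, checkpoint_path):
    checkpoint = self.load_checkpoint(checkpoint_path)
    # Resuming restores the full training state, not just the weights:
    self._epoch = checkpoint['meta']['epoch']  # <- KeyError when only 'state_dict' exists
    self._iter = checkpoint['meta']['iter']
    if 'optimizer' in checkpoint:
        self.optimizer.load_state_dict(checkpoint['optimizer'])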

LL-XSJ commented 10 months ago

Hello, I found that the weights you provided contain no keys other than "state_dict", but when training my own dataset the code needs to load checkpoint['meta']. What should I do in this situation?
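(A minimal way to confirm this, assuming PyTorch is available; the file name is the one mentioned later in this thread, and the expected keys reflect how MMCV normally saves training checkpoints.)

import torch

ckpt = torch.load('weights/co_deformable_detr-r50_1.pth', map_location='cpu')
print(list(ckpt.keys()))
# Released weights: ['state_dict'] only.
# A checkpoint written during training would additionally contain 'meta'
# (epoch, iter, versions) and 'optimizer', which resuming needs.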

TempleX98 commented 10 months ago

Can you show me your training config?

TempleX98 commented 10 months ago

Please use load_from to load the weights

LL-XSJ commented 10 months ago

Hello author, there was an error in tools/train.py. It complains that the checkpoint has no 'meta' key. I have checked that the pre-trained weight file indeed does not contain a 'meta' key; it only has a 'state_dict' key. Why is this, and how should I modify things? The error occurs in the following call:

train_detector(
    model,
    datasets,
    cfg,
    distributed=distributed,
    validate=(not args.no_validate),
    timestamp=timestamp,
    meta=meta)

LL-XSJ commented 10 months ago

Just like this image

TempleX98 commented 10 months ago

Please check the resume_from argument in your training config. The error is raised since you try to resume the training from a checkpoint without a meta key. If you want to train a new model with the pre-trained weights, just set resume_from=None and use load_from to load the weights.
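(A minimal sketch of the two options as top-level fields in the training config; the checkpoint path is illustrative, taken from this thread.)

# Start a new training run initialized from the released weights
# (only the 'state_dict' key is required):
load_from = 'weights/co_deformable_detr-r50_1.pth'
resume_from = None

# Resuming instead would need a checkpoint saved during training,
# e.g. resume_from = 'work_dirs/<exp_name>/latest.pth', because the runner
# also restores the epoch/iteration counters and optimizer state from it.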

LL-XSJ commented 10 months ago

In which file should I use load_from to load the weight file?

LL-XSJ commented 10 months ago

Hello author, what is the reason for the error in distributed training?

LL-XSJ commented 10 months ago

Just like this image

LL-XSJ commented 10 months ago

May I ask how to solve it?

TempleX98 commented 10 months ago

Please show me your training config

LL-XSJ commented 10 months ago

_base_ = [
    '../_base_/datasets/coco_detection.py',
    '../_base_/default_runtime.py'
]

# model settings

num_dec_layer = 6
lambda_2 = 2.0

# When modifying this config to train on your own dataset, the number of classes needs to be changed in 3 places in total.

model = dict(
type='CoDETR',
backbone=dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(1, 2, 3),
    frozen_stages=1,
    norm_cfg=dict(type='BN', requires_grad=False),
    norm_eval=True,
    style='pytorch',
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
    type='ChannelMapper',
    in_channels=[512, 1024, 2048],
    kernel_size=1,
    out_channels=256,
    act_cfg=None,
    norm_cfg=dict(type='GN', num_groups=32),
    num_outs=4),
rpn_head=dict(
    type='RPNHead',
    in_channels=256,
    feat_channels=256,
    anchor_generator=dict(
        type='AnchorGenerator',
        octave_base_scale=4,
        scales_per_octave=3,
        ratios=[0.5, 1.0, 2.0],
        strides=[8, 16, 32, 64, 128]),
    bbox_coder=dict(
        type='DeltaXYWHBBoxCoder',
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0]),
    loss_cls=dict(
        type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0*num_dec_layer*lambda_2),
    loss_bbox=dict(type='L1Loss', loss_weight=1.0*num_dec_layer*lambda_2)),
query_head=dict(
    type='CoDeformDETRHead',
    num_query=300,

    # Change to the number of classes in your own dataset, from 80 -> 1 (80 is the number of COCO classes, 1 is the number of classes in my dataset)

    num_classes=1,
    in_channels=2048,
    sync_cls_avg_factor=True,
    with_box_refine=True,
    as_two_stage=True,
    mixed_selection=True,
    transformer=dict(
        type='CoDeformableDetrTransformer',
        num_co_heads=2,
        encoder=dict(
            type='DetrTransformerEncoder',
            num_layers=6,
            transformerlayers=dict(
                type='BaseTransformerLayer',
                attn_cfgs=dict(
                    type='MultiScaleDeformableAttention', embed_dims=256, dropout=0.0),
                feedforward_channels=2048,
                ffn_dropout=0.0,
                operation_order=('self_attn', 'norm', 'ffn', 'norm'))),
        decoder=dict(
            type='CoDeformableDetrTransformerDecoder',
            num_layers=num_dec_layer,
            return_intermediate=True,
            look_forward_twice=True,
            transformerlayers=dict(
                type='DetrTransformerDecoderLayer',
                attn_cfgs=[
                    dict(
                        type='MultiheadAttention',
                        embed_dims=256,
                        num_heads=8,
                        dropout=0.0),
                    dict(
                        type='MultiScaleDeformableAttention',
                        embed_dims=256,
                        dropout=0.0)
                ],
                feedforward_channels=2048,
                ffn_dropout=0.0,
                operation_order=('self_attn', 'norm', 'cross_attn', 'norm',
                                 'ffn', 'norm')))),
    positional_encoding=dict(
        type='SinePositionalEncoding',
        num_feats=128,
        normalize=True,
        offset=-0.5),
    loss_cls=dict(
        type='FocalLoss',
        use_sigmoid=True,
        gamma=2.0,
        alpha=0.25,
        loss_weight=2.0),
    loss_bbox=dict(type='L1Loss', loss_weight=5.0),
    loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
roi_head=[dict(
    type='CoStandardRoIHead',
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
        out_channels=256,
        featmap_strides=[8, 16, 32, 64],
        finest_scale=112),
    bbox_head=dict(
        type='Shared2FCBBoxHead',
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        # Change to the number of classes in your own dataset, from 80 -> 1 (80 is the number of COCO classes, 1 is the number of classes in my dataset)
        num_classes=1,
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0., 0., 0., 0.],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        reg_class_agnostic=False,
        reg_decoded_bbox=True,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0*num_dec_layer*lambda_2),
        loss_bbox=dict(type='GIoULoss', loss_weight=10.0*num_dec_layer*lambda_2)))],
bbox_head=[dict(
    type='CoATSSHead',
    # Change to the number of classes in your own dataset, from 80 -> 1 (80 is the number of COCO classes, 1 is the number of classes in my dataset)
    num_classes=1,
    in_channels=256,
    stacked_convs=1,
    feat_channels=256,
    anchor_generator=dict(
        type='AnchorGenerator',
        ratios=[1.0],
        octave_base_scale=8,
        scales_per_octave=1,
        strides=[8, 16, 32, 64, 128]),
    bbox_coder=dict(
        type='DeltaXYWHBBoxCoder',
        target_means=[.0, .0, .0, .0],
        target_stds=[0.1, 0.1, 0.2, 0.2]),
    loss_cls=dict(
        type='FocalLoss',
        use_sigmoid=True,
        gamma=2.0,
        alpha=0.25,
        loss_weight=1.0*num_dec_layer*lambda_2),
    loss_bbox=dict(type='GIoULoss', loss_weight=2.0*num_dec_layer*lambda_2),
    loss_centerness=dict(
        type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0*num_dec_layer*lambda_2)),],
# model training and testing settings
train_cfg=[
    dict(
        assigner=dict(
            type='HungarianAssigner',
            cls_cost=dict(type='FocalLossCost', weight=2.0),
            reg_cost=dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
            iou_cost=dict(type='IoUCost', iou_mode='giou', weight=2.0))),
    dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=4000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),],
test_cfg=[
    dict(max_per_img=100),
    dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.0,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)),
    dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.0,
        nms=dict(type='nms', iou_threshold=0.6),
        max_per_img=100),
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05)
])

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.

train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
    type='AutoAugment',
    policies=[
        [
            dict(
                type='Resize',
                img_scale=[(480, 1333), (512, 1333), (544, 1333),
                           (576, 1333), (608, 1333), (640, 1333),
                           (672, 1333), (704, 1333), (736, 1333),
                           (768, 1333), (800, 1333)],
                multiscale_mode='value',
                keep_ratio=True)
        ],
        [
            dict(
                type='Resize',
                # The ratio of all images in the train dataset < 7

                # follow the original impl
                img_scale=[(400, 4200), (500, 4200), (600, 4200)],
                multiscale_mode='value',
                keep_ratio=True),
            dict(
                type='RandomCrop',
                crop_type='absolute_range',
                crop_size=(384, 600),
                allow_negative_crop=True),
            dict(
                type='Resize',
                img_scale=[(480, 1333), (512, 1333), (544, 1333),
                           (576, 1333), (608, 1333), (640, 1333),
                           (672, 1333), (704, 1333), (736, 1333),
                           (768, 1333), (800, 1333)],
                multiscale_mode='value',
                override=True,
                keep_ratio=True)
        ]
    ]),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=1),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

# test_pipeline, NOTE the Pad's size_divisor is different from the default
# setting (size_divisor=32). While there is little effect on the performance
# whether we use the default setting or use size_divisor=1.

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=1),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(filter_empty_gt=False, pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

# optimizer

optimizer = dict(
    type='AdamW',
    lr=2e-4,
    weight_decay=1e-4,
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1),
            'sampling_offsets': dict(lr_mult=0.1),
            'reference_points': dict(lr_mult=0.1)
        }))
optimizer_config = dict(grad_clip=dict(max_norm=0.1, norm_type=2))

# learning policy

# For the number of training epochs, 1x (= 12 epochs) is too few.
# Changed from 12 -> 36, the same setting as DINO-DETR.
# The corresponding learning-rate decay epoch is changed from 11 -> 33.

lr_config = dict(policy='step', step=[33])
runner = dict(type='EpochBasedRunner', max_epochs=36)

LL-XSJ commented 10 months ago

Hello author, is this the training config you need? I'm worried that I may have provided it the wrong way.

TempleX98 commented 10 months ago

Have you modified the training script? "Bad substitution" means there is an error in dist_train.sh, so please check that file.

TempleX98 commented 10 months ago

> In which file should I use load_from to load the weight file?

The image you posted shows that you are loading the weights from weights/co_deformable_detr-r50_1.pth. Please use load_from to load it.

LL-XSJ commented 10 months ago

Yes, I have modified the number of classes and the number of training epochs for my dataset. Will this affect multi-node training?