Open leaf1170124460 opened 1 year ago
And when I try to evaluate LVIS instance segmentation, I get an even worse result.
The config file (modified from projects/configs/co_dino/co_dino_5scale_lsj_swin_large_3x_lvis.py) is:
_base_ = [
    'co_dino_5scale_lsj_r50_1x_lvis.py'
]
# model settings
num_dec_layer = 6
lambda_2 = 2.0
pretrained = 'models/co_dino_5scale_lsj_swin_large_3x_lvis.pth'
# model settings
model = dict(
    eval_module='two-stage',
    backbone=dict(
        _delete_=True,
        type='SwinTransformerV1',
        embed_dim=192,
        depths=[2, 2, 18, 2],
        num_heads=[6, 12, 24, 48],
        out_indices=(0, 1, 2, 3),
        window_size=12,
        ape=False,
        drop_path_rate=0.3,
        patch_norm=True,
        use_checkpoint=False,
        pretrained=pretrained),
    neck=dict(in_channels=[192, 192*2, 192*4, 192*8]),
    query_head=dict(
        transformer=dict(
            encoder=dict(
                # number of layers that use checkpoint
                with_cp=6))),
    roi_head=[dict(
        type='CoStandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32, 64],
            finest_scale=56),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=1203,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            reg_decoded_bbox=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0*num_dec_layer*lambda_2),
            loss_bbox=dict(type='GIoULoss', loss_weight=10.0*num_dec_layer*lambda_2)),
        mask_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[8, 16, 32, 64],
            finest_scale=112),
        mask_head=dict(
            type='FCNMaskHead',
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=1203,
            loss_mask=dict(
                type='CrossEntropyLoss', use_mask=True, loss_weight=1.0*num_dec_layer*lambda_2)),
    )],
    test_cfg=[
        dict(
            max_per_img=300,
            nms=dict(type='soft_nms', iou_threshold=0.8)
        ),
        dict(
            rpn=dict(
                nms_pre=1000,
                max_per_img=1000,
                nms=dict(type='nms', iou_threshold=0.7),
                min_bbox_size=0),
            rcnn=dict(
                score_thr=0.0,
                nms=dict(type='nms', iou_threshold=0.5),
                mask_thr_binary=0.5,
                max_per_img=100)),
        dict(
            nms_pre=1000,
            min_bbox_size=0,
            score_thr=0.0,
            nms=dict(type='nms', iou_threshold=0.6),
            max_per_img=100),
        # soft-nms is also supported for rcnn testing
        # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05)
    ])
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
image_size = (1280, 1280)
load_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize',
        img_scale=image_size,
        ratio_range=(0.1, 2.0),
        multiscale_mode='range',
        keep_ratio=True),
    dict(
        type='RandomCrop',
        crop_type='absolute_range',
        crop_size=image_size,
        recompute_bbox=True,
        allow_negative_crop=True),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Pad', size=image_size, pad_val=dict(img=(114, 114, 114))),
]
train_pipeline = [
    dict(type='CopyPaste', max_num_pasted=100),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=image_size,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Pad', size=image_size, pad_val=dict(img=(114, 114, 114))),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
dataset_type = 'LVISV1Dataset'
data_root = 'data/lvis_v1/'
img_data_root = 'data/coco/'
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='MultiImageMixDataset',
        dataset=dict(
            type=dataset_type,
            ann_file=data_root + 'annotations/lvis_v1_train.json',
            img_prefix=img_data_root,
            filter_empty_gt=False,
            pipeline=load_pipeline),
        pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
The command is:
bash tools/dist_test.sh projects/configs/co_dino/co_dino_5scale_lsj_swin_large_3x_lvis_mask.py checkpoint/co_dino_5scale_lsj_swin_large_3x_lvis.pth 2 --eval segm
The output is:
OrderedDict([('segm_AP', 0.0), ('segm_AP50', 0.0), ('segm_AP75', 0.0), ('segm_APs', 0.0), ('segm_APm', 0.0), ('segm_APl', 0.0), ('segm_APr', 0.0), ('segm_APc', 0.0), ('segm_APf', 0.0), ('segm_mAP_copypaste', 'AP:0.000 AP50:0.000 AP75:0.000 APs:0.000 APm:0.000 APl:0.000 APr:0.000 APc:0.000 APf:0.000')])
None of the provided model weights include the auxiliary mask branch.
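For anyone who wants to confirm this locally, here is a minimal sketch that lists the parameter names in a checkpoint and looks for mask-head entries. It assumes the checkpoint is a PyTorch file whose weights sit under a `state_dict` key (the usual MMDetection layout) and that mask-branch parameters contain the substring `mask_head`. Both are assumptions, as are the helper names `keys_matching` and `load_state_dict_keys` and the example key names, none of which come from this repository.

```python
from typing import Iterable, List


def keys_matching(keys: Iterable[str], marker: str = "mask_head") -> List[str]:
    """Return the parameter names that contain `marker`."""
    return [k for k in keys if marker in k]


def load_state_dict_keys(path: str) -> List[str]:
    """Read parameter names from a checkpoint file (requires PyTorch)."""
    import torch  # imported lazily so keys_matching works without PyTorch

    ckpt = torch.load(path, map_location="cpu")
    # Assumption: weights are nested under "state_dict"; otherwise treat
    # the loaded object itself as the parameter mapping.
    state_dict = ckpt.get("state_dict", ckpt)
    return list(state_dict.keys())


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    demo_keys = [
        "backbone.patch_embed.proj.weight",
        "roi_head.0.bbox_head.fc_cls.weight",
    ]
    found = keys_matching(demo_keys)
    print("mask branch weights found:", bool(found))
```

With a real file, pass the result of `load_state_dict_keys('checkpoint/co_dino_5scale_lsj_swin_large_3x_lvis.pth')` to `keys_matching`. An empty result would be consistent with the mask head being left at its random initialization at test time, which would explain a segm AP of 0.0.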
Thanks for your reply. May I ask when the weights containing the auxiliary mask branch will be uploaded? If you don't plan to upload weights, can you upload the training config for LVIS? Thanks!
I can provide the weights but it may take a while to retrain the model.
Thank you for your response! I'll patiently wait for you to finish retraining. If you could also provide the configs used for retraining, it would be even better.
Are you referring to R50 or SwinL?
Swin-L. Thanks.
Hi, @TempleX98, @josh3255, @mhd-medfa and @Sense-X.
I want to follow up regarding the training of the model you previously mentioned was in progress. It's been some time since the last update, and I am wondering if the model training has been completed at this point.
If it has finished training, would you consider releasing the weights? I think many in the community would benefit from it. If it is still a work in progress, would it be possible to share the training configuration file?
Thank you for your time and effort in this project. Looking forward to your response.
Hi, @TempleX98, @josh3255, @mhd-medfa and @Sense-X.
Thanks for your work on Co-DETR.
I used the config and command below to evaluate COCO instance segmentation, but I cannot reproduce the result reported in the paper. I'm not sure whether this issue is specific to my setup or whether others are facing the same problem. The config file is:
The command is:
The output is: