IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0

Inquiry Regarding Model Performance Discrepancy in DINO-4scale Training #240

Open 1224wxwx opened 9 months ago

1224wxwx commented 9 months ago

Hello, and thank you for sharing your remarkable work.

I trained DINO-4scale for 12 epochs using the DINO_train_dist.sh script. However, my model reaches an AP of 48.79, slightly lower than the 49.0 reported in your paper.

I would like to ask whether this 0.21 AP discrepancy is within the expected run-to-run variance or whether there might be an issue with my configuration.

Below are the configuration details from my training log. Could you please check whether anything looks wrong? Any insights would be highly appreciated.

[12/12 10:42:50.432]: Command: main.py --local_rank=0 --output_dir logs/DINO/R50-MS4_ori -c config/DINO/DINO_4scale.py --coco_path /data/datasets/coco --options dn_scalar=100 embed_init_tgt=TRUE dn_label_coef=1.0 dn_bbox_coef=1.0 use_ema=False dn_box_noise_scale=1.0
[12/12 10:42:50.434]: Full config saved to logs/DINO/R50-MS4_ori/config_args_all.json
[12/12 10:42:50.434]: world size: 8
[12/12 10:42:50.434]: rank: 0
[12/12 10:42:50.435]: local_rank: 0
[12/12 10:42:50.435]: args: Namespace(add_channel_attention=False, add_pos_value=False, amp=False, aux_loss=True, backbone='resnet50', backbone_freeze_keywords=None, batch_norm_type='FrozenBatchNorm2d', batch_size=2, bbox_loss_coef=5.0, box_attn_type='roi_align', clip_max_norm=0.1, cls_loss_coef=1.0, coco_panoptic_path=None, coco_path='/data/datasets/coco', config_file='config/DINO/DINO_4scale.py', dabdetr_deformable_decoder=False, dabdetr_deformable_encoder=False, dabdetr_yolo_like_anchor_update=False, data_aug_max_size=1333, data_aug_scale_overlap=None, data_aug_scales=[480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800], data_aug_scales2_crop=[384, 600], data_aug_scales2_resize=[400, 500, 600], dataset_file='coco', ddetr_lr_param=False, debug=False, dec_layer_number=None, dec_layers=6, dec_n_points=4, dec_pred_bbox_embed_share=True, dec_pred_class_embed_share=True, decoder_layer_noise=False, decoder_module_seq=['sa', 'ca', 'ffn'], decoder_sa_type='sa', device='cuda', dice_loss_coef=1.0, dilation=False, dim_feedforward=2048, dist_backend='nccl', dist_url='env://', distributed=True, dln_hw_noise=0.2, dln_xy_noise=0.2, dn_bbox_coef=1.0, dn_box_noise_scale=1.0, dn_label_coef=1.0, dn_label_noise_ratio=0.5, dn_labelbook_size=91, dn_number=100, dn_scalar=100, dropout=0.0, ema_decay=0.9997, ema_epoch=0, embed_init_tgt=True, enc_layers=6, enc_loss_coef=1.0, enc_n_points=4, epochs=12, eval=False, find_unused_params=False, finetune_ignore=None, fix_refpoints_hw=-1, fix_size=False, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2.0, gpu=0, hidden_dim=256, interm_loss_coef=1.0, local_rank=0, lr=0.0001, lr_backbone=1e-05, lr_backbone_names=['backbone.0'], lr_drop=11, lr_drop_list=[33, 45], lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], mask_loss_coef=1.0, masks=False, match_unstable_error=True, matcher_type='HungarianMatcher', modelname='dino', multi_step_lr=False, nheads=8, nms_iou_threshold=-1, no_interm_box_loss=False, 
note='', num_classes=91, num_feature_levels=4, num_patterns=0, num_queries=900, num_select=300, num_workers=10, onecyclelr=False, options={'dn_scalar': 100, 'embed_init_tgt': True, 'dn_label_coef': 1.0, 'dn_bbox_coef': 1.0, 'use_ema': False, 'dn_box_noise_scale': 1.0}, output_dir='logs/DINO/R50-MS4_ori', param_dict_type='default', pdetr3_bbox_embed_diff_each_layer=False, pdetr3_refHW=-1, pe_temperatureH=20, pe_temperatureW=20, position_embedding='sine', pre_norm=False, pretrain_model_path=None, query_dim=4, random_refpoints_xy=False, rank=0, remove_difficult=False, resume='', return_interm_indices=[1, 2, 3], save_checkpoint_interval=1, save_log=False, save_results=False, seed=42, set_cost_bbox=5.0, set_cost_class=2.0, set_cost_giou=2.0, start_epoch=0, test=False, transformer_activation='relu', two_stage_add_query_num=0, two_stage_bbox_embed_share=False, two_stage_class_embed_share=False, two_stage_default_hw=0.05, two_stage_keep_all_tokens=False, two_stage_learn_wh=False, two_stage_pat_embed=0, two_stage_type='standard', unic_layers=0, use_checkpoint=False, use_deformable_box_attn=False, use_detached_boxes_dec_out=False, use_dn=True, use_ema=False, weight_decay=0.0001, world_size=8)
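
To make a setup like the one above easier to check, here is a minimal sketch of how two config dumps could be compared key by key. It assumes that config_args_all.json is a flat JSON object, and that batch_size is a per-process value, so that the effective batch size is world_size × batch_size. The helper names (`load_config`, `diff_configs`) and the reference values are hypothetical, not part of the DINO codebase.

```python
import json
from pathlib import Path

MISSING = "<missing>"  # sentinel for a key present on only one side


def load_config(path):
    """Load a JSON config dump (assumed to be a flat object) into a dict."""
    return json.loads(Path(path).read_text())


def diff_configs(mine, reference):
    """Return {key: (my_value, reference_value)} for every key that differs."""
    diffs = {}
    for key in sorted(set(mine) | set(reference)):
        a = mine.get(key, MISSING)
        b = reference.get(key, MISSING)
        if a != b:
            diffs[key] = (a, b)
    return diffs


def effective_batch_size(config):
    """Total images per step, ASSUMING batch_size is per process."""
    return config["world_size"] * config["batch_size"]


if __name__ == "__main__":
    # A few values copied from the training log above.
    mine = {
        "batch_size": 2,
        "world_size": 8,
        "lr": 0.0001,
        "lr_backbone": 1e-05,
        "epochs": 12,
        "lr_drop": 11,
        "seed": 42,
        "use_ema": False,
    }
    # HYPOTHETICAL reference values, for illustration only; replace them
    # with a config dump from a run known to reproduce the reported AP.
    reference = dict(mine, seed=0)

    print("effective batch size:", effective_batch_size(mine))
    for key, (a, b) in diff_configs(mine, reference).items():
        print(f"{key}: mine={a!r} reference={b!r}")
```

With two real dumps, the in-memory dictionaries can be replaced by `load_config("logs/DINO/R50-MS4_ori/config_args_all.json")` and a second call pointing at a reference dump. Only keys whose values differ are printed, so a matching configuration produces no diff lines.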

Looking forward to your response.