JDAI-CV / LIO

Look-into-Object: Self-supervised Structure Modeling for Object Recognition (CVPR 2020)

Cannot get improvement training Faster RCNN with SCL module #6

Closed TyroneLi closed 4 years ago

TyroneLi commented 4 years ago

I tried to train Faster RCNN with the SCL module using mmdetection's default config faster_rcnn_r50_fpn_1x_coco.py (both runs with FPN), but I cannot get an improvement: reimplemented Faster RCNN gets COCO AP 37.3 vs. reimplemented Faster RCNN with SCL at COCO AP 37.4. Besides, I initialized the model from torchvision://resnet50; is that right? Or am I missing any vital details? Does FPN cause this?
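For reference, the change I mean swaps the box head for the repo's SCL-enabled one. A minimal sketch (BBoxHeadWithSCL, with_scl, structure_dim, and BareLoss are the names from this repo; the values mirror the full config I post later in this thread):

```python
# Sketch: SCL-related fields of the bbox head; everything else follows the
# stock mmdetection Faster RCNN config.
bbox_head = dict(
    type='BBoxHeadWithSCL',   # this repo's box head with the SCL branch
    num_classes=80,
    with_scl=True,            # turn on the SCL loss
    structure_dim=128,        # dimension of the structure embedding
    loss_scl=dict(type='BareLoss', loss_weight=0.1))  # SCL loss weight
```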

dc3ea9f commented 4 years ago

I tested FPN with X-101-32x4d-FPN and got an improvement, so FPN is not the issue. The difference is the third step in segmentation/setup: we train this model from a Mask R-CNN pretrained model rather than torchvision://resnet50. Training from torchvision://resnet50 hasn't been tested.
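Concretely, that means initializing the whole detector from a converged Mask R-CNN checkpoint via load_from instead of initializing only the backbone from torchvision. A sketch (the checkpoint path is a placeholder):

```python
# Sketch: start SCL training from a converged Mask R-CNN checkpoint.
model = dict(pretrained=None)  # skip the torchvision://resnet50 backbone init
load_from = 'checkpoints/mask_rcnn_x101_32x4d_fpn_1x.pth'  # placeholder path
```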

TyroneLi commented 4 years ago

> I tested FPN with X-101-32x4d-FPN and got an improvement, so FPN is not the issue. The difference is the third step in segmentation/setup: we train this model from a Mask R-CNN pretrained model rather than torchvision://resnet50. Training from torchvision://resnet50 hasn't been tested.

So you tested SCL training from the Mask R-CNN pretrained model; how did you adjust the learning rate? Did it remain the same as the Mask R-CNN default config value? Also, I think the improvement came from the longer training rather than the SCL module, because you initialized from a Mask R-CNN pretrained model instead of an ImageNet pretrained model; isn't that a spurious improvement? I will test Faster RCNN with X-101-32x4d-FPN tomorrow.

dc3ea9f commented 4 years ago

The config for X-101-32x4d-FPN is segmentation/configs/scl/mask_rcnn_x101_32x4d_fpn_1x.py. Technically, training from an ImageNet pretrained model can be better than fine-tuning an existing model. Actually, the Mask R-CNN model we used for comparisons had already fully converged, and a longer training schedule does not always mean better performance. You can try resuming the Mask R-CNN training to check whether there is still a performance improvement. What's more, if you want these two experiments to be fully aligned, you should train ResNet-50 + SCL on ImageNet and use it in the detection/segmentation training procedure.
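If you want to check the longer-training hypothesis, resuming the converged baseline is a small config change. A sketch (the checkpoint path and epoch count are placeholders):

```python
# Sketch: resume the converged Mask R-CNN baseline without SCL to see
# whether extra training alone still improves AP. Placeholder path/epochs.
resume_from = 'work_dirs/mask_rcnn_x101_32x4d_fpn_1x/latest.pth'
total_epochs = 24  # extend beyond the original 1x (12-epoch) schedule
```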

TyroneLi commented 4 years ago

> The config for X-101-32x4d-FPN is segmentation/configs/scl/mask_rcnn_x101_32x4d_fpn_1x.py. Technically, training from an ImageNet pretrained model can be better than fine-tuning an existing model. Actually, the Mask R-CNN model we used for comparisons had already fully converged, and a longer training schedule does not always mean better performance. You can try resuming the Mask R-CNN training to check whether there is still a performance improvement. What's more, if you want these two experiments to be fully aligned, you should train ResNet-50 + SCL on ImageNet and use it in the detection/segmentation training procedure.

Emmm, I may not agree with your idea, because the Mask R-CNN pretrained model had already fully converged on the COCO dataset, and training longer could help improve detection to some extent: when using the 2x schedule, we could get a better result. If you had trained from an ImageNet pretrained model or your SCL ImageNet pretrained model (not released yet), then I would have no doubts. Besides, I trained Faster RCNN using the "R-50-C4 caffe 1x" pretrained model with lr 0.02, but I only got 30.7 AP on COCO (I re-implemented it with the latest mmdetection, which is not too different from your released version, and I checked that there are no problems, just a little modification). Did I miss some details in this part?

wyvernbai commented 4 years ago

> The Mask R-CNN pretrained model had already fully converged on the COCO dataset, and training longer could help improve detection to some extent: when using the 2x schedule, we could get a better result.

@TyroneLi Could you please provide the model configs and detailed experimental results for this setting?

TyroneLi commented 4 years ago

> > The Mask R-CNN pretrained model had already fully converged on the COCO dataset, and training longer could help improve detection to some extent: when using the 2x schedule, we could get a better result.
>
> @TyroneLi Could you please provide the model configs and detailed experimental results for this setting?

Here is my training environment and config:

```
sys.platform: linux
Python: 3.7.7 (default, Mar 26 2020, 15:48:22) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GPU 0,1,2,3,4,5,6,7: GeForce RTX 2080 Ti
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.4.0
PyTorch compiling details: PyTorch built with:
2020-07-16 19:15:48,861 - mmdet - INFO - Distributed training: True
2020-07-16 19:15:49,628 - mmdet - INFO - Config:
```

```python
norm_cfg = dict(type='BN', requires_grad=False)
model = dict(
    type='FasterRCNN',
    pretrained=None,
    backbone=dict(
        type='ResNet', depth=50, num_stages=3, strides=(1, 2, 2),
        dilations=(1, 1, 1), out_indices=(2, ), frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False), norm_eval=True,
        style='caffe'),
    rpn_head=dict(
        type='RPNHead', in_channels=1024, feat_channels=1024,
        anchor_generator=dict(
            type='AnchorGenerator', scales=[2, 4, 8, 16, 32],
            ratios=[0.5, 1.0, 2.0], strides=[16]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        shared_head=dict(
            type='ResLayer', depth=50, stage=3, stride=2, dilation=1,
            style='caffe', norm_cfg=dict(type='BN', requires_grad=False),
            norm_eval=True),
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
            out_channels=1024, featmap_strides=[16]),
        bbox_head=dict(
            type='BBoxHeadWithSCL', with_avg_pool=True, roi_feat_size=7,
            in_channels=2048, num_classes=80,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0),
            with_scl=True, structure_dim=128,
            loss_scl=dict(type='BareLoss', loss_weight=0.1))))
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner', pos_iou_thr=0.7, neg_iou_thr=0.3,
            min_pos_iou=0.3, match_low_quality=True, ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler', num=256, pos_fraction=0.5, neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0, pos_weight=-1, debug=False),
    rpn_proposal=dict(
        nms_across_levels=False, nms_pre=12000, nms_post=2000, max_num=2000,
        nms_thr=0.7, min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.5,
            min_pos_iou=0.5, match_low_quality=False, ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler', num=512, pos_fraction=0.25, neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1, debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False, nms_pre=6000, nms_post=1000, max_num=1000,
        nms_thr=0.7, min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))
dataset_type = 'CocoDataset'
data_root = '/data/lijinlong/datasets/COCO/2017/'
img_norm_cfg = dict(
    mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', mean=[103.53, 116.28, 123.675],
         std=[1.0, 1.0, 1.0], to_rgb=False),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug', img_scale=(1333, 800), flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', mean=[103.53, 116.28, 123.675],
                 std=[1.0, 1.0, 1.0], to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='/data/lijinlong/datasets/COCO/2017/annotations/instances_train2017.json',
        img_prefix='/data/lijinlong/datasets/COCO/2017/train2017/',
        pipeline=train_pipeline),
    val=dict(
        type='CocoDataset',
        ann_file='/data/lijinlong/datasets/COCO/2017/annotations/instances_val2017.json',
        img_prefix='/data/lijinlong/datasets/COCO/2017/val2017/',
        pipeline=test_pipeline),
    test=dict(
        type='CocoDataset',
        ann_file='/data/lijinlong/datasets/COCO/2017/annotations/instances_val2017.json',
        img_prefix='/data/lijinlong/datasets/COCO/2017/val2017/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001,
    step=[8, 11])
total_epochs = 12
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = './checkpoints/rpn_r50_caffe_c4_1x-ea7d3428.pth'
resume_from = None
workflow = [('train', 1)]
work_dir = '/data/lijinlong/data/open_mmlab/mmdetection-20200715/work_dirs/scl_faster_rcnn_r50_caffe_c4_1x_coco'
gpu_ids = range(0, 1)
```

```
2020-07-16 19:15:49,628 - mmdet - INFO - Set random seed to 0, deterministic: False
2020-07-16 19:16:14,519 - mmdet - INFO - load checkpoint from ./checkpoints/rpn_r50_caffe_c4_1x-ea7d3428.pth
2020-07-16 19:16:14,586 - mmdet - WARNING - The model and loaded state dict do not match exactly
```

I trained Faster RCNN using mmdetection with the faster_rcnn_r50_caffe_c4_1x_coco.py config and got 29.4 AP; trained with the SCL module, I got 30.5 AP. I haven't trained Mask RCNN.
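To summarize the numbers reported in this thread so far:

| Config | Baseline AP | + SCL AP |
| --- | --- | --- |
| faster_rcnn_r50_fpn_1x_coco (torchvision://resnet50 init) | 37.3 | 37.4 |
| faster_rcnn_r50_caffe_c4_1x_coco | 29.4 | 30.5 |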

TyroneLi commented 4 years ago

Hello, anybody??

dc3ea9f commented 4 years ago

Sorry for the late reply; I was tied up with final exams.

So your experiment implies that the SCL module can help the detection task on Faster RCNN R-50 (29.4 -> 30.5 AP).

I will try to train ResNet-50 + SCL on ImageNet and use it in the detection/segmentation training procedure later.
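Once such a checkpoint exists, using it would be a small config change. A sketch (the checkpoint path is a hypothetical placeholder, since no SCL ImageNet model has been released yet):

```python
# Sketch: initialize the detector backbone from an SCL-pretrained ImageNet
# checkpoint instead of torchvision://resnet50. The path is a placeholder.
model = dict(
    pretrained='checkpoints/resnet50_scl_imagenet.pth',
    backbone=dict(type='ResNet', depth=50))
```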