JialeCao001 / SipMask

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation (ECCV2020)
https://arxiv.org/pdf/2007.14772.pdf

SipMask-VIS custom dataset #41

Open ollefager opened 3 years ago

ollefager commented 3 years ago

I'm trying to use SipMask-VIS on a custom dataset. Training seems to go fine; however, when testing I get really poor results, and I don't understand why. I have trained SipMask-mmdetection with a similar config on my custom data with great results, and I load from that checkpoint when I train SipMask-VIS.

Training results

![Figure_1](https://user-images.githubusercontent.com/33979503/111085003-9412ce80-8515-11eb-87c3-d37a92f980f3.png)

SipMask-VIS config

```python
model = dict(
    type='SipMask',
    pretrained='open-mmlab://resnet50_caffe',
    backbone=dict(
        with_cp=True,
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='SipMaskHead',
        num_classes=2,
        in_channels=256,
        stacked_convs=3,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        center_sampling=True,
        center_sample_radius=1.5))
# training and testing settings
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.5,
        neg_iou_thr=0.4,
        min_pos_iou=0,
        ignore_iof_thr=-1),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=200,
    min_bbox_size=0,
    score_thr=0.03,
    nms=dict(type='nms', iou_thr=0.5),
    max_per_img=10)
# dataset settings
dataset_type = 'SausagesDataset'
data_root = 'data_ytvos/'
# classes = ('sausage',)
img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
data = dict(
    imgs_per_gpu=4,
    workers_per_gpu=1,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/train.json',
        img_prefix=data_root + 'train',
        img_scale=(640, 340),
        # classes=classes,
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=True,
        with_crowd=True,
        with_label=True,
        with_track=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'val',
        img_scale=(640, 340),
        # classes=classes,
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=True,
        with_crowd=True,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/test.json',
        img_prefix=data_root + 'test',
        img_scale=(640, 340),
        # classes=classes,
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=True))
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.005,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_options=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=1.0 / 80,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 12
device_ids = range(1)
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/sipmask_r50_fpn_1x'
load_from = './../SipMask-mmdetection/work_dirs/sipmask_r50_caffe_fpn_gn_1x_sausages/latest.pth'
resume_from = None
workflow = [('train', 1)]
```
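For anyone else hitting the `dataset_type = 'SausagesDataset'` part: below is a minimal, hypothetical sketch of how a custom VIS dataset is usually exposed in an mmdetection-style codebase, by subclassing the YouTube-VIS dataset and overriding `CLASSES`. The module path, base-class name, and registration mechanism are assumptions, not taken from this issue; check how SipMask-VIS exposes its own datasets (e.g. `mmdet/datasets/__init__.py` and its ytvos dataset module) for the exact pattern.

```python
# Hypothetical sketch: a custom dataset class placed inside mmdet/datasets/.
# 'YTVOSDataset' and its module path are assumed from MaskTrackRCNN-style code;
# verify against the SipMask-VIS source before using.
from .ytvos import YTVOSDataset  # assumed location of the video dataset base class


class SausagesDataset(YTVOSDataset):
    # Single foreground class; 'num_classes=2' in the head counts background + 1.
    CLASSES = ('sausage',)
```

The class also has to be importable wherever the repo builds datasets from the config (typically by adding it to `mmdet/datasets/__init__.py`), otherwise the string `'SausagesDataset'` in the config cannot be resolved.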

Test result

```
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.000
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] =  0.000
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] =  0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] =  0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] =  0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] =  0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] =  0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] =  0.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
```
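For reference, in this COCO-style summary a value of -1.000 means there were no ground-truth instances in that area range (here, no medium or large objects), while 0.000 means detections were evaluated but none matched the ground truth. The sketch below shows, in general terms, how such a table is produced with pycocotools; it is not the exact evaluation path used by SipMask-VIS (which evaluates video instances with its own YouTube-VIS style tooling), and the file names are placeholders.

```python
# Hedged illustration of a COCO-style segmentation evaluation with pycocotools.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations/test.json')         # ground-truth annotations (placeholder path)
coco_dt = coco_gt.loadRes('segm_results.json')  # model predictions in COCO results format

evaluator = COCOeval(coco_gt, coco_dt, iouType='segm')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints an AP/AR table like the one above
```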

Any tips or help on what could be happening would be much appreciated!

JialeCao001 commented 3 years ago

@ollefager Do you mean that SipMask-mmdetection gives a good result and SipMask-VIS gives a bad result?

ollefager commented 3 years ago

Yes, that's what I meant. However, my results are better now. Despite checking my dataset format multiple times before, I had still missed something, so there was a problem in my data. Now the results look quite good when looking at the saved test images (which they didn't before), but the metrics still just show 0's and -1's.

bit-scientist commented 2 years ago

Hi @ollefager, how did you annotate your custom dataset? Could you suggest an annotation tool that works with this repo? Also, could you provide some more information on how you performed your annotation, training and, hopefully, testing too? Thank you!
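For what it's worth, any annotation tool that exports COCO-style polygon or RLE masks can work here, as long as the export is converted into the YouTube-VIS style JSON that the `data_ytvos/` annotation files use. Below is a hedged sketch of that layout, written as a Python dict: the field names follow the public YouTube-VIS 2019 format and should be verified against the repo's ytvos dataset code, and all values are placeholders.

```python
# Hypothetical example of a YouTube-VIS style annotation file (e.g. annotations/train.json).
# Each annotation is one object track; per-frame lists are aligned with the video's
# file_names, with None for frames where the object does not appear.
example_train_json = {
    "videos": [
        {"id": 1, "width": 640, "height": 340, "length": 2,
         "file_names": ["video_0001/00000.jpg", "video_0001/00001.jpg"]},
    ],
    "annotations": [
        {"id": 1, "video_id": 1, "category_id": 1, "iscrowd": 0,
         "segmentations": [[[10, 10, 60, 10, 60, 40, 10, 40]], None],  # polygons or RLE per frame
         "bboxes": [[10, 10, 50, 30], None],                           # [x, y, w, h] per frame
         "areas": [1500, None]},
    ],
    "categories": [{"id": 1, "name": "sausage"}],
}
```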