Duankaiwen / LSNet

Location-Sensitive Visual Recognition with Cross-IOU Loss

About inference speed #1

Deep-learning999 opened this issue 3 years ago

Deep-learning999 commented 3 years ago

This is a novel method, and I care most about real-time speed. Compared with other methods, can the inference speed for detection, instance segmentation, and human pose estimation reach real time, for example more than 50 fps on video?
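To check whether a model meets a target such as 50 fps, it helps to time it directly. At 50 fps each frame must finish within 1000 / 50 = 20 ms. Below is a minimal, framework-agnostic sketch; `infer` is a hypothetical stand-in for whatever function runs the model on one frame and is not part of LSNet. If `infer` launches asynchronous GPU work (as PyTorch does), it should call `torch.cuda.synchronize()` before returning, otherwise the measured fps will be too optimistic.

```python
import time
from typing import Callable, Iterable


def frame_budget_ms(target_fps: float) -> float:
    """Per-frame latency budget in milliseconds needed to sustain target_fps."""
    return 1000.0 / target_fps


def measure_fps(infer: Callable[[object], object],
                frames: Iterable[object],
                warmup: int = 5) -> float:
    """Average frames per second of `infer` over `frames`, excluding warm-up runs."""
    frames = list(frames)
    # Warm-up calls are not timed (lazy initialization, caches, kernel selection).
    for frame in frames[:warmup]:
        infer(frame)
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warmup")
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


if __name__ == "__main__":
    # Hypothetical dummy "model" that takes at least 10 ms per frame.
    def fake_infer(frame: object) -> None:
        time.sleep(0.01)

    print(f"budget at 50 fps: {frame_budget_ms(50):.1f} ms")  # budget at 50 fps: 20.0 ms
    print(f"measured: {measure_fps(fake_infer, range(30)):.1f} fps")
```

Averaging over many frames after a warm-up smooths out one-off startup costs, which would otherwise dominate a single-frame measurement.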

Duankaiwen commented 3 years ago

Thanks for your interest. First, you can replace conv_module_type='dcn' with conv_module_type='norm' (note that this setting is in the head and is not the DCN used in the backbone) in:

bbox_head=dict(
    type='LSHead',
    task='bbox',
    num_vectors=4,
    num_classes=80,
    in_channels=256,
    feat_channels=256,
    point_feat_channels=256,
    stacked_convs=3,
    num_kernel_points=9,
    gradient_mul=0.1,
    point_strides=[8, 16, 32, 64, 128],
    point_base_scale=4,
    norm_cfg=norm_cfg,  # must be defined earlier in this config file
    conv_module_type='dcn',  # 'norm' or 'dcn'; 'norm' is faster
    loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25,
                  loss_weight=1.0),
    loss_bbox_init=dict(type='CrossIOULoss', loss_weight=1.0),
    loss_bbox_refine=dict(type='CrossIOULoss', loss_weight=2.0))

to speed up both training and inference, which could increase their speed by 50%. Second, we will add the DLA-34 backbone and the BiFPN neck to speed up training and inference further while keeping reasonable accuracy.
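If you would rather apply the head change from a script than edit the config file by hand, here is a minimal sketch. It assumes the model config is available as a plain nested Python dict with a `bbox_head` entry shaped like the snippet above; the function name `use_norm_head` and the trimmed-down example config are hypothetical and not part of LSNet or mmdetection.

```python
import copy
from typing import Any, Dict


def use_norm_head(model_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of model_cfg with the head's conv type switched to 'norm'.

    Only bbox_head.conv_module_type is changed; any DCN in the backbone is untouched.
    """
    cfg = copy.deepcopy(model_cfg)  # do not mutate the caller's config
    head = cfg["bbox_head"]
    current = head.get("conv_module_type")
    if current not in ("dcn", "norm"):
        raise ValueError(f"unexpected conv_module_type: {current!r}")
    head["conv_module_type"] = "norm"
    return cfg


if __name__ == "__main__":
    # Hypothetical config containing only the keys relevant to this change.
    model = dict(bbox_head=dict(type="LSHead", task="bbox", conv_module_type="dcn"))
    fast = use_norm_head(model)
    print(fast["bbox_head"]["conv_module_type"])   # norm
    print(model["bbox_head"]["conv_module_type"])  # dcn
```

Copying the dict first leaves the original config untouched, so the 'dcn' and 'norm' variants can be benchmarked side by side.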