Open Deep-learning999 opened 3 years ago
Thanks for your interest.
First, you could replace conv_module_type='dcn' with conv_module_type='norm' (note that this is not the DCN used in the backbone) in the bbox_head config:
    bbox_head=dict(
        type='LSHead',
        task='bbox',
        num_vectors=4,
        num_classes=80,
        in_channels=256,
        feat_channels=256,
        point_feat_channels=256,
        stacked_convs=3,
        num_kernel_points=9,
        gradient_mul=0.1,
        point_strides=[8, 16, 32, 64, 128],
        point_base_scale=4,
        norm_cfg=norm_cfg,
        conv_module_type='dcn',  # 'norm' or 'dcn'; 'norm' is faster
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox_init=dict(type='CrossIOULoss', loss_weight=1.0),
        loss_bbox_refine=dict(type='CrossIOULoss', loss_weight=2.0))
This speeds up both training and inference, by as much as 50%. Second, we will add the DLA-34 backbone and the BiFPN neck to further speed up training and inference while keeping reasonable accuracy.
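As a rough illustration of the first suggestion, here is a minimal sketch of the override in plain Python. The helper `with_conv_module_type` is hypothetical (it is not part of the repo), and the `norm_cfg` value is only an assumed placeholder. In practice you would simply edit that one line in the config file.

```python
import copy

# Assumed placeholder: the real norm_cfg is defined elsewhere in the config file.
norm_cfg = dict(type='GN', num_groups=32, requires_grad=True)

# The head config as quoted above, with the slower 'dcn' conv module.
bbox_head = dict(
    type='LSHead',
    task='bbox',
    num_vectors=4,
    num_classes=80,
    in_channels=256,
    feat_channels=256,
    point_feat_channels=256,
    stacked_convs=3,
    num_kernel_points=9,
    gradient_mul=0.1,
    point_strides=[8, 16, 32, 64, 128],
    point_base_scale=4,
    norm_cfg=norm_cfg,
    conv_module_type='dcn',
    loss_cls=dict(type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25,
                  loss_weight=1.0),
    loss_bbox_init=dict(type='CrossIOULoss', loss_weight=1.0),
    loss_bbox_refine=dict(type='CrossIOULoss', loss_weight=2.0))


def with_conv_module_type(head_cfg, conv_module_type):
    """Hypothetical helper: return a copy of head_cfg with the conv module swapped."""
    if conv_module_type not in ('norm', 'dcn'):
        raise ValueError("conv_module_type must be 'norm' or 'dcn'")
    new_cfg = copy.deepcopy(head_cfg)  # leave the original config untouched
    new_cfg['conv_module_type'] = conv_module_type
    return new_cfg


# Faster variant of the head; every other setting is unchanged.
fast_bbox_head = with_conv_module_type(bbox_head, 'norm')
print(fast_bbox_head['conv_module_type'])
```

Only `conv_module_type` changes here, so the losses, strides, and channel widths stay exactly as in the original head.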
It is a novel method that pays more attention to real-time speed. Excuse me, compared with other methods, can detection, instance segmentation, and human pose inference run in real time, for example at >50 FPS on video?