georghess / voxel-mae

Code for the paper "Masked Autoencoders for Self-Supervised Learning on Automotive Point Clouds"
Apache License 2.0

Have you tested masked SST with CenterHead? #9

Open synsin0 opened 1 year ago

synsin0 commented 1 year ago

Thanks for your great work. I replaced the Anchor3D head with CenterHead, but training does not converge. Have you tested masked SST with CenterHead?

georghess commented 1 year ago

Hi, we have not tried the CenterHead detection head. Could you share your config here? I assume you've drawn some inspiration from https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/configs/_base_/models/centerpoint_02pillar_second_secfpn_nus.py#L32-L58 ? I can have a look and see if I can find any obvious issues.

synsin0 commented 1 year ago

Here is my config of SST with CenterHead. The main problem is that the heatmap loss is too large.

I previously tried the same settings as Zoeeeing, as presented here: https://github.com/TuSimple/SST/issues/18

_base_ = [
    '../_base_/models/sst_base.py',
    '../_base_/datasets/nus-3d-1sweep.py',
    '../_base_/schedules/cosine_2x.py',
    '../_base_/default_runtime.py',
]

voxel_size = (0.25, 0.25, 8)
window_shape = (16, 16, 1)  # 12 * 0.32m
point_cloud_range = [-50, -50, -5, 50, 50, 3]
drop_info_training = {
    0: {'max_tokens': 30, 'drop_range': (0, 30)},
    1: {'max_tokens': 60, 'drop_range': (30, 60)},
    2: {'max_tokens': 100, 'drop_range': (60, 100)},
    3: {'max_tokens': 200, 'drop_range': (100, 200)},
    4: {'max_tokens': 250, 'drop_range': (200, 100000)},
}
drop_info_test = {
    0: {'max_tokens': 30, 'drop_range': (0, 30)},
    1: {'max_tokens': 60, 'drop_range': (30, 60)},
    2: {'max_tokens': 100, 'drop_range': (60, 100)},
    3: {'max_tokens': 200, 'drop_range': (100, 200)},
    4: {'max_tokens': 256, 'drop_range': (200, 100000)},  # 16*16=256
}
drop_info = (drop_info_training, drop_info_test)
shifts_list = [(0, 0), (window_shape[0]//2, window_shape[1]//2)]

model = dict(
type='DynamicCenterPoint',

voxel_layer=dict(
    voxel_size=voxel_size,
    max_num_points=-1,
    point_cloud_range=point_cloud_range,
    max_voxels=(-1, -1)
),

voxel_encoder=dict(
    type='DynamicVFE',
    in_channels=4,
    feat_channels=[64, 128],
    with_distance=False,
    voxel_size=voxel_size,
    with_cluster_center=True,
    with_voxel_center=True,
    point_cloud_range=point_cloud_range,
    norm_cfg=dict(type='naiveSyncBN1d', eps=1e-3, momentum=0.01)
),

middle_encoder=dict(
    type='SSTInputLayerV2',
    window_shape=window_shape,
    sparse_shape=(400, 400, 1),
    shuffle_voxels=True,
    debug=True,
    drop_info=drop_info,
    pos_temperature=10000,
    normalize_pos=False,
    mute=True,
),

backbone=dict(
    type='SSTv2',
    d_model=[128,] * 6,
    nhead=[8, ] * 6,
    num_blocks=6,
    dim_feedforward=[256, ] * 6,
    output_shape=[400, 400],
    num_attached_conv=3,
    conv_kwargs=[
        dict(kernel_size=3, dilation=1, padding=1, stride=1),
        dict(kernel_size=3, dilation=1, padding=1, stride=1),
        dict(kernel_size=3, dilation=2, padding=2, stride=1),
    ],
    conv_in_channel=128,
    conv_out_channel=128,
    debug=True,
),

bbox_head=dict(
    type='CenterHead',
    _delete_=True,
    in_channels=384,
    tasks=[
        dict(num_class=1, class_names=['car']),
        dict(num_class=2, class_names=['truck', 'construction_vehicle']),
        dict(num_class=2, class_names=['bus', 'trailer']),
        dict(num_class=1, class_names=['barrier']),
        dict(num_class=2, class_names=['motorcycle', 'bicycle']),
        dict(num_class=2, class_names=['pedestrian', 'traffic_cone']),
    ],
    common_heads=dict(
        reg=(2, 2), height=(1, 2), dim=(3, 2), rot=(2, 2), vel=(2, 2)),
    share_conv_channel=64,
    bbox_coder=dict(
        type='CenterPointBBoxCoder',
        post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
        max_num=500,
        score_threshold=0.1,
        out_size_factor=1,
        voxel_size=voxel_size[:2],
        pc_range=point_cloud_range[:2],
        code_size=9),
    separate_head=dict(
        type='SeparateHead', init_bias=-2.19, final_kernel=3),
    loss_cls=dict(type='GaussianFocalLoss', reduction='mean'),
    loss_bbox=dict(type='L1Loss', reduction='mean', loss_weight=0.25),
    norm_bbox=True),
# model training and testing settings
train_cfg=dict(
    _delete_=True,
    # pts=dict(
    grid_size=[400, 400, 1],
    voxel_size=voxel_size,
    out_size_factor=1,
    dense_reg=1,
    gaussian_overlap=0.1,
    max_objs=500,
    min_radius=2,
    code_weights=[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.2, 0.2],
    point_cloud_range=point_cloud_range,
    # )
),
test_cfg=dict(
    _delete_=True,
    # pts=dict(
    post_center_limit_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0],
    max_per_img=500,
    max_pool_nms=False,
    min_radius=[4, 12, 10, 1, 0.85, 0.175],
    score_threshold=0.1,
    out_size_factor=1,
    voxel_size=voxel_size[:2],
    nms_type='rotate',
    pre_max_size=1000,
    post_max_size=83,
    nms_thr=0.2,
    point_cloud_range=point_cloud_range,
    # )
)

)

# runtime settings

runner = dict(type='EpochBasedRunner', max_epochs=24)
evaluation = dict(interval=6)
checkpoint_config = dict(interval=6)

fp16 = dict(loss_scale=32.0)
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
)

workflow = [('train', 1), ('val', 1)] # Includes validation at same frequency as training.
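
When debugging a config like this, it can help to first rule out simple geometry mismatches. The following is a minimal sketch in plain Python, not part of the original config, that recomputes a few derived quantities from the values posted above: the voxel grid implied by `point_cloud_range` and `voxel_size` (which should agree with `sparse_shape`, `output_shape` and `grid_size`), the token capacity of one window, and the initial heatmap score implied by `init_bias`. The helper names `grid_shape`, `tokens_per_window` and `prior_from_bias` are hypothetical and exist only for illustration. These checks only rule out inconsistencies and do not by themselves explain the large heatmap loss.

```python
import math

# Values copied from the config above.
voxel_size = (0.25, 0.25, 8)
window_shape = (16, 16, 1)
point_cloud_range = [-50, -50, -5, 50, 50, 3]
out_size_factor = 1
init_bias = -2.19


def grid_shape(pc_range, voxel):
    """Number of voxels along x, y, z for the given range and voxel size."""
    return tuple(
        round((pc_range[i + 3] - pc_range[i]) / voxel[i]) for i in range(3)
    )


def tokens_per_window(shape):
    """Maximum number of voxels (tokens) that a single window can hold."""
    return shape[0] * shape[1] * shape[2]


def prior_from_bias(bias):
    """Sigmoid of the final-layer bias, i.e. the heatmap score at initialization."""
    return 1.0 / (1.0 + math.exp(-bias))


grid = grid_shape(point_cloud_range, voxel_size)
# The heatmap resolution is the BEV grid divided by out_size_factor.
heatmap_hw = (grid[1] // out_size_factor, grid[0] // out_size_factor)

print("grid shape:", grid)
print("heatmap size:", heatmap_hw)
print("tokens per window:", tokens_per_window(window_shape))
print("initial heatmap score: %.3f" % prior_from_bias(init_bias))
```

With the posted values the grid comes out as 400 x 400 x 1, matching `sparse_shape=(400, 400, 1)`, `output_shape=[400, 400]` and `grid_size=[400, 400, 1]`, and a window holds at most 256 tokens. The bias of -2.19 corresponds to an initial score of roughly 0.1 at every heatmap cell.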