Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0

KITTI dataset #59

Closed · thatnn closed this issue 1 year ago

thatnn commented 1 year ago

First, thank you for your amazing work.

I want to train and test with KITTI-format data.

So I modified some parameters, but it doesn't work.

The error is below:


Traceback (most recent call last):
  File "train.py", line 228, in <module>
    main()
  File "train.py", line 172, in main
    train_model(
  File "/home/user/DSVT/tools/train_utils/train_utils.py", line 224, in train_model
    accumulated_iter = train_one_epoch(
  File "/home/user/DSVT/tools/train_utils/train_utils.py", line 75, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/home/user/DSVT/tools/../pcdet/models/__init__.py", line 42, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/home/user/anaconda3/envs/openpcdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/DSVT/tools/../pcdet/models/detectors/centerpoint.py", line 14, in forward
    loss, tb_dict, disp_dict = self.get_training_loss()
  File "/home/user/DSVT/tools/../pcdet/models/detectors/centerpoint.py", line 27, in get_training_loss
    loss_rpn, tb_dict = self.dense_head.get_loss()
  File "/home/user/DSVT/tools/../pcdet/models/dense_heads/center_head.py", line 258, in get_loss
    hm_loss = self.hm_loss_func(pred_dict['hm'], target_dicts['heatmaps'][idx])
  File "/home/user/anaconda3/envs/openpcdet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/DSVT/tools/../pcdet/utils/loss_utils.py", line 312, in forward
    return self.neg_loss(out, target, mask=mask)
  File "/home/user/DSVT/tools/../pcdet/utils/loss_utils.py", line 282, in neg_loss_cornernet
    pos_loss = torch.log(pred) * torch.pow(1 - pred, 2) * pos_inds
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0


And here is my YAML file:


CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

DATA_CONFIG: 
    _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
    POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]
    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: False
              DB_INFO_PATH:
                  - kitti_dbinfos_train.pkl
              PREPARE: {
                 filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                 filter_by_difficulty: [-1],
              }

              SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
              NUM_POINT_FEATURES: 4
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: True

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x','y']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]
            - NAME: random_world_translation
              NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]
    DATA_PROCESSOR:
      -   NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True
      -   NAME: shuffle_points
          SHUFFLE_ENABLED: {
            'train': True,
            'test': False
          }
      -   NAME: transform_points_to_voxels_placeholder
          VOXEL_SIZE: [ 0.4, 0.4, 0.1875 ]
MODEL:
  NAME: CenterPoint

  VFE:
    NAME: DynPillarVFE3D
    WITH_DISTANCE: False
    USE_ABSLOTE_XYZ: True
    USE_NORM: True
    NUM_FILTERS: [192, 192]

  BACKBONE_3D:
    NAME: DSVT
    INPUT_LAYER:
      sparse_shape: [400, 300, 32]
      downsample_stride: [[1, 1, 4], [1, 1, 4], [1, 1, 2]]
      d_model: [192, 192, 192, 192]
      set_info: [[48, 1], [48, 1], [48, 1], [48, 1]]
      window_shape: [[12, 12, 32], [12, 12, 8], [12, 12, 2], [12, 12, 1]]
      hybrid_factor: [2, 2, 1] # x, y, z
      shifts_list: [[[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]], [[0, 0, 0], [6, 6, 0]]]
      normalize_pos: False

    block_name: ['DSVTBlock','DSVTBlock','DSVTBlock','DSVTBlock']
    set_info: [[48, 1], [48, 1], [48, 1], [48, 1]]
    d_model: [192, 192, 192, 192]
    nhead: [8, 8, 8, 8]
    dim_feedforward: [384, 384, 384, 384]
    dropout: 0.0 
    activation: gelu
    reduction_type: 'attention'
    output_shape: [468, 468]
    conv_out_channel: 192
    # ues_checkpoint: True

  MAP_TO_BEV:
    NAME: PointPillarScatter3d
    INPUT_SHAPE: [468, 468, 1]
    NUM_BEV_FEATURES: 192

  BACKBONE_2D:
    NAME: BaseBEVResBackbone
    LAYER_NUMS: [ 1, 2, 2 ]
    LAYER_STRIDES: [ 1, 2, 2 ]
    NUM_FILTERS: [ 128, 128, 256 ]
    UPSAMPLE_STRIDES: [ 1, 2, 4 ]
    NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

  DENSE_HEAD:
    NAME: CenterHead
    CLASS_AGNOSTIC: False

    CLASS_NAMES_EACH_HEAD: [
      ['Car', 'Pedestrian', 'Cyclist']
    ]

    SHARED_CONV_CHANNEL: 64
    USE_BIAS_BEFORE_NORM: False
    NUM_HM_CONV: 2

    BN_EPS: 0.001
    BN_MOM: 0.01
    SEPARATE_HEAD_CFG:
      HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
      HEAD_DICT: {
        'center': {'out_channels': 2, 'num_conv': 2},
        'center_z': {'out_channels': 1, 'num_conv': 2},
        'dim': {'out_channels': 3, 'num_conv': 2},
        'rot': {'out_channels': 2, 'num_conv': 2},
        'iou': {'out_channels': 1, 'num_conv': 2},
      }

    TARGET_ASSIGNER_CONFIG:
      FEATURE_MAP_STRIDE: 1
      NUM_MAX_OBJS: 500
      GAUSSIAN_OVERLAP: 0.1
      MIN_RADIUS: 2

    IOU_REG_LOSS: True

    LOSS_CONFIG:
      LOSS_WEIGHTS: {
        'cls_weight': 1.0,
        'loc_weight': 2.0,
        'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
      }

    POST_PROCESSING:
      SCORE_THRESH: 0.5
      POST_CENTER_LIMIT_RANGE: [-80, -80, -10.0, 80, 80, 10.0]
      MAX_OBJ_PER_SAMPLE: 500

      USE_IOU_TO_RECTIFY_SCORE: True
      IOU_RECTIFIER: [0.68, 0.71, 0.65]

      NMS_CONFIG:
        NMS_TYPE: multi_class_nms  # only for centerhead, use mmdet3d version nms
        NMS_THRESH: [0.7, 0.6, 0.55]
        NMS_PRE_MAXSIZE: [4096, 4096, 4096]
        NMS_POST_MAXSIZE: [500, 500, 500]

  POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]

    EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 4
    NUM_EPOCHS: 30

    OPTIMIZER: adam_onecycle
    LR: 0.003
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10
    LOSS_SCALE_FP16: 32.0

HOOK:
  DisableAugmentationHook:
    DISABLE_AUG_LIST: ['gt_sampling','random_world_flip','random_world_rotation','random_world_scaling', 'random_world_translation']
    NUM_LAST_EPOCHS: 1

Can you help me, or provide a YAML file for training on the KITTI dataset?

I'm waiting for your reply.

Thank you!!


chenshi3 commented 1 year ago

According to your config, you are using the wrong sparse_shape: [400, 300, 32] and INPUT_SHAPE: [468, 468, 1].
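
Concretely, a minimal sketch of the consistency check implied here (my assumption: the BEV grid is simply sparse_shape reduced axis-wise by each downsample_stride, so the [1, 1, k] strides above leave x/y untouched):

```python
# Values copied from the config posted above
sparse_shape = [400, 300, 32]                          # BACKBONE_3D.INPUT_LAYER.sparse_shape
downsample_stride = [[1, 1, 4], [1, 1, 4], [1, 1, 2]]  # only z is reduced
input_shape = [468, 468, 1]                            # MAP_TO_BEV.INPUT_SHAPE

x, y, z = sparse_shape
for sx, sy, sz in downsample_stride:
    x, y, z = x // sx, y // sy, z // sz                # grid after each downsampling stage

print([x, y, z])                 # [400, 300, 1]
print([x, y, z] == input_shape)  # False -> the two settings disagree
```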

thatnn commented 1 year ago

> According to your config, you are using the wrong sparse_shape: [400, 300, 32] and INPUT_SHAPE: [468, 468, 1].

Thank you for your reply.

What values fit sparse_shape and INPUT_SHAPE?

Are they related to the point cloud range?

Thank you

chenshi3 commented 1 year ago

> According to your config, you are using the wrong sparse_shape: [400, 300, 32] and INPUT_SHAPE: [468, 468, 1].
>
> Thank you for your reply.
>
> What values fit sparse_shape and INPUT_SHAPE?
>
> Are they related to the point cloud range?
>
> Thank you

POINT_CLOUD_RANGE, VOXEL_SIZE and downsample_stride.

thatnn commented 1 year ago

> According to your config, you are using the wrong sparse_shape: [400, 300, 32] and INPUT_SHAPE: [468, 468, 1].
>
> Thank you for your reply. What values fit sparse_shape and INPUT_SHAPE? Are they related to the point cloud range? Thank you
>
> POINT_CLOUD_RANGE, VOXEL_SIZE and downsample_stride.

Can you suggest some values?

Thanks

chenshi3 commented 1 year ago

The downsample_stride should be chosen carefully; it is tied to INPUT_SHAPE in MAP_TO_BEV. I recommend reading the original code.
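
In other words, INPUT_SHAPE should equal sparse_shape divided exactly, per axis, by the product of the strides. A sketch with what appear to be the stock Waymo values (assuming a sparse_shape of [468, 468, 32], which the output_shape and INPUT_SHAPE left in the config above still correspond to):

```python
from math import prod

sparse_shape = [468, 468, 32]                # assumed stock Waymo grid
strides = [[1, 1, 4], [1, 1, 4], [1, 1, 2]]  # downsample_stride from the config

# Per axis, the strides must divide the grid exactly; z: 32 / (4 * 4 * 2) = 1
input_shape = [s // prod(st[i] for st in strides) for i, s in enumerate(sparse_shape)]
print(input_shape)  # [468, 468, 1] -> matches MAP_TO_BEV.INPUT_SHAPE
```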

chenshi3 commented 1 year ago

> According to your config, you are using the wrong sparse_shape: [400, 300, 32] and INPUT_SHAPE: [468, 468, 1].
>
> Thank you for your reply. What values fit sparse_shape and INPUT_SHAPE? Are they related to the point cloud range? Thank you
>
> POINT_CLOUD_RANGE, VOXEL_SIZE and downsample_stride.
>
> Can you suggest some values?
>
> Thanks

By the way, if you use a VOXEL_SIZE of [0.4, 0.4, 0.1875] and a POINT_CLOUD_RANGE of [0, -39.68, -3, 69.12, 39.68, 1], the sparse_shape should be [173, 199, 22].
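
Those numbers can be reproduced with ceil-style rounding (a sketch under that assumption; the codebase's own grid computation may round differently):

```python
import math

point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]  # x_min, y_min, z_min, x_max, y_max, z_max
voxel_size = [0.4, 0.4, 0.1875]                       # dx, dy, dz

# Number of voxels along each axis: ceil(extent / voxel size)
sparse_shape = [
    math.ceil((point_cloud_range[i + 3] - point_cloud_range[i]) / voxel_size[i])
    for i in range(3)
]
print(sparse_shape)  # [173, 199, 22]
```

Note that z = 22 is not divisible by the 4 * 4 * 2 = 32 z-downsampling in the config above, so downsample_stride (and likely window_shape) would have to change as well, per the previous comment.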

ZoangX commented 1 year ago

> According to your config, you are using the wrong sparse_shape: [400, 300, 32] and INPUT_SHAPE: [468, 468, 1].
>
> Thank you for your reply. What values fit sparse_shape and INPUT_SHAPE? Are they related to the point cloud range? Thank you
>
> POINT_CLOUD_RANGE, VOXEL_SIZE and downsample_stride.
>
> Can you suggest some values? Thanks
>
> By the way, if you use a VOXEL_SIZE of [0.4, 0.4, 0.1875] and a POINT_CLOUD_RANGE of [0, -39.68, -3, 69.12, 39.68, 1], the sparse_shape should be [173, 199, 22].

@thatnn Have you trained successfully after using the above parameters?