Haiyang-W / DSVT

[CVPR2023] Official Implementation of "DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets"
https://arxiv.org/abs/2301.06051
Apache License 2.0

DSVT-P training on KITTI Dataset #64

Closed dinvincible98 closed 12 months ago

dinvincible98 commented 1 year ago

Hi,

I tried to train a DSVT-pillar model on the KITTI dataset. Below is my config:

CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

DATA_CONFIG: 
    _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
    POINT_CLOUD_RANGE: [0, -40, -3, 70.4, 40, 1]

    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: True
              DB_INFO_PATH:
                  - kitti_dbinfos_train.pkl
              PREPARE: {
                 filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                 filter_by_difficulty: [-1],
              }

              SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
              NUM_POINT_FEATURES: 4
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: True

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x','y']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]
            - NAME: random_world_translation
              NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]
    DATA_PROCESSOR:
      -   NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True
      -   NAME: shuffle_points
          SHUFFLE_ENABLED: {
            'train': True,
            'test': False
          }
      -   NAME: transform_points_to_voxels_placeholder
          VOXEL_SIZE: [ 0.1505, 0.1709, 4 ]

MODEL:
  NAME: CenterPoint

  VFE:
    NAME: DynPillarVFE3D
    WITH_DISTANCE: False
    USE_ABSLOTE_XYZ: True
    USE_NORM: True
    NUM_FILTERS: [192, 192]

  BACKBONE_3D:
    NAME: DSVT
    INPUT_LAYER:
      sparse_shape: [468, 468, 1]
      downsample_stride: []
      d_model: [192]
      set_info: [[36, 4]]
      window_shape: [[12, 12, 1]]
      hybrid_factor: [2, 2, 1] # x, y, z
      shifts_list: [[[0, 0, 0], [6, 6, 0]]]
      normalize_pos: False

    block_name: ['DSVTBlock']
    set_info: [[36, 4]]
    d_model: [192]
    nhead: [8]
    dim_feedforward: [384]

    dropout: 0.0 
    activation: gelu
    reduction_type: 'attention'
    output_shape: [468, 468]
    conv_out_channel: 192
    # ues_checkpoint: True

  MAP_TO_BEV:
    NAME: PointPillarScatter3d
    INPUT_SHAPE: [468, 468, 1]
    NUM_BEV_FEATURES: 192

  BACKBONE_2D:
    NAME: BaseBEVResBackbone
    LAYER_NUMS: [ 1, 2, 2 ]
    LAYER_STRIDES: [ 1, 2, 2 ]
    NUM_FILTERS: [ 128, 128, 256 ]
    UPSAMPLE_STRIDES: [ 1, 2, 4 ]
    NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

  DENSE_HEAD:
    NAME: CenterHead
    CLASS_AGNOSTIC: False

    CLASS_NAMES_EACH_HEAD: [
      ['Car', 'Pedestrian', 'Cyclist']
    ]

    SHARED_CONV_CHANNEL: 64
    USE_BIAS_BEFORE_NORM: False
    NUM_HM_CONV: 2

    BN_EPS: 0.001
    BN_MOM: 0.01
    SEPARATE_HEAD_CFG:
      HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
      HEAD_DICT: {
        'center': {'out_channels': 2, 'num_conv': 2},
        'center_z': {'out_channels': 1, 'num_conv': 2},
        'dim': {'out_channels': 3, 'num_conv': 2},
        'rot': {'out_channels': 2, 'num_conv': 2},
        'iou': {'out_channels': 1, 'num_conv': 2},
      }

    TARGET_ASSIGNER_CONFIG:
      FEATURE_MAP_STRIDE: 1
      NUM_MAX_OBJS: 500
      GAUSSIAN_OVERLAP: 0.1
      MIN_RADIUS: 2

    IOU_REG_LOSS: True

    LOSS_CONFIG:
      LOSS_WEIGHTS: {
        'cls_weight': 1.0,
        'loc_weight': 2.0,
        'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
      }

    POST_PROCESSING:
      SCORE_THRESH: 0.5
      POST_CENTER_LIMIT_RANGE: [-80, -80, -10.0, 80, 80, 10.0]
      MAX_OBJ_PER_SAMPLE: 500

      USE_IOU_TO_RECTIFY_SCORE: True
      IOU_RECTIFIER: [0.68, 0.71, 0.65]

      NMS_CONFIG:
        # NMS_TYPE: multi_class_nms  # only for centerhead, use mmdet3d version nms
        # NMS_THRESH: [0.7, 0.6, 0.55]
        # NMS_PRE_MAXSIZE: [4096, 4096, 4096]
        # NMS_POST_MAXSIZE: [500, 500, 500]

        NMS_TYPE: nms_gpu 
        NMS_THRESH: 0.1
        NMS_PRE_MAXSIZE: 4096
        NMS_POST_MAXSIZE: 500

  POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]

    EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 1
    NUM_EPOCHS: 20

    OPTIMIZER: adam_onecycle
    LR: 0.001
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10
    LOSS_SCALE_FP16: 32.0

HOOK:
  DisableAugmentationHook:
    DISABLE_AUG_LIST: ['gt_sampling','random_world_flip','random_world_rotation','random_world_scaling', 'random_world_translation']
    NUM_LAST_EPOCHS: 1

I only modified the point cloud range to match the KITTI settings, and the voxel size to match the default sparse shape [468, 468, 1], but I keep getting this error:

RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

I traced the error to the DynamicPillarVFE3D module, where batch_dict['points'] is often an empty tensor. However, when I use the default point cloud range from the Waymo settings, [-74.88, -74.88, -2, 74.88, 74.88, 4.0], the error disappears. Can you give me some guidance?

Thank you!
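
One way to narrow down the empty-tensor report above is to count how many points of a frame survive the range crop before they reach the VFE. The sketch below is plain Python, not OpenPCDet code: `count_points_in_range` and the sample points are hypothetical, and it assumes each point starts with x, y, z and that the range is ordered [x_min, y_min, z_min, x_max, y_max, z_max] as in the config.

```python
def count_points_in_range(points, pc_range):
    """Count points whose x, y, z fall inside [min, max) on every axis."""
    x_min, y_min, z_min, x_max, y_max, z_max = pc_range
    return sum(
        1
        for p in points
        if x_min <= p[0] < x_max and y_min <= p[1] < y_max and z_min <= p[2] < z_max
    )


kitti_range = [0, -40, -3, 70.4, 40, 1]
waymo_range = [-74.88, -74.88, -2, 74.88, 74.88, 4.0]

# Hypothetical points: (x, y, z, intensity).
sample_points = [
    (10.0, 2.0, -1.0, 0.3),  # inside both ranges
    (-5.0, 1.0, -1.5, 0.2),  # x < 0: outside the KITTI range, inside the Waymo range
    (20.0, 0.0, 2.5, 0.1),   # z > 1: outside the KITTI range, inside the Waymo range
]

kitti_kept = count_points_in_range(sample_points, kitti_range)
waymo_kept = count_points_in_range(sample_points, waymo_range)
print(kitti_kept, waymo_kept)
```

The narrower KITTI range keeps fewer points than the Waymo range, so a frame can plausibly end up with none after cropping. Whether that is the actual cause of the error here is not confirmed in this thread.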

xifen523 commented 1 year ago

Did you train successfully on the kitti? How did it turn out?

dinvincible98 commented 1 year ago

No, I always get a NaN or Inf error during training. I guess there are some hyperparameter issues.

xifen523 commented 1 year ago

I will try to run this code on the KITTI in the future when I have some free time.

Haiyang-W commented 1 year ago

Very sorry for the late reply; I'm rushing to meet some deadlines. We haven't tried the KITTI dataset. You can check whether issue #59 is helpful.

dinvincible98 commented 1 year ago

There were some data augmentor issues. Here is the modified config; you can try it and see whether you still get the NaN or Inf error:

    CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

    DATA_CONFIG: 
        _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
        POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]
        DATA_PROCESSOR:
            - NAME: mask_points_and_boxes_outside_range
              REMOVE_OUTSIDE_BOXES: True

            - NAME: shuffle_points
              SHUFFLE_ENABLED: {
                'train': True,
                'test': False
              }

            - NAME: transform_points_to_voxels
              VOXEL_SIZE: [0.1477, 0.1696, 4]
              MAX_POINTS_PER_VOXEL: 32
              MAX_NUMBER_OF_VOXELS: {
                'train': 16000,
                'test': 40000
              }
        DATA_AUGMENTOR:
            DISABLE_AUG_LIST: ['placeholder']
            AUG_CONFIG_LIST:
                - NAME: gt_sampling
                  USE_ROAD_PLANE: True
                  DB_INFO_PATH:
                      - kitti_dbinfos_train.pkl
                  PREPARE: {
                     filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                     filter_by_difficulty: [-1],
                  }

                  SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
                  NUM_POINT_FEATURES: 4
                  DATABASE_WITH_FAKELIDAR: False
                  REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
                  LIMIT_WHOLE_SCENE: False

                - NAME: random_world_flip
                  ALONG_AXIS_LIST: ['x']

                - NAME: random_world_rotation
                  WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

                - NAME: random_world_scaling
                  WORLD_SCALE_RANGE: [0.95, 1.05]
    MODEL:
      NAME: CenterPoint

      VFE:
        NAME: DynPillarVFE3D
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [ 192, 192 ]

      BACKBONE_3D:
        NAME: DSVT
        INPUT_LAYER:
          sparse_shape: [468, 468, 1]
          downsample_stride: []
          d_model: [192]
          set_info: [[36, 4]]
          window_shape: [[12, 12, 1]]
          hybrid_factor: [2, 2, 1] # x, y, z
          shifts_list: [[[0, 0, 0], [6, 6, 0]]]
          normalize_pos: False

        block_name: ['DSVTBlock']
        set_info: [[36, 4]]
        d_model: [192]
        nhead: [8]
        dim_feedforward: [384]
        dropout: 0.0
        activation: gelu
        output_shape: [468, 468]
        conv_out_channel: 192
        # ues_checkpoint: True

      MAP_TO_BEV:
        NAME: PointPillarScatter3d
        INPUT_SHAPE: [468, 468, 1]
        NUM_BEV_FEATURES: 192

      BACKBONE_2D:
        NAME: BaseBEVResBackbone
        LAYER_NUMS: [ 1, 2, 2 ]
        LAYER_STRIDES: [ 1, 2, 2 ]
        NUM_FILTERS: [ 128, 128, 256 ]
        UPSAMPLE_STRIDES: [ 1, 2, 4 ]
        NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

      DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False

        CLASS_NAMES_EACH_HEAD: [
          ['Car', 'Pedestrian', 'Cyclist']
        ]

        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: True
        NUM_HM_CONV: 2

        BN_EPS: 0.001
        BN_MOM: 0.01
        SEPARATE_HEAD_CFG:
          HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
          HEAD_DICT: {
            'center': {'out_channels': 2, 'num_conv': 2},
            'center_z': {'out_channels': 1, 'num_conv': 2},
            'dim': {'out_channels': 3, 'num_conv': 2},
            'rot': {'out_channels': 2, 'num_conv': 2},
            'iou': {'out_channels': 1, 'num_conv': 2},
          }

        TARGET_ASSIGNER_CONFIG:
          FEATURE_MAP_STRIDE: 1
          NUM_MAX_OBJS: 500
          GAUSSIAN_OVERLAP: 0.1
          MIN_RADIUS: 2

        IOU_REG_LOSS: True

        LOSS_CONFIG:
          LOSS_WEIGHTS: {
            'cls_weight': 1.0,
            'loc_weight': 2.0,
            'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
          }

        POST_PROCESSING:
            RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
            SCORE_THRESH: 0.1
            OUTPUT_RAW_SCORE: False
            POST_CENTER_LIMIT_RANGE: [0, -40, -3, 75, 40, 1]
            MAX_OBJ_PER_SAMPLE: 500

            EVAL_METRIC: kitti

            NMS_CONFIG:
                MULTI_CLASSES_NMS: False
                NMS_TYPE: nms_gpu
                NMS_THRESH: 0.01
                NMS_PRE_MAXSIZE: 4096
                NMS_POST_MAXSIZE: 500

    OPTIMIZATION:
        BATCH_SIZE_PER_GPU: 1
        NUM_EPOCHS: 40

        OPTIMIZER: adam_onecycle
        LR: 0.001
        WEIGHT_DECAY: 0.01
        MOMENTUM: 0.9

        MOMS: [0.95, 0.85]
        PCT_START: 0.4
        DIV_FACTOR: 10
        DECAY_STEP_LIST: [35, 45]
        LR_DECAY: 0.1
        LR_CLIP: 0.0000001

        LR_WARMUP: False
        WARMUP_EPOCH: 1

        GRAD_NORM_CLIP: 10

dinvincible98 commented 1 year ago

Yes, I checked that issue and recalculated the voxel size accordingly. The sparse shape now matches the pillar settings, but training still throws a NaN or Inf error after multiple epochs.
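
The voxel-size recalculation mentioned above can be sketched as follows. This is a minimal illustration, assuming the grid shape is the per-axis extent of the point cloud range divided by the voxel size and rounded to the nearest integer; `grid_shape` and `voxel_size_for` are hypothetical helper names, not functions from the repository.

```python
def grid_shape(pc_range, voxel_size):
    """Voxel count along x, y, z: extent / voxel size, rounded to the nearest integer."""
    return [round((pc_range[i + 3] - pc_range[i]) / voxel_size[i]) for i in range(3)]


def voxel_size_for(pc_range, target_shape):
    """Voxel size that makes the range divide exactly into target_shape cells."""
    return [(pc_range[i + 3] - pc_range[i]) / target_shape[i] for i in range(3)]


# Range and voxel size from the first config in this thread.
first = grid_shape([0, -40, -3, 70.4, 40, 1], [0.1505, 0.1709, 4])

# Range and voxel size from the modified config.
modified_range = [0, -39.68, -3, 69.12, 39.68, 1]
second = grid_shape(modified_range, [0.1477, 0.1696, 4])

# Exact voxel size for a [468, 468, 1] sparse shape over the modified range.
exact = voxel_size_for(modified_range, [468, 468, 1])
print(first, second, exact)
```

Under this assumption both configs round to the same [468, 468, 1] grid, and the values 0.1477 and 0.1696 are the exact quotients rounded to four decimals.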

evil-master commented 1 year ago

May I ask whether you adapted to the KITTI dataset by modifying only the config file, without changing the network? Also, why doesn't your BACKBONE_3D need downsampling?

Haiyang-W commented 1 year ago

If anyone has succeeded on KITTI by modifying the config, please share the corresponding config and experimental results in this issue. We will be very grateful for your contribution to the community. :)

After I finish the CVPR deadline, I'll take a look when I have time. I guess this shouldn't be a very difficult problem.

Haiyang-W commented 1 year ago

Regarding the downsampling question above: I guess he uses the DSVT-pillar version.

123susu commented 1 year ago

Did you complete the KITTI config? I really need this, thanks!

Haiyang-W commented 12 months ago

Any update?

dinvincible98 commented 12 months ago

I have a working config for training on the KITTI dataset:

CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

DATA_CONFIG:
    _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
    POINT_CLOUD_RANGE: [0, -39.68, -3, 69.12, 39.68, 1]

    DATA_PROCESSOR:
        - NAME: mask_points_and_boxes_outside_range
          REMOVE_OUTSIDE_BOXES: True

        - NAME: shuffle_points
          SHUFFLE_ENABLED: {
            'train': True,
            'test': False
          }

        - NAME: transform_points_to_voxels_placeholder
          VOXEL_SIZE: [0.1477, 0.1696, 4]
          MAX_POINTS_PER_VOXEL: 32
          MAX_NUMBER_OF_VOXELS: {
            'train': 16000,
            'test': 40000
          }

    DATA_AUGMENTOR:
        DISABLE_AUG_LIST: ['placeholder']
        AUG_CONFIG_LIST:
            - NAME: gt_sampling
              USE_ROAD_PLANE: True
              DB_INFO_PATH:
                  - kitti_dbinfos_train.pkl
              PREPARE: {
                 filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                 filter_by_difficulty: [-1],
              }

              SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
              NUM_POINT_FEATURES: 4
              DATABASE_WITH_FAKELIDAR: False
              REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
              LIMIT_WHOLE_SCENE: False

            - NAME: random_world_flip
              ALONG_AXIS_LIST: ['x']

            - NAME: random_world_rotation
              WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]

            - NAME: random_world_scaling
              WORLD_SCALE_RANGE: [0.95, 1.05]

            - NAME: random_local_pyramid_aug
              DROP_PROB: 0.25
              SPARSIFY_PROB: 0.05
              SPARSIFY_MAX_NUM: 50
              SWAP_PROB: 0.1
              SWAP_MAX_NUM: 50

MODEL:
  NAME: CenterPoint

  VFE:
    NAME: DynPillarVFE3D
    WITH_DISTANCE: False
    USE_ABSLOTE_XYZ: True
    USE_NORM: True
    NUM_FILTERS: [ 192, 192 ]

  BACKBONE_3D:
    NAME: DSVT
    INPUT_LAYER:
      sparse_shape: [468, 468, 1]
      downsample_stride: []
      d_model: [192]
      set_info: [[36, 4]]
      window_shape: [[12, 12, 1]]
      hybrid_factor: [2, 2, 1] # x, y, z
      shifts_list: [[[0, 0, 0], [6, 6, 0]]]
      normalize_pos: False

    block_name: ['DSVTBlock']
    set_info: [[36, 4]]
    d_model: [192]
    nhead: [8]
    dim_feedforward: [384]
    dropout: 0.0
    activation: gelu
    output_shape: [468, 468]
    conv_out_channel: 192
    ues_checkpoint: True

  MAP_TO_BEV:
    NAME: PointPillarScatter3d
    INPUT_SHAPE: [468, 468, 1]
    NUM_BEV_FEATURES: 192

  BACKBONE_2D:
    NAME: BaseBEVResBackbone
    LAYER_NUMS: [ 1, 2, 2 ]
    LAYER_STRIDES: [ 1, 2, 2 ]
    NUM_FILTERS: [ 128, 128, 256 ]
    UPSAMPLE_STRIDES: [ 1, 2, 4 ]
    NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]

  DENSE_HEAD:
    NAME: CenterHead
    CLASS_AGNOSTIC: False

    CLASS_NAMES_EACH_HEAD: [
      ['Car', 'Pedestrian', 'Cyclist']
    ]

    SHARED_CONV_CHANNEL: 64
    USE_BIAS_BEFORE_NORM: False
    NUM_HM_CONV: 2

    BN_EPS: 0.001
    BN_MOM: 0.01
    SEPARATE_HEAD_CFG:
      HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
      HEAD_DICT: {
        'center': {'out_channels': 2, 'num_conv': 2},
        'center_z': {'out_channels': 1, 'num_conv': 2},
        'dim': {'out_channels': 3, 'num_conv': 2},
        'rot': {'out_channels': 2, 'num_conv': 2},
        'iou': {'out_channels': 1, 'num_conv': 2},
      }

    TARGET_ASSIGNER_CONFIG:
      FEATURE_MAP_STRIDE: 1
      NUM_MAX_OBJS: 500
      GAUSSIAN_OVERLAP: 0.1
      MIN_RADIUS: 2
      # BOX_CODER: ResidualCoder

    IOU_REG_LOSS: True

    LOSS_CONFIG:
      LOSS_WEIGHTS: {
        'cls_weight': 1.0,
        'loc_weight': 2.0,
        'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
      }

    POST_PROCESSING:
      # RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
      SCORE_THRESH: 0.1
      OUTPUT_RAW_SCORE: False
      POST_CENTER_LIMIT_RANGE: [0, -40, -3, 80, 40, 1]
      MAX_OBJ_PER_SAMPLE: 500

      # USE_IOU_TO_RECTIFY_SCORE: True
      # IOU_RECTIFIER: [0.5, 0.71, 0.65]

      NMS_CONFIG:
        MULTI_CLASSES_NMS: False
        NMS_TYPE: nms_gpu
        NMS_THRESH: 0.01
        NMS_PRE_MAXSIZE: 4096
        NMS_POST_MAXSIZE: 500

  POST_PROCESSING:
    RECALL_THRESH_LIST: [0.3, 0.5, 0.7]

    EVAL_METRIC: kitti

OPTIMIZATION:
    BATCH_SIZE_PER_GPU: 2
    NUM_EPOCHS: 80

    OPTIMIZER: adam_onecycle
    LR: 0.001
    WEIGHT_DECAY: 0.01
    MOMENTUM: 0.9

    MOMS: [0.95, 0.85]
    PCT_START: 0.4
    DIV_FACTOR: 10
    DECAY_STEP_LIST: [35, 45]
    LR_DECAY: 0.1
    LR_CLIP: 0.0000001

    LR_WARMUP: False
    WARMUP_EPOCH: 1

    GRAD_NORM_CLIP: 10
    LOSS_SCALE_FP16: 32.0

And I got results below:

Generate label finished(sec_per_example: 0.0629 second).
recall_roi_0.3: 0.000000
recall_rcnn_0.3: 0.939800
recall_roi_0.5: 0.000000
recall_rcnn_0.5: 0.888598
recall_roi_0.7: 0.000000
recall_rcnn_0.7: 0.669268
Average predicted number of objects(3769 samples): 12.283

Car AP@0.70, 0.70, 0.70:
bbox AP:95.2198, 89.4526, 88.8744
bev  AP:89.3524, 87.4650, 86.4434
3d   AP:87.1649, 77.6970, 76.9884
aos  AP:95.20, 89.33, 88.69
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:97.0173, 94.1119, 91.8572
bev  AP:92.0481, 88.3356, 87.8141
3d   AP:87.8954, 80.9305, 78.6656
aos  AP:97.00, 93.96, 91.65
Car AP@0.70, 0.50, 0.50:
bbox AP:95.2198, 89.4526, 88.8744
bev  AP:95.2554, 89.6240, 89.1787
3d   AP:95.1996, 89.5775, 89.0910
aos  AP:95.20, 89.33, 88.69
Car AP_R40@0.70, 0.50, 0.50:
bbox AP:97.0173, 94.1119, 91.8572
bev  AP:97.2790, 94.6162, 94.2047
3d   AP:97.2439, 94.5121, 94.0162
aos  AP:97.00, 93.96, 91.65
Pedestrian AP@0.50, 0.50, 0.50:
bbox AP:68.9796, 66.8907, 64.9734
bev  AP:58.0059, 55.0643, 52.5829
3d   AP:52.8691, 51.4204, 47.9159
aos  AP:64.75, 62.16, 59.99
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:69.8867, 67.2526, 64.8893
bev  AP:56.3069, 53.6334, 50.8015
3d   AP:51.9532, 49.1637, 45.9747
aos  AP:65.05, 62.00, 59.43
Pedestrian AP@0.50, 0.25, 0.25:
bbox AP:68.9796, 66.8907, 64.9734
bev  AP:75.4927, 73.8959, 71.9835
3d   AP:74.6756, 72.9769, 71.1328
aos  AP:64.75, 62.16, 59.99
Pedestrian AP_R40@0.50, 0.25, 0.25:
bbox AP:69.8867, 67.2526, 64.8893
bev  AP:76.3137, 74.6665, 72.3919
3d   AP:75.3678, 73.5785, 71.5206
aos  AP:65.05, 62.00, 59.43
Cyclist AP@0.50, 0.50, 0.50:
bbox AP:88.9667, 77.5716, 74.3384
bev  AP:86.9765, 71.5262, 67.4459
3d   AP:85.9338, 69.3215, 66.2503
aos  AP:88.85, 77.08, 73.75
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:93.4071, 78.7487, 75.0949
bev  AP:91.3305, 71.9404, 67.9715
3d   AP:88.3232, 69.5222, 66.1419
aos  AP:93.27, 78.19, 74.49
Cyclist AP@0.50, 0.25, 0.25:
bbox AP:88.9667, 77.5716, 74.3384
bev  AP:87.2510, 74.5043, 70.9506
3d   AP:87.2510, 74.5037, 70.9506
aos  AP:88.85, 77.08, 73.75
Cyclist AP_R40@0.50, 0.25, 0.25:
bbox AP:93.4071, 78.7487, 75.0949
bev  AP:91.5098, 75.4892, 71.6936
3d   AP:91.5098, 75.4891, 71.6934
aos  AP:93.27, 78.19, 74.49
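
To compare runs, it can help to pull individual numbers out of a log like the one above. The following is a rough sketch, not part of the evaluation code: `parse_kitti_log` is a hypothetical helper, the excerpt copies the AP_R40 blocks at the standard thresholds from the log, and it assumes the three values per line follow the usual KITTI order of easy, moderate, hard.

```python
import re

LOG_EXCERPT = """\
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:97.0173, 94.1119, 91.8572
bev  AP:92.0481, 88.3356, 87.8141
3d   AP:87.8954, 80.9305, 78.6656
aos  AP:97.00, 93.96, 91.65
Pedestrian AP_R40@0.50, 0.50, 0.50:
bbox AP:69.8867, 67.2526, 64.8893
bev  AP:56.3069, 53.6334, 50.8015
3d   AP:51.9532, 49.1637, 45.9747
aos  AP:65.05, 62.00, 59.43
Cyclist AP_R40@0.50, 0.50, 0.50:
bbox AP:93.4071, 78.7487, 75.0949
bev  AP:91.3305, 71.9404, 67.9715
3d   AP:88.3232, 69.5222, 66.1419
aos  AP:93.27, 78.19, 74.49
"""

# "Car AP_R40@0.70, 0.70, 0.70:" -> class name, metric kind, thresholds
HEADER = re.compile(r"^(\w+) (AP(?:_R40)?)@([\d., ]+):$")
# "3d   AP:87.8954, 80.9305, 78.6656" -> metric name, value list
METRIC = re.compile(r"^(bbox|bev|3d|aos)\s+AP:(.+)$")


def parse_kitti_log(text):
    """Map (class, kind, thresholds) to {metric: [easy, moderate, hard]}."""
    results = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = HEADER.match(line)
        if header:
            current = (header.group(1), header.group(2), header.group(3))
            results[current] = {}
            continue
        metric = METRIC.match(line)
        if metric and current is not None:
            values = [float(v) for v in metric.group(2).split(",")]
            results[current][metric.group(1)] = values
    return results


parsed = parse_kitti_log(LOG_EXCERPT)
# Moderate-difficulty 3D AP (second value) per class.
moderate_3d = {key[0]: vals["3d"][1] for key, vals in parsed.items()}
mean_moderate_3d = sum(moderate_3d.values()) / len(moderate_3d)
print(moderate_3d, round(mean_moderate_3d, 4))
```

Keying on the full header keeps the blocks with relaxed thresholds (for example 0.70, 0.50, 0.50) separate from the standard ones if the whole log is parsed.
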

Haiyang-W commented 12 months ago

Thanks for your contribution! Very Nice!

But I am not familiar with KITTI. May I ask whether this performance is acceptable? Thanks! Looking forward to your reply.

Haiyang-W commented 12 months ago

If this result turns out to be good, I will tag this issue to make it more accessible for those interested in running DSVT on KITTI. Many thanks!

dinvincible98 commented 12 months ago

I adopted the PointPillar settings, and the model performs slightly better than PointPillar. I trained it on a single GPU, so the performance might improve further with multi-GPU training, I guess.

Haiyang-W commented 12 months ago

Perhaps some further adjustments can be made; DSVT performs much better on Waymo and NuScenes compared to PointPillar. At least, its performance on KITTI should be close to that of MsSVT.

Haiyang-W commented 12 months ago

Thanks, @dinvincible98. It seems that this issue has been resolved to some extent, so it will be closed.

Thank you all for your contributions and discussions. :)

evil-master commented 6 months ago

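Hello author, I successfully set up the environment and trained on the KITTI data, but I ran into a problem when converting the model to ONNX. For the path in deploy.py, should I fill in the pkl file I generated for the dataset? The file I generated is a pkl.

    ####### read input #######
    batch_dict = torch.load("path to batch_dict.pth", map_location="cuda")
    inputs = batch_dict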

evil-master commented 6 months ago

I found the download link for inputdict.pth in the README and loaded my weights trained on KITTI, but I get the following error:

    File "deploy.py", line 134, in <module>
      inputs = model.vfe(inputs)
    File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
      return forward_call(*input, **kwargs)
    File "/home/user/cjg/DSVT/pcdet/models/backbones_3d/vfe/dynamic_pillar_vfe.py", line 219, in forward
      features = pfn(features, unq_inv)
    File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
      return forward_call(*input, **kwargs)
    File "/home/user/cjg/DSVT/pcdet/models/backbones_3d/vfe/dynamic_pillar_vfe.py", line 37, in forward
      x = self.linear(inputs)
    File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
      return forward_call(*input, **kwargs)
    File "/home/user/anaconda3/envs/dsvt/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
      return F.linear(input, self.weight, self.bias)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (61800x11 and 10x96)

evil-master commented 6 months ago

I found the problem: the provided point cloud has 6 parameters per point, whereas the KITTI data has only 5, so removing the last dimension fixes it.
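
The 11-versus-10 mismatch in the error is consistent with the fix described above. The sketch below illustrates the arithmetic under stated assumptions: that the pillar VFE concatenates the raw per-point features with a 3-value cluster offset and a 3-value pillar-center offset when USE_ABSLOTE_XYZ is True, that the sample input stores a leading batch index followed by five features (the fifth presumably Waymo's elongation), and that KITTI points carry four features. `vfe_input_channels` and `drop_extra_columns` are hypothetical names, not functions from the repository.

```python
def vfe_input_channels(num_point_features, use_absolute_xyz=True, with_distance=False):
    """Input width of the first VFE linear layer under the assumptions above."""
    channels = num_point_features + (6 if use_absolute_xyz else 3)
    if with_distance:
        channels += 1
    return channels


def drop_extra_columns(points, num_columns):
    """Keep only the first num_columns values of every point row."""
    return [row[:num_columns] for row in points]


kitti_in = vfe_input_channels(4)   # x, y, z, intensity
sample_in = vfe_input_channels(5)  # x, y, z, intensity, one extra feature

# Hypothetical rows: batch_idx, x, y, z, intensity, extra feature.
sample = [
    [0, 1.0, 2.0, 0.5, 0.3, 0.0],
    [0, 4.0, -1.0, 0.2, 0.7, 0.1],
]
trimmed = drop_extra_columns(sample, 5)
print(kitti_in, sample_in, trimmed)
```

With a tensor instead of nested lists, the same trimming is a column slice that keeps the first five columns before the points are passed to the VFE.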