dvlab-research / FocalsConv

Focal Sparse Convolutional Networks for 3D Object Detection (CVPR 2022, Oral)
https://arxiv.org/abs/2204.12463
Apache License 2.0

I can't reproduce the multi-modal accuracy (KITTI val split, AP3D (R11)) you mentioned in the paper #10

Closed qimingx closed 2 years ago

qimingx commented 2 years ago

My val result (90 epochs), 3D AP: 89.3103, 85.0520, 79.2089

Your val result in Table 8 (Focal Conv-F), 3D AP: 89.82, 85.22, 85.19

Why doesn't the accuracy improve at the hard difficulty level, and how many epochs did you train for?
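
To make the gap concrete, here is a minimal Python sketch that subtracts the Table 8 numbers quoted above from the reproduced ones, per difficulty level. The easy/moderate/hard labels assume the usual KITTI ordering of the three values, and the variable names are illustrative only.

```python
# Per-difficulty gap between the reproduced run and the paper (3D AP, R11).
# Assumption: the three values are ordered easy, moderate, hard (usual KITTI order).
LEVELS = ("easy", "moderate", "hard")

reproduced = (89.3103, 85.0520, 79.2089)  # values reported in this issue
paper = (89.82, 85.22, 85.19)             # Table 8, Focal Conv-F, as quoted above

# Negative numbers mean the reproduced run is below the paper.
gaps = {
    level: round(mine - ref, 4)
    for level, mine, ref in zip(LEVELS, reproduced, paper)
}

for level, gap in gaps.items():
    print(f"{level:>8}: {gap:+.4f}")
```

The easy and moderate gaps are well under one AP point, while the hard gap is about six points, which is the discrepancy this question is about.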

yukang2017 commented 2 years ago

Thanks for your interest in our work. It was also trained for 80 epochs. The accuracy at the hard level of KITTI is really unstable. We got only this one model with such a high result at the hard level, and that was by chance. This is the training log for that result (epoch 70).

log_train_20211106-190258.txt

qimingx commented 2 years ago

Thanks for the answer. I have another question: how should I amend voxel_rcnn_car.yaml? I replaced VoxelBackBone8x with VoxelBackBone8xFocal, but got lower accuracy than with voxel_rcnn.

yukang2017 commented 2 years ago

For the multi-modal setting, you also need to modify the data config part of the config yaml file, because it involves multi-modal data augmentations. Please refer to this part: https://github.com/dvlab-research/FocalsConv/blob/2fbfdda0179d7a29e7a304b4ac9d67e866ee05d9/OpenPCDet/tools/cfgs/kitti_models/voxel_rcnn_car_focal_multimodal.yaml#L3

qimingx commented 2 years ago

I mean the single-modal setting. I replaced VoxelBackBone8x with VoxelBackBone8xFocal, but got lower accuracy than with voxel_rcnn.

yukang2017 commented 2 years ago

Is the VoxelBackBone8xFocal from this config?

https://github.com/dvlab-research/FocalsConv/blob/2fbfdda0179d7a29e7a304b4ac9d67e866ee05d9/OpenPCDet/tools/cfgs/kitti_models/pv_rcnn_focal_lidar.yaml#L40

BACKBONE_3D:
    NAME: VoxelBackBone8xFocal
    MASK_MULTI: True
    ENLARGE_VOXEL_CHANNELS: 64

qimingx commented 2 years ago

Yes, but the baseline is Voxel R-CNN.

yukang2017 commented 2 years ago

It's weird. I will train it myself and try to fix this later. Thanks.

In addition, would you please try setting ENLARGE_VOXEL_CHANNELS to 128? In my experience, it gives better results.

qimingx commented 2 years ago

OK!

yukang2017 commented 2 years ago

Hi, I have updated the code. Please use the backbone config below for lidar-only Voxel R-CNN.

    NAME: VoxelBackBone8xFocal
    USE_IMG: False
    ENLARGE_VOXEL_CHANNELS: 64
    KERNEL_SIZE: 5
    USE_STAGES: [1,]

I have trained it several times. You should get better results with this config.
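
For readers comparing the two backbone snippets in this thread, here is a minimal Python sketch that lists which keys differ between the earlier PV-RCNN-style block and the updated lidar-only Voxel R-CNN block. The dictionaries simply mirror the YAML shown above; `diff_config` is a hypothetical helper for illustration, not part of OpenPCDet or this repository.

```python
# The two BACKBONE_3D blocks from this thread, written as plain dicts.
# diff_config is a hypothetical helper for illustration, not repository code.
earlier = {
    "NAME": "VoxelBackBone8xFocal",
    "MASK_MULTI": True,
    "ENLARGE_VOXEL_CHANNELS": 64,
}
updated = {
    "NAME": "VoxelBackBone8xFocal",
    "USE_IMG": False,
    "ENLARGE_VOXEL_CHANNELS": 64,
    "KERNEL_SIZE": 5,
    "USE_STAGES": [1],
}


def diff_config(old, new):
    """Return keys only in old, keys only in new, and keys whose values changed."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return removed, added, changed


removed, added, changed = diff_config(earlier, updated)
print("only in earlier block:", removed)
print("only in updated block:", added)
print("changed values:", changed)
```

This only compares what was written in the thread. A key missing from one snippet may still take a default value in the code, so the actual config file in the repository remains the reference.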

qimingx commented 2 years ago

Ok, thank you!

yukang2017 commented 2 years ago

Thanks for your patience. I am closing this issue for now. Feel free to open another issue for any further discussion.