XuyangBai / TransFusion

[PyTorch] Official implementation of CVPR2022 paper "TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers". https://arxiv.org/abs/2203.11496
Apache License 2.0

Heatmap size in transfusion_head.py #52

Open Galaxy-ZRX opened 2 years ago

Galaxy-ZRX commented 2 years ago

Hi Xuyang, first of all, thanks for your work on TransFusion! I am trying to train this model on the KITTI dataset. I noticed that when computing the heatmap loss, the ground-truth heatmap is obtained via https://github.com/XuyangBai/TransFusion/blob/53370467c1b88f163cbe7b7300a1f588a6761e35/mmdet3d/models/dense_heads/transfusion_head.py#L1192

As you can see, gt_heatmap is a transposed version of the original feature map. Could you please tell me why this transposition is used? When I train the model on KITTI with the point cloud range set to [0, -40, -3.0, 70.0, 40, 1.0], the predicted heatmap has shape 1x1x200x176, but the gt_heatmap has shape 1x1x176x200, so the transposition makes them mismatched. For nuScenes and Waymo the heatmap is square, so this doesn't matter, but I can't understand the issue in the KITTI case and don't know how to solve it.

Could you please give me some advice? Thank you very much. I am currently stuck on this problem T-T and looking forward to your reply!
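For reference, a minimal sketch of the shape mismatch described above. The voxel size (0.05 m), BEV out_size_factor (8), and x range up to 70.4 m are assumed typical KITTI settings, not values confirmed from this repo's config:

```python
# Reproduce the reported shape mismatch from the grid geometry.
# Assumptions: voxel_size = 0.05 m, out_size_factor = 8, x range 0..70.4 m.
pc_range = [0.0, -40.0, -3.0, 70.4, 40.0, 1.0]  # x_min, y_min, z_min, x_max, y_max, z_max
voxel_size = 0.05
out_size_factor = 8  # stride between the voxel grid and the BEV feature map

nx = round((pc_range[3] - pc_range[0]) / voxel_size / out_size_factor)  # x bins: 176
ny = round((pc_range[4] - pc_range[1]) / voxel_size / out_size_factor)  # y bins: 200

pred_shape = (ny, nx)  # head predicts (H, W) = (200, 176)
gt_shape = (nx, ny)    # gt heatmap is built with the two axes swapped: (176, 200)
print(pred_shape, gt_shape)  # (200, 176) (176, 200)
```

On square grids (nuScenes, Waymo) nx == ny, so the swap is invisible; on KITTI's rectangular grid the two shapes differ and the loss computation breaks.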

XuyangBai commented 2 years ago

These two dimensions are permuted in https://github.com/XuyangBai/TransFusion/blob/399bda09a3b6449313ccc302df40651f77ec78bf/mmdet3d/ops/voxel/voxelize.py#L95-L105

You should change the config accordingly, i.e.:

    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=5,
        sparse_shape=[1, 704, 800],
        output_channels=128,
        order=('conv', 'norm', 'act'),
        encoder_channels=((16, 16, 32), (32, 32, 64), (64, 64, 128), (128, 128)),
        encoder_paddings=((0, 0, 1), (0, 0, 1), (0, 0, [0, 1, 1]), (0, 0)),
        block_type='basicblock'),
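To illustrate where the swapped sparse_shape entries come from, here is a small sketch. The 0.1 m voxel size and the 0–70.4 m x range are assumptions chosen so the arithmetic matches the numbers above; check them against your own voxel_layer config:

```python
# Derive the two spatial entries of sparse_shape from the point cloud
# range, assuming a 0.1 m voxel size (an assumption, not verified here).
voxel_size = 0.1
x_extent = 70.4 - 0.0      # point cloud extent along x
y_extent = 40.0 - (-40.0)  # point cloud extent along y

nx = round(x_extent / voxel_size)  # 704
ny = round(y_extent / voxel_size)  # 800

# Because voxelize.py returns the x/y coordinates permuted, the config
# must list the spatial shape as [..., nx, ny] rather than [..., ny, nx].
sparse_shape_tail = [nx, ny]
print(sparse_shape_tail)  # [704, 800]
```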