XuyangBai / TransFusion

[PyTorch] Official implementation of CVPR2022 paper "TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers". https://arxiv.org/abs/2203.11496
Apache License 2.0

reimplementation problems on waymo #12

Open tangtaogo opened 2 years ago

tangtaogo commented 2 years ago

Hello, thanks for your excellent work! However, I have a problem reproducing the results on the Waymo Open Dataset.

TransFusion-L: 'Overall/L1 mAP': 0.734978, 'Overall/L1 mAPH': 0.70693, 'Overall/L2 mAP': 0.671998, 'Overall/L2 mAPH': 0.645886
TransFusion-LC: 'Overall/L1 mAP': 0.726501, 'Overall/L1 mAPH': 0.698618, 'Overall/L2 mAP': 0.663435, 'Overall/L2 mAPH': 0.637546

The fusion model (TransFusion-LC) comes out worse than the LiDAR-only one.

XuyangBai commented 2 years ago

Hi, sorry for the late reply. Did you first pre-train the 2D backbone on Waymo? Since we did not find any off-the-shelf 2D backbones pretrained on the Waymo dataset, we followed the Mask R-CNN config without the mask head to train a ResNet50+FPN backbone on Waymo as the 2D feature extractor. Then we use the following code to combine the pretrained 2D backbone and TransFusion-L into a single checkpoint, which is used as the load_from key of TransFusion.

import torch

# Load the Waymo-pretrained 2D backbone checkpoint and the TransFusion-L checkpoint.
img = torch.load('img_backbone.pth', map_location='cpu')
pts = torch.load('transfusionL.pth', map_location='cpu')

# Start from the LiDAR-only weights, then copy the image backbone/neck weights
# over under the 'img_' prefix expected by TransFusion.
new_model = {"state_dict": pts["state_dict"]}
for k, v in img["state_dict"].items():
    if 'backbone' in k or 'neck' in k:
        new_model["state_dict"]['img_' + k] = v
        print('img_' + k)
torch.save(new_model, "fusion_model.pth")
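
The merged checkpoint is then referenced by the load_from key of the TransFusion-LC config; a minimal illustrative line (the path here is an assumption):

load_from = 'fusion_model.pth'  # in the TransFusion-LC config; path is illustrative
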
tangtaogo commented 2 years ago

Yes, I pre-trained a 2D backbone on Waymo first. The config and log are as follows: waymo-2d-log.txt. I also froze the image and LiDAR backbones during training. I don't know where my problem is. Could you provide your 2D Waymo model?
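
In code, freezing both branches can look roughly like the generic PyTorch sketch below (the pts_backbone / pts_neck prefixes are assumed mmdetection3d-style names, not necessarily the repo's exact mechanism):

def freeze_backbones(model, prefixes=('img_backbone', 'img_neck', 'pts_backbone', 'pts_neck')):
    # Freeze every parameter whose name starts with one of the given prefixes,
    # so only the fusion layers and heads are updated during training.
    # The default prefixes are assumptions based on mmdetection3d naming.
    for name, param in model.named_parameters():
        if name.startswith(prefixes):
            param.requires_grad = False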

XuyangBai commented 2 years ago

Sorry, I am not able to provide the model checkpoints. Your config looks good to me. One thing I forgot to mention is that I actually changed the Waymo data preprocessing by modifying tools/data_converter/waymo_converter.py L267 from for labels in frame.projected_lidar_labels to for labels in frame.camera_labels. The reason is that projected_lidar_labels usually do not tightly fit the image boxes and contain objects that are totally occluded in image space. To verify whether your 2D backbone is well trained, you can visualize some Waymo 2D detections.
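
For concreteness, the export loop around that line looks roughly like the following sketch (a simplified, assumed reconstruction based on the Waymo frame proto; only the iterated field changes, and 'segment.tfrecord' is a placeholder path):

import tensorflow as tf
from waymo_open_dataset import dataset_pb2

# Simplified sketch of exporting 2D boxes in waymo_converter.py; the only
# change is iterating frame.camera_labels instead of frame.projected_lidar_labels.
for data in tf.data.TFRecordDataset('segment.tfrecord', compression_type=''):
    frame = dataset_pb2.Frame()
    frame.ParseFromString(bytearray(data.numpy()))
    for labels in frame.camera_labels:  # previously: frame.projected_lidar_labels
        for obj in labels.labels:
            # Waymo 2D boxes are center + size in pixels (length along x, width along y).
            x1 = obj.box.center_x - obj.box.length / 2
            y1 = obj.box.center_y - obj.box.width / 2
            x2 = obj.box.center_x + obj.box.length / 2
            y2 = obj.box.center_y + obj.box.width / 2
            print(labels.name, obj.type, [x1, y1, x2, y2])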

tangtaogo commented 2 years ago

Thanks for the kind reply, but if I directly change for labels in frame.projected_lidar_labels to for labels in frame.camera_labels, it does not work. Maybe it is the same problem as https://github.com/waymo-research/waymo-open-dataset/issues/141. Besides, training directly with projected_lidar_labels should not degrade the model either.

yinjunbo commented 2 years ago

@Trent-tangtao, hi, I also encountered the same issue. Have you managed to obtain a more reasonable result with TransFusion-LC on Waymo?

Liaoqing-up commented 1 year ago

Hello, I have a question about the Waymo experiments. The LiDAR covers a 360° field of view but the camera covers only around 120°, so what do you do with the regions where the two fields of view do not overlap?

Gaoeee commented 4 months ago

Hi, could you please share your lidar-only log?