Cc-Hy / CMKD

Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection (ECCV 2022 Oral)
Apache License 2.0

This version of code #19

Open munchmoo opened 1 year ago

munchmoo commented 1 year ago

Hello,

I have a question regarding this repo.

Without the BEV DA module and the BEV KD loss, the student model is the same as CaDDN. Does this repo's version of CaDDN (using ResNet50, with some layer modifications in detection and different loss weights) produce results comparable to CaDDN (ResNet101)?

Also, to reproduce just the student model without the KD loss, which config should I use?

Thank you.

Cc-Hy commented 1 year ago

Hi,

  1. In fact, when I run the official CaDDN code (res101), I cannot reproduce its reported performance on my device; the results I get are 1-3 points lower than the published ones. The setting here (res50 and something else) performs similarly. Something like this:
    
    Car AP_R40@0.70, 0.70, 0.70:
    bbox AP:97.8666, 89.5661, 84.1687
    bev  AP:27.9056, 20.8787, 18.4701
    3d   AP:20.5984, 14.7481, 12.5899
    aos  AP:97.06, 88.39, 82.54

    Pedestrian AP_R40@0.50, 0.50, 0.50:
    bbox AP:52.0106, 44.0633, 38.3137
    bev  AP:15.8919, 11.2907, 8.9604
    3d   AP:12.5201, 8.7893, 6.9510
    aos  AP:25.18, 22.10, 19.52

    Cyclist AP_R40@0.50, 0.50, 0.50:
    bbox AP:44.9809, 26.3292, 25.6032
    bev  AP:2.9665, 1.6742, 1.3987
    3d   AP:2.3066, 1.1249, 0.9944
    aos  AP:33.45, 19.37, 18.75


I am not sure whether this is caused by hardware and software differences, so I recommend running it on your own device and using that result as the baseline.

2. To just use the student model, you can use [this file](https://drive.google.com/file/d/1KAY3FzqeSaiP3uVUzdtJ6Whcq_YnS90o/view?usp=share_link).

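
For anyone recording their own baseline as suggested in point 1, an evaluation printout in the `AP_R40` format shown there can be turned into numbers and compared between runs. The following is only a minimal sketch and is not part of this repo: `parse_kitti_ap`, `ap_gap`, and `SAMPLE` are hypothetical names introduced for illustration, and the three values per row are assumed to be the easy, moderate, and hard difficulty levels, in that order.

```python
import re

# Header lines look like "Car AP_R40@0.70, 0.70, 0.70:".
HEADER = re.compile(r"^(\w+) AP_R40@")
# Metric lines look like "3d   AP:20.5984, 14.7481, 12.5899".
ROW = re.compile(r"^(bbox|bev|3d|aos)\s+AP:\s*(.+)$")


def parse_kitti_ap(text):
    """Parse an AP_R40 printout into {class: {metric: (v1, v2, v3)}}."""
    results = {}
    current = None  # class name of the block being read
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            results[current] = {}
            continue
        row = ROW.match(line)
        if row and current is not None:
            values = tuple(float(v) for v in row.group(2).split(","))
            results[current][row.group(1)] = values
    return results


def ap_gap(baseline, other, cls="Car", metric="3d"):
    """Per-difficulty difference (other - baseline) for one class and metric."""
    pairs = zip(baseline[cls][metric], other[cls][metric])
    return tuple(round(o - b, 4) for b, o in pairs)


# The Car block quoted in point 1, used here as sample input.
SAMPLE = """
Car AP_R40@0.70, 0.70, 0.70:
bbox AP:97.8666, 89.5661, 84.1687
bev  AP:27.9056, 20.8787, 18.4701
3d   AP:20.5984, 14.7481, 12.5899
aos  AP:97.06, 88.39, 82.54
"""

baseline = parse_kitti_ap(SAMPLE)
print(baseline["Car"]["3d"])
```

Parsing a second run the same way and calling `ap_gap(baseline, second_run)` gives the per-difficulty difference, which makes it easier to see whether a change falls within the run-to-run variation on a given machine.
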
munchmoo commented 1 year ago

@Cc-Hy Was this result (res50 and something else) obtained by training on kitti-train and evaluating on kitti-val?

Cc-Hy commented 1 year ago

Yes, and when we use the official settings, we also get similar results.

Cc-Hy commented 1 year ago

The results are not very stable, but this is a typical one.

munchmoo commented 1 year ago

@Cc-Hy Thank you, I am trying to reproduce the results in Table 1 of the paper.

Also, for the ablation study: does 'Pre' mean using a depth backbone pre-trained on KITTI-train for 40 epochs? Is this depth-pre-trained backbone different from the pre-trained DeeplabV3 model provided in this repo?

[Image: screenshot, 2023-02-20]

Cc-Hy commented 1 year ago

Hi, to reproduce the Table 1 results, you should use all of the unlabeled data from KITTI raw. I will upload the split in a few days.

'Pre' means depth pre-training on KITTI train. The pre-trained DeeplabV3 model is a different thing: it is provided by torchvision and is pre-trained on COCO with a 2D detection task, not with depth pre-training.

TimGor1997 commented 6 months ago

Hello! How can I run depth pre-training on KITTI? Could you share the pre-trained checkpoint you used? My email is 595603009@qq.com. I look forward to your reply, thank you!