TRAILab / CaDDN

Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021 Oral)
Apache License 2.0

Is the depth prediction network pre-trained on a larger depth dataset other than KITTI itself? #63

Closed. Cc-Hy closed this issue 3 years ago

Cc-Hy commented 3 years ago

Hello, good job! I have been looking into different monocular 3D detectors recently, and I noticed that some methods now surpass CaDDN by 2-3 points on the KITTI test set. From reading the articles, I noticed that a significant step in these methods is to pre-train the depth prediction network on a larger depth dataset other than KITTI (e.g. DDAD) and then fine-tune it on KITTI, which results in a huge improvement in AP. So I'd like to know whether the backbone of CaDDN is pre-trained on another, larger depth dataset. If not, I do not think the two approaches can be compared fairly. And why not try it, i.e. pre-train CaDDN on a larger depth dataset and lift CaDDN to a new level? I have some new ideas for CaDDN that I am trying out, but first I think I should make the resources used match those of the other methods, i.e. I'd like to pre-train CaDDN on a large dataset to see how it performs. Otherwise I do not think I can surpass the methods that use an extra large dataset.

codyreading commented 3 years ago

Hi and thanks for the interest!

The depth prediction network is not pre-trained on an additional depth dataset. I attempted to pretrain it on DDAD and on Virtual KITTI, but neither gave me a noticeable improvement on the validation set.

I would like to point out that the image backbone is actually pretrained on MS-COCO for the semantic segmentation task, as I initialize the image backbone from a DeepLabV3 model trained on MS-COCO. I found this improves results by 1-2% on the validation set.
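For illustration only (this is not code from the CaDDN repository): initializing an image backbone from a segmentation checkpoint usually means keeping the backbone weights and discarding the segmentation head. The sketch below shows that filtering step on a plain dictionary. The function name `select_backbone_weights` is hypothetical, the key names only mimic the `backbone.` / `classifier.` prefixes used by torchvision's DeepLabV3 models, and the string values are placeholders standing in for real tensors.

```python
def select_backbone_weights(checkpoint, prefix="backbone."):
    """Keep only the entries under `prefix` and strip the prefix from their names.

    `checkpoint` is any mapping from parameter name to value, e.g. a model
    state dict. Entries outside the prefix (the segmentation head) are dropped.
    """
    return {
        name[len(prefix):]: value
        for name, value in checkpoint.items()
        if name.startswith(prefix)
    }


# Placeholder checkpoint: names follow the assumed prefix convention,
# values are dummy strings rather than real weight tensors.
seg_checkpoint = {
    "backbone.conv1.weight": "w_conv1",
    "backbone.layer1.0.conv1.weight": "w_layer1",
    "classifier.4.weight": "w_head",  # segmentation head, not transferred
}

backbone_init = select_backbone_weights(seg_checkpoint)
print(sorted(backbone_init))
```

With real weights, the resulting dictionary would then be loaded into the detector's image backbone, while the depth and detection heads start from their default initialization.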

Cc-Hy commented 3 years ago

Thanks for your answer. I noticed you mentioned Waymo dataset results in the article. Does training on Waymo help to improve the performance on KITTI? I would think that, theoretically, since KITTI is quite a small dataset, depth pre-training on another, larger dataset should be helpful, because depth estimation is key to monocular detectors. I think the lack of improvement may be due to the misalignment between different datasets.

codyreading commented 3 years ago

Hi,

I never attempted models that were trained on both Waymo and KITTI. The issue with using different datasets is exactly as you point out: there is a misalignment between them that makes training on one dataset not very helpful on the others. For example, the depth prediction network is implicitly dependent on the camera intrinsics, as it learns how to produce depth information directly from images. The camera intrinsics differ between datasets (KITTI/Waymo), so any depth-specific learning on one dataset isn't very helpful on another. Additionally, the voxel grid sizes differ between the two.
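A small worked example of the intrinsics dependence, as a sketch rather than anything taken from CaDDN: under the pinhole camera model, an object of real height H at depth Z spans h = f * H / Z pixels, where f is the focal length in pixels. A network that has implicitly learned the size-to-depth relation of one camera will therefore misread depth in images from a camera with a different focal length. The two focal lengths, the object height, and the depth below are made-up round numbers for illustration, not the actual KITTI or Waymo calibrations.

```python
def pixel_height(focal_px, object_height_m, depth_m):
    """Apparent height in pixels of an object under the pinhole model."""
    return focal_px * object_height_m / depth_m


def depth_from_pixel_height(focal_px, object_height_m, pixel_h):
    """Invert the pinhole relation: depth implied by an apparent pixel height."""
    return focal_px * object_height_m / pixel_h


# Illustrative assumptions, not real dataset calibrations.
FOCAL_A_PX = 720.0    # camera of the dataset used for pre-training
FOCAL_B_PX = 2000.0   # camera of the target dataset
CAR_HEIGHT_M = 1.5
TRUE_DEPTH_M = 30.0

# The same car at the same depth looks much larger through camera B.
h_a = pixel_height(FOCAL_A_PX, CAR_HEIGHT_M, TRUE_DEPTH_M)
h_b = pixel_height(FOCAL_B_PX, CAR_HEIGHT_M, TRUE_DEPTH_M)

# A depth cue calibrated to camera A, applied to camera B's image,
# places the car far closer than it really is.
misread_depth = depth_from_pixel_height(FOCAL_A_PX, CAR_HEIGHT_M, h_b)

print(f"pixel height, camera A: {h_a:.1f}")
print(f"pixel height, camera B: {h_b:.1f}")
print(f"depth implied by camera A's calibration: {misread_depth:.1f} m "
      f"(true depth {TRUE_DEPTH_M:.1f} m)")
```

In this toy setting the error is a pure scale factor of f_A / f_B, so the depth cue learned on one camera is systematically off on the other unless the network is fine-tuned or the images are rescaled.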

My original thought was that pretraining would only be helpful on datasets that are very similar, which is why I tried Virtual KITTI. However, this ended up not being helpful.