Hi,
I'm looking into monocular 3D object detection on the Waymo Open Dataset. In your paper, you reported these results:

![image](https://user-images.githubusercontent.com/7334548/164533051-6407e5d7-05cf-4e53-92df-5f033be3f77a.png)
The recent CVPR paper "MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection" summarizes the results in this table:

![image](https://user-images.githubusercontent.com/7334548/164534646-d580550f-8a31-4243-959b-cf4a7819e909.png)

MonoJSG's test set results on KITTI appear to be better than CaDDN's.
On Waymo, however, CaDDN achieves 5% AP at 0.7 IoU, whereas MonoJSG reaches barely 1% AP at the same threshold. As far as I can tell, both methods use the same setting, i.e., front cameras and the vehicle class.

Do you have any idea where this difference might come from? Thanks a lot!