yanchi-3dv closed this issue 3 months ago.
Thank you for your outstanding work. I noticed this statement in the paper: “Our observations indicate that 2D detection in a single-view (camera plane) often has a stronger ability to generalize than multi-camera 3D object detection, as shown in Fig. 1.” After reviewing Fig. 1, I am still a bit confused. Is the stronger generalization because 2D detection does not require depth estimation, so the bounding box does not need to be adjusted along the depth dimension?
Thank you for your interest in my work. In fact, a model that overfits to the camera parameters and scenes seen during training can behave very differently in a new scene. Some inaccuracy in depth estimation is unavoidable: even the latest DepthAnything results are extremely poor here, and can even be negatively correlated with the reference point-cloud depth. As a result, when the predicted 3D box is projected back onto the image, it lands away from the object.
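To make the last point concrete, here is a minimal sketch (not the author's code) of how a depth error alone degrades the reprojected box. It assumes a single pinhole camera with hypothetical intrinsics `K`, an axis-aligned box with no yaw, and a depth error that only scales the object's distance along its camera ray; `project_points`, `box_corners`, `projected_2d_box` and `iou_2d` are illustrative helper names. Even though the box centre stays on the correct ray, a 30% depth overestimate shrinks the reprojected box so it covers only part of the object.

```python
import numpy as np

# Hypothetical pinhole intrinsics (focal length 1000 px, principal point at 800, 450).
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])


def project_points(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project Nx3 camera-frame points (x right, y down, z forward) to Nx2 pixels."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def box_corners(center, size) -> np.ndarray:
    """8 corners of an axis-aligned 3D box; yaw is ignored to keep the sketch short."""
    half = np.asarray(size, dtype=float) / 2.0
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    return np.asarray(center, dtype=float) + signs * half


def projected_2d_box(center, size, K) -> np.ndarray:
    """Tight 2D box [x1, y1, x2, y2] around the projected 3D box corners."""
    uv = project_points(box_corners(center, size), K)
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])


def iou_2d(a, b) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return float(inter / (area_a + area_b - inter))


# A car-sized box 20 m in front of the camera: (width, height, length) along (x, y, z).
true_center = np.array([3.0, 1.0, 20.0])
size = (1.8, 1.6, 4.5)

# Depth overestimated by 30%: the centre slides along the same camera ray.
depth_scale = 1.3
pred_center = true_center * depth_scale

true_box = projected_2d_box(true_center, size, K)
pred_box = projected_2d_box(pred_center, size, K)

print("true 2D box:", np.round(true_box, 1))
print("pred 2D box:", np.round(pred_box, 1))
print("IoU:", round(iou_2d(true_box, pred_box), 2))  # about 0.59
```

In this simplified setup the error shows up only as a shrunken footprint; a depth error combined with position errors in other directions, or the same box projected into a neighbouring camera, would typically also shift the reprojected box sideways.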
Understood. Thank you for your excellent work and quick response!