EnVision-Research / Generalizable-BEV

Question about the generalization statement in the paper #6

Closed by yanchi-3dv 2 months ago

yanchi-3dv commented 2 months ago

Thank you for your outstanding work. I noticed in the paper the statement: “Our observations indicate that 2D detection in a single-view (camera plane) often has a stronger ability to generalize than multi-camera 3D object detection, as shown in Fig. 1.” After reviewing Fig. 1, I am still a bit confused. Is the stronger generalization due to the fact that 2D detection does not require depth estimation, so the bounding box does not need to be placed along the depth (d) dimension?

LuPaoPao commented 2 months ago

Thank you for your interest in my work. In fact, a model that overfits to the camera parameters and scenes it was trained on can change its predictions drastically in a new scene. Inaccurate depth estimation is unavoidable there: even the latest DepthAnything results are extremely poor and are even negatively correlated with the reference point-cloud depth. As a result, the predicted 3D box ends up away from the object when it is projected back into the image.
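To make the geometry concrete, here is a minimal sketch using a plain pinhole camera model. It is only an illustration, not the code or analysis from the paper: the focal lengths, principal point, pixel location, and depths are all made-up numbers, and `unproject` / `project` are hypothetical helper names. It shows that a depth error alone moves the 3D center by meters along the viewing ray, and that a depth error combined with a focal length the model has overfit to also shifts the reprojected pixel away from the object.

```python
import math

def unproject(u, v, depth, f, cx, cy):
    """Lift pixel (u, v) at the given depth to a 3D point in the camera frame."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)

def project(x, y, z, f, cx, cy):
    """Project a 3D camera-frame point back onto the image plane."""
    return (f * x / z + cx, f * y / z + cy)

# All values below are assumptions chosen for illustration only.
CX, CY = 800.0, 450.0   # principal point (pixels)
F_TRUE = 1260.0         # focal length of the new (target) camera
F_TRAIN = 2000.0        # focal length the model implicitly overfit to
U, V = 1100.0, 500.0    # 2D detection center in the image
TRUE_DEPTH = 20.0       # actual depth of the object (meters)
PRED_DEPTH = 30.0       # wrong depth predicted in the new scene (meters)

# Ground-truth 3D center of the object.
p_true = unproject(U, V, TRUE_DEPTH, F_TRUE, CX, CY)

# Case 1: only the depth is wrong. The 3D center slides along the viewing
# ray, so it still reprojects onto the same pixel in this camera.
p_depth_only = unproject(U, V, PRED_DEPTH, F_TRUE, CX, CY)
center_error_depth_only = math.dist(p_true, p_depth_only)
uv_depth_only = project(*p_depth_only, F_TRUE, CX, CY)

# Case 2: the depth is wrong and the lifting uses the overfit focal length.
# The 3D center leaves the true viewing ray, so projecting it with the real
# camera lands on a different pixel.
p_both = unproject(U, V, PRED_DEPTH, F_TRAIN, CX, CY)
center_error_both = math.dist(p_true, p_both)
uv_both = project(*p_both, F_TRUE, CX, CY)
pixel_shift = math.dist(uv_both, (U, V))

print(f"3D center error, depth only:      {center_error_depth_only:.2f} m")
print(f"3D center error, depth and focal: {center_error_both:.2f} m")
print(f"reprojected pixel shift:          {pixel_shift:.1f} px")
```

With these made-up numbers the 3D center is off by roughly 10 m in both cases, and in the second case the reprojected center lands more than 100 pixels from the object. The 2D detection at `(U, V)` does not depend on depth at all, which is one way to read why the camera-plane detector is less sensitive to the domain change.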

yanchi-3dv commented 2 months ago

Understood. Thank you for your excellent work and quick response!