NVlabs / FoundationPose

[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
https://nvlabs.github.io/FoundationPose/

Tracking is easily lost when there is large motion between nearby frames #103

Open byran-wang opened 1 week ago

byran-wang commented 1 week ago

Hello,

Thank you for your great work. I am interested in your research and am comparing the pose errors between BundleSDF and FoundationPose using my self-developed viewer.

I tested these on the HO3D dataset and found that tracking tends to be lost when there is significant motion between consecutive frames.

The first video was tested on the sequence ABF12 using your publicly released code on the HO3D dataset. As shown in the video, the pose error (ADD), indicated by the Y-axis, increases significantly at frame 729 for both BundleSDF and FoundationPose due to substantial object motion between frames 729 and 728.

https://github.com/NVlabs/FoundationPose/assets/16728958/4c32ed3b-b609-4f52-b3ef-ba76622c37fd

The second video was tested on the sequence GPMF14. Similarly, the pose error increases notably at frame 655 for both systems. There is considerable object motion between frames 655 and 654.

https://github.com/NVlabs/FoundationPose/assets/16728958/89cb528e-7c5f-4a20-bcd5-2ac8ae3268e4

So, my questions are:

1. Why is tracking prone to loss when there is significant motion between nearby frames?
2. Why can't tracking be restored after it is lost in FoundationPose, even though the 3D model does not change?

I look forward to your reply. Thank you.

wenbowen123 commented 1 week ago

It does not look like your video is loaded in the right order. There are some jumps in the sequence of events. Can you confirm?

byran-wang commented 1 week ago

@wenbowen123 The video is a screen recording of a viewer. The viewer is an offline tool I developed myself, and it lets me jump to any frame with the mouse or keyboard. That is why you see some jumps between frames.

Additionally, the viewer first loads the poses computed by the original FoundationPose and BundleSDF code. It then calculates the pose errors (ADD) between the GT poses and the computed poses. Finally, the pose errors are shown on the left side of the viewer, and the object pose in the camera frame is shown on the right side.
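For readers unfamiliar with the metric, ADD is commonly defined as the mean distance between the model points transformed by the GT pose and the same points transformed by the estimated pose. The snippet below is a minimal plain-Python sketch of that definition, not the viewer's actual code; the names `transform` and `add_error`, and the representation of a pose as a 3x3 rotation plus a translation, are assumptions made for illustration.

```python
import math

def transform(R, t, p):
    """Apply a rigid transform (3x3 rotation R, translation t) to a 3D point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def add_error(R_gt, t_gt, R_est, t_est, model_points):
    """ADD: mean distance between model points under the GT and estimated poses."""
    total = 0.0
    for p in model_points:
        total += math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
    return total / len(model_points)

if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Hypothetical model points in meters; a real model would have many more.
    points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
    # The estimate is shifted by 1 cm along x, so the ADD is about 0.01.
    print(add_error(identity, (0, 0, 0.5), identity, (0.01, 0, 0.5), points))
```

Because every model point is compared with its own counterpart, a pure translation offset yields an ADD equal to the offset length, while a rotation error contributes more for points farther from the rotation axis.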

wenbowen123 commented 1 week ago

The HO3D dataset has some intermediate frames removed from the videos, which causes unrealistic jumps in the video.

byran-wang commented 2 days ago

@wenbowen123 Yes, there are some missing frames in the HO3D dataset. I think these missing frames can simulate large motion scenarios. I wonder if FoundationPose and BundleSDF can work well in large motion scenarios? Thanks.
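To quantify how large the motion at such a gap is, one option is to measure the relative rotation angle and translation distance between consecutive GT poses and flag the frames that exceed a threshold. The following is a rough sketch under stated assumptions: poses are given as (3x3 rotation, translation) pairs, and the function names and default thresholds (15 degrees, 5 cm) are hypothetical values chosen for illustration, not limits documented for FoundationPose or BundleSDF.

```python
import math

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (helper for the example)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def relative_motion(R_prev, t_prev, R_cur, t_cur):
    """Return (rotation angle in degrees, translation distance) between two poses."""
    # trace(R_prev^T @ R_cur) equals the element-wise product sum of the two matrices.
    trace = sum(R_prev[i][j] * R_cur[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle)), math.dist(t_prev, t_cur)

def find_jumps(poses, max_angle_deg=15.0, max_trans=0.05):
    """Indices i where the motion from frame i-1 to frame i exceeds a threshold."""
    jumps = []
    for i in range(1, len(poses)):
        (R0, t0), (R1, t1) = poses[i - 1], poses[i]
        angle, dist = relative_motion(R0, t0, R1, t1)
        if angle > max_angle_deg or dist > max_trans:
            jumps.append(i)
    return jumps

if __name__ == "__main__":
    # Synthetic sequence: small motion, then a 35 degree turn, then a 19 cm shift.
    poses = [
        (rot_z(0), (0.00, 0.0, 0.5)),
        (rot_z(5), (0.01, 0.0, 0.5)),
        (rot_z(40), (0.01, 0.0, 0.5)),
        (rot_z(40), (0.20, 0.0, 0.5)),
    ]
    print(find_jumps(poses))  # [2, 3]
```

Comparing the flagged indices with the frames where the ADD error spikes would show whether tracking loss coincides with removed frames or also occurs under smaller motion.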

wenbowen123 commented 22 hours ago

For reasonably large motion, they should work. But not for unrealistic frame dropping like in the videos above, because it no longer makes sense to use temporal cues (tracking).

I'd also suggest double-checking your data loading: in the second video (meat can), at around 00:20, the playback order does not look right, even accounting for the dropped frames.