NVlabs / FoundationPose

[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
https://nvlabs.github.io/FoundationPose/

Application on multiview setting #71

Closed JisuHann closed 1 week ago

JisuHann commented 4 weeks ago

Thank you for your splendid work!

I am planning to integrate this framework into my project, and I need a ground-truth pose prediction model that is robust to occlusion. To achieve this, I am trying to use a multi-view (model-based) setting.

I believe the simplest solution is to use track_one as the tracking step: at each timestep, feed the predictions from multiple cameras into the refiner, let the scorer score them, and then choose the best pose prediction.
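A minimal sketch of that idea, assuming calibrated camera-to-world extrinsics and hypothetical per-camera helpers refine_pose() and score_pose() that wrap the refiner and scorer networks (the real FoundationPose API differs; this only illustrates the per-timestep selection logic):

```python
import numpy as np

def multiview_track_step(prev_pose_world, cameras, refine_pose, score_pose):
    """Refine the previous object pose in every camera, score each hypothesis,
    and keep the best one, expressed in the world frame.

    prev_pose_world: 4x4 object-to-world pose from the previous timestep.
    cameras: list of dicts with keys 'rgb', 'depth', 'K', 'cam_to_world' (4x4).
    refine_pose, score_pose: hypothetical wrappers around the refiner/scorer.
    """
    best_pose_world, best_score = None, -np.inf
    for cam in cameras:
        world_to_cam = np.linalg.inv(cam['cam_to_world'])
        # Express the previous pose in this camera's frame and refine it there.
        prev_pose_cam = world_to_cam @ prev_pose_world
        refined_cam = refine_pose(cam['rgb'], cam['depth'], cam['K'], prev_pose_cam)
        score = score_pose(cam['rgb'], cam['depth'], cam['K'], refined_cam)
        if score > best_score:
            best_score = score
            # Map the winning hypothesis back to the world frame.
            best_pose_world = cam['cam_to_world'] @ refined_cam
    return best_pose_world, best_score
```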

So I would like to ask three questions:

  1. Have you tried, or are you planning, to extend your work to multi-view settings?
  2. Is there any other good solution that could easily solve this problem?
  3. If I use the simplest solution described above, are there any issues that might arise? (setting aside the real-time latency from processing multiple inputs)

Thank you!

wenbowen123 commented 4 weeks ago

In your multi-view setup, are the camera extrinsics calibrated? And is each of them an RGBD camera?

JisuHann commented 3 weeks ago

@wenbowen123 thanks for your quick reply. Yes, we use Intel RealSense D435 cameras, so we get RGBD images, and they are calibrated too.

wenbowen123 commented 3 weeks ago

You can run FoundationPose with one of the cameras to get the object pose relative to that camera. Then you can use the extrinsics to project the pose into the rest of the cameras to get the GT. Or you can simply run on each camera independently.
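A minimal sketch of that projection step, assuming calibrated camera-to-world extrinsics as 4x4 matrices (the function and argument names below are illustrative, not part of the FoundationPose API):

```python
import numpy as np

def transfer_pose(pose_obj_in_A, A_to_world, B_to_world):
    """Return the object pose expressed in camera B's frame.

    pose_obj_in_A: 4x4 object-to-camera-A pose (e.g. FoundationPose output for camera A).
    A_to_world, B_to_world: 4x4 camera-to-world extrinsics from calibration.
    """
    world_to_B = np.linalg.inv(B_to_world)
    # Chain rigid transforms: object -> camera A -> world -> camera B.
    return world_to_B @ A_to_world @ pose_obj_in_A
```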