NVlabs / FoundationPose

[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
https://nvlabs.github.io/FoundationPose/

How can I apply the model-free approach to my own objects? #32

Closed fertiliz closed 1 month ago

fertiliz commented 1 month ago

Hello, thank you for your excellent contribution. Our teacher wants me to acquire real-time poses from a Hikvision entry-level industrial depth camera and integrate them with robotic-arm grasping. I have successfully run the demo, but I ran into issues with the YCBV dataset directory when trying to run the model-free setup. Could you tell me what data and directory structure I should prepare for the model-free version on YCBV? And could you briefly explain how to connect the camera to infer poses for my own objects with the model-free setup?

wenbowen123 commented 1 month ago

What error do you run into when running the YCBV model-free setup?

For your own use case, you need to prepare some reference images, just as our LINEMOD and YCBV examples outline. Specifically, you need RGBD images, camera-to-object poses, and object masks. For masks you can use tools such as SAM. For poses you can consider BundleSDF. Alternatively, you can record an RGBD video of your novel object and use BundleSDF to directly get a 3D model and run the model-based setup, just as that GitHub repo's demo shows.
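(For the live-camera part of the question, below is a minimal sketch of a per-frame loop driving the interface used in run_demo.py: register once on the first frame, then track_one on every frame after that. `HikvisionCamera` and `segment_first_frame` are hypothetical placeholders you would implement over your own camera SDK and segmenter; the FoundationPose constructor arguments follow run_demo.py.)

```python
# Minimal sketch of a live loop on top of the run_demo.py interface.
# HikvisionCamera and segment_first_frame are hypothetical placeholders.
import trimesh
import nvdiffrast.torch as dr
from estimater import FoundationPose, ScorePredictor, PoseRefinePredictor

from my_camera import HikvisionCamera          # your own SDK wrapper (hypothetical)
from my_segmenter import segment_first_frame   # e.g. SAM on frame 0 (hypothetical)

mesh = trimesh.load('my_object.obj')           # e.g. reconstructed by BundleSDF
est = FoundationPose(model_pts=mesh.vertices, model_normals=mesh.vertex_normals,
                     mesh=mesh, scorer=ScorePredictor(), refiner=PoseRefinePredictor(),
                     glctx=dr.RasterizeCudaContext())

cam = HikvisionCamera()
K = cam.intrinsics()                           # 3x3 camera matrix

color, depth = cam.read()                      # HxWx3 uint8, HxW float32 depth in meters
mask = segment_first_frame(color)              # HxW bool object mask
pose = est.register(K=K, rgb=color, depth=depth, ob_mask=mask, iteration=5)

while True:                                    # real-time tracking loop
    color, depth = cam.read()
    pose = est.track_one(rgb=color, depth=depth, K=K, iteration=2)
    # pose is a 4x4 object-in-camera transform; feed it to the grasp planner
```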

fertiliz commented 1 month ago

Thanks! I have run the Neural Object Field and generated step_0001000_mesh_real_world.obj in the /ref_views_16/ob_0000001/nerf folder. The command that resulted in an error is: `python run_ycb_video.py --ycbv_dir /home/robot/gdrnpp_bop2022/datasets/BOP_DATASETS/ycbv --use_reconstructed_mesh 1 --ref_view_dir /home/robot/下载/FoundationPose/nef-doc/ref_views_16`

The error log was:

```
[init()] self.h5_file:
[init()] Using pretrained model from /home/robot/下载/FoundationPose/learning/training/../../weights/2023-10-28-18-33-37/model_best.pth
[init()] init done
Traceback (most recent call last):
  File "/home/robot/下载/FoundationPose/run_ycb_video.py", line 149, in <module>
    run_pose_estimation()
  File "/home/robot/下载/FoundationPose/run_ycb_video.py", line 114, in run_pose_estimation
    if not reader.is_keyframe(i):
  File "/home/robot/下载/FoundationPose/datareader.py", line 529, in is_keyframe
    return (key in self.keyframe_lines)
AttributeError: 'YcbVideoReader' object has no attribute 'keyframe_lines'
```

My ycbv directory structure is (downloaded from the BOP site):

```
ycbv
├── models
│   ├── models_info.json    (the copies in ycbv_models below came from here)
│   ├── obj_xxxx.ply
│   └── obj_xxxx.png
├── models_eval
│   ├── models_info.json
│   └── obj_xxxx.ply
├── models_fine
│   ├── models_info.json
│   ├── obj_xxxx.ply
│   └── obj_xxxx.png
├── test
│   └── 000048 ~ 000059
│       ├── depth/          (PNG images; same for the folders below)
│       ├── mask/
│       ├── mask_visib/
│       ├── rgb/
│       ├── scene_camera.json
│       ├── scene_gt.json
│       └── scene_gt_info.json
├── train_pbr
│   └── 000000 ~ 000049     (same layout as test)
├── train_real
│   └── 000000 ~ 000091     (same layout as test)
├── train_synt
│   └── 000000 ~ 000079     (same layout as test)
└── ycbv_models             (the code reports it cannot find models_info.json here)
    ├── models
    │   └── models_info.json  (copied from the 'models' folder)
    └── models_info.json      (copied from the 'models' folder)
```
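(As a quick sanity check against this tree, something like the following can confirm the files a run needs actually exist. The paths are taken from the tree above plus the keyframe list mentioned in the reply below; adjust them to whatever your version of run_ycb_video.py / datareader.py actually reads.)

```python
from pathlib import Path

ycbv_dir = Path('/home/robot/gdrnpp_bop2022/datasets/BOP_DATASETS/ycbv')

# Paths taken from the directory tree above; the keyframe list location
# is an assumption, see the reply below for where it comes from.
expected = [
    'models/models_info.json',
    'test/000048/scene_camera.json',
    'keyframe.txt',   # assumed location; downloaded from the PoseCNN site
]
for rel in expected:
    p = ycbv_dir / rel
    print(('OK     ' if p.exists() else 'MISSING'), p)
```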

wenbowen123 commented 1 month ago

You need to have the file referenced here: https://github.com/NVlabs/FoundationPose/blob/main/datareader.py#L451
You can download it from the PoseCNN website: https://rse-lab.cs.washington.edu/projects/posecnn
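(For context, here is a sketch of the check that is failing, assuming the missing file is the standard YCB-Video keyframe list with lines of the form `0048/000001`, i.e. scene/frame. The exact filename and location the reader expects should be taken from the linked line of datareader.py.)

```python
from pathlib import Path

# Hypothetical stand-in for what YcbVideoReader does around datareader.py#L451:
# if the keyframe list is missing, self.keyframe_lines is never set, which
# produces exactly the AttributeError shown in the log above.
keyframe_file = Path('ycbv/keyframe.txt')   # assumed location; check datareader.py
keyframe_lines = set(keyframe_file.read_text().splitlines())

def is_keyframe(scene_id: int, frame_id: int) -> bool:
    key = f'{scene_id:04d}/{frame_id:06d}'  # e.g. '0048/000001'
    return key in keyframe_lines
```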

ddz16 commented 1 month ago

> Hello, thank you for your excellent contribution. Our teacher wants me to acquire real-time poses from a Hikvision entry-level industrial depth camera and integrate them with robotic-arm grasping. I have successfully run the demo, but I ran into issues with the YCBV dataset directory when trying to run the model-free setup. Could you tell me what data and directory structure I should prepare for the model-free version on YCBV? And could you briefly explain how to connect the camera to infer poses for my own objects with the model-free setup?

Hello, I currently have the same needs and have conducted some experiments. Do you have a contact method so we can communicate? You can find my e-mail on my profile page.

plusgrey commented 1 month ago

> What error do you run into when running the YCBV model-free setup?
>
> For your own use case, you need to prepare some reference images, just as our LINEMOD and YCBV examples outline. Specifically, you need RGBD images, camera-to-object poses, and object masks. For masks you can use tools such as SAM. For poses you can consider BundleSDF. Alternatively, you can record an RGBD video of your novel object and use BundleSDF to directly get a 3D model and run the model-based setup, just as that GitHub repo's demo shows.

Hi Bowen,

If I am planning to use my own object (with a series of required images), do I need to retrain the geometry network and the appearance network? I ask because an implicit representation network (like a NeRF or SDF) can only represent a single object.

fertiliz commented 1 month ago

> > Hello, thank you for your excellent contribution. Our teacher wants me to acquire real-time poses from a Hikvision entry-level industrial depth camera and integrate them with robotic-arm grasping. I have successfully run the demo, but I ran into issues with the YCBV dataset directory when trying to run the model-free setup. Could you tell me what data and directory structure I should prepare for the model-free version on YCBV? And could you briefly explain how to connect the camera to infer poses for my own objects with the model-free setup?
>
> Hello, I currently have the same needs and have conducted some experiments. Do you have a contact method so we can communicate? You can find my e-mail on my profile page.

> You need to have the file referenced here: https://github.com/NVlabs/FoundationPose/blob/main/datareader.py#L451
> You can download it from the PoseCNN website: https://rse-lab.cs.washington.edu/projects/posecnn

Thanks!

fertiliz commented 1 month ago

> > Hello, thank you for your excellent contribution. Our teacher wants me to acquire real-time poses from a Hikvision entry-level industrial depth camera and integrate them with robotic-arm grasping. I have successfully run the demo, but I ran into issues with the YCBV dataset directory when trying to run the model-free setup. Could you tell me what data and directory structure I should prepare for the model-free version on YCBV? And could you briefly explain how to connect the camera to infer poses for my own objects with the model-free setup?
>
> Hello, I currently have the same needs and have conducted some experiments. Do you have a contact method so we can communicate? You can find my e-mail on my profile page.

OK, I will contact you later!

wenbowen123 commented 1 month ago

> > What error do you run into when running the YCBV model-free setup? For your own use case, you need to prepare some reference images, just as our LINEMOD and YCBV examples outline. Specifically, you need RGBD images, camera-to-object poses, and object masks. For masks you can use tools such as SAM. For poses you can consider BundleSDF. Alternatively, you can record an RGBD video of your novel object and use BundleSDF to directly get a 3D model and run the model-based setup, just as that GitHub repo's demo shows.
>
> Hi Bowen,
>
> If I am planning to use my own object (with a series of required images), do I need to retrain the geometry network and the appearance network? I ask because an implicit representation network (like a NeRF or SDF) can only represent a single object.

Yes, you need to retrain the Neural Object Field.
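(To make the per-object point concrete: each object gets its own Neural Object Field and its own reconstructed mesh. A sketch of iterating the reference-view folders is below; `train_neural_object_field` is a hypothetical stand-in for whatever training entry point your version of the repo exposes, and the output path matches the ref_views_16/ob_0000001/nerf/step_0001000_mesh_real_world.obj file mentioned earlier in this thread.)

```python
from pathlib import Path

def train_neural_object_field(ob_dir: Path) -> None:
    """Hypothetical stand-in: call the repo's Neural Object Field training here."""
    raise NotImplementedError

ref_view_dir = Path('ref_views_16')

for ob_dir in sorted(ref_view_dir.glob('ob_*')):
    # One field per object: the implicit representation (NeRF/SDF) encodes
    # a single object, so every ob_XXXXXXX folder is trained separately.
    train_neural_object_field(ob_dir)
    mesh = ob_dir / 'nerf' / 'step_0001000_mesh_real_world.obj'
    assert mesh.exists(), f'training did not produce {mesh}'
```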