NVlabs / M2T2

M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place

Real robot implementation #2

Open JeonHoKang opened 2 months ago

JeonHoKang commented 2 months ago

If I wanted to run this on a real robot with my RealSense camera, would I be able to train my own M2T2? Or could I provide additional data to the existing model and fine-tune it? If so, what are the steps for doing so?

adithyamurali commented 2 months ago

Hi @JeonHoKang, the model can be run on a real robot in a tabletop pick-and-place setting. If you check the README, there is an example of running the model on point cloud data from a RealSense camera (saved offline):

python demo.py eval.checkpoint=m2t2.pth eval.data_dir=sample_data/real_world/00 eval.mask_thresh=0.4 eval.num_runs=5
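To feed your own camera data into a command like the one above, you first need an RGB image and a metric depth map saved to disk. Below is a minimal sketch of capturing one frame with the pyrealsense2 SDK; the output file names and directory layout (rgb.png, depth.npy, intrinsics) are assumptions for illustration, so check sample_data/real_world/00 for the exact format demo.py expects.

```python
# Minimal sketch: grab one RGB-D frame from a RealSense camera and save it.
# The file names / layout below are assumptions, not the repo's required format.
import os

import numpy as np
import pyrealsense2 as rs
from PIL import Image

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.rgb8, 30)
profile = pipeline.start(config)

# Depth scale converts raw depth ticks to meters; intrinsics are needed to
# deproject the depth map into a point cloud later.
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()
intrinsics = depth_frame.profile.as_video_stream_profile().get_intrinsics()

depth = np.asanyarray(depth_frame.get_data()).astype(np.float32) * depth_scale  # meters
rgb = np.asanyarray(color_frame.get_data())

os.makedirs("my_scene", exist_ok=True)
np.save("my_scene/depth.npy", depth)
Image.fromarray(rgb).save("my_scene/rgb.png")
print("fx, fy, cx, cy:", intrinsics.fx, intrinsics.fy, intrinsics.ppx, intrinsics.ppy)

pipeline.stop()
```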
JeonHoKang commented 2 months ago

Hi Adithya, good to hear from you. I am a big fan of your work. I am trying to test your architecture in our lab, and I was hoping you could provide more details on how to generate data.

It looks like you created the synthetic data in an Isaac Sim environment.

The depth data looks like it's 512x512.

When I train on my own data and validate on my point clouds, does that structure need to be kept at all times?

Also, how should I generate the annotations and grasps? It looks like the annotations are stored as point clouds in .pkl files. I'm not sure what the meta_data.pkl file is, though.

Also, what tool do you use to generate the gripper-looking schematic in MeshCat, or in the figures of your paper, to show the decoded grasps?

Thank you very much in advance.

JeonHoKang commented 1 month ago

Any updates on the above issue? I have generated depth.npy and an RGB image from Isaac Sim with the Isaac Lab plugin. However, I am still unsure what tools to use to produce annotation.pkl and meta_data.pkl. Thank you!

adithyamurali commented 1 month ago

Hi @JeonHoKang, our grasp annotations are from ACRONYM (https://github.com/NVlabs/acronym). Note that ACRONYM consists only of object meshes and grasps. You'll have to construct a scene with these objects yourself, e.g. place an object on a table or a box primitive built with the trimesh library, as sketched below.
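As a rough illustration (not the exact data-generation pipeline used for M2T2), here is one way to load an ACRONYM grasp file with h5py and rest the corresponding mesh on a box primitive using trimesh. The file path is a placeholder, and the HDF5 key names are assumptions based on the public ACRONYM release.

```python
# Rough sketch of building a tabletop scene from one ACRONYM object.
# Paths are hypothetical; HDF5 keys assume the standard ACRONYM layout.
import h5py
import numpy as np
import trimesh

grasp_file = "acronym/grasps/Mug_abc123_0.01.h5"  # hypothetical path
with h5py.File(grasp_file, "r") as f:
    mesh_path = f["object/file"][()].decode()      # mesh referenced by the grasp file
    scale = float(f["object/scale"][()])
    grasps = np.array(f["grasps/transforms"])      # (N, 4, 4) gripper poses
    labels = np.array(f["grasps/qualities/flex/object_in_gripper"])

good_grasps = grasps[labels > 0]  # keep only successful grasp annotations

obj = trimesh.load(mesh_path)
obj.apply_scale(scale)

# A simple "table": a thin box primitive with its top face at z = 0.
table = trimesh.creation.box(extents=[0.6, 0.6, 0.02])
table.apply_translation([0.0, 0.0, -0.01])

# Rest the object on the table by shifting it up by its lowest point.
obj.apply_translation([0.0, 0.0, -obj.bounds[0][2]])

scene = trimesh.Scene([table, obj])
scene.show()  # opens an interactive viewer
```

Note that any translation applied to the object when placing it in the scene also has to be applied to the grasp poses so they stay aligned with the mesh.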

For rendering depth, you can also consider using NVISII: https://github.com/owl-project/NVISII.

While the model is trained on purely synthetic data, you can test it either in sim or on the real robot.

Yes, we used MeshCat to create the visualizations in the paper.
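For reference, here is a minimal sketch of drawing a simple gripper-like marker at a grasp pose with meshcat-python. The marker proportions are made up for illustration; they are not the exact gripper schematic used in the paper figures.

```python
# Minimal sketch: draw a crude two-fingered gripper marker at a grasp pose
# with meshcat-python. Marker dimensions are arbitrary placeholders.
import meshcat
import meshcat.geometry as g
import meshcat.transformations as tf
import numpy as np

vis = meshcat.Visualizer()  # run meshcat-server separately, or call vis.open()

def draw_gripper(name, T, color=0x00ff00):
    """Draw a palm bar plus two finger bars under vis[name], posed by 4x4 matrix T."""
    mat = g.MeshLambertMaterial(color=color)
    vis[name].set_transform(T)
    # Palm: a bar across the gripper opening.
    vis[name]["palm"].set_object(g.Box([0.02, 0.08, 0.01]), mat)
    # Fingers: two bars extending along +z from the palm.
    for side, y in [("left", -0.04), ("right", 0.04)]:
        vis[name][side].set_object(g.Box([0.02, 0.01, 0.05]), mat)
        vis[name][side].set_transform(tf.translation_matrix([0.0, y, 0.03]))

# Example: draw one marker at an identity pose; in practice T would be a
# decoded 4x4 grasp pose from the model.
draw_gripper("grasps/0", np.eye(4))
```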