Hi,
Thanks for your interest in our work.
To collect a demonstration, the human demonstrator kinesthetically moved the robot arm to specify a sequence of gripper target poses for a given task and scene configuration. The robot was then reset to its initial pose and moved sequentially through each target pose in the specified order while the RGB-D stream from the camera was recorded. A minimal sketch of this two-phase procedure is shown below.
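The following is a hedged, illustrative sketch of that two-phase collection loop, not code from the RVT release. The `robot` and `camera` objects and their methods (`get_ee_pose`, `reset_to_initial_pose`, `move_to_pose`, `read_rgbd`) are hypothetical placeholders for your own Franka and RGB-D camera drivers.

```python
# Hypothetical sketch of the two-phase demo collection described above.
# All robot/camera calls are placeholders; substitute your own drivers.

def collect_demo(robot, camera, num_keyframes):
    # Phase 1: kinesthetic teaching -- the demonstrator physically guides
    # the arm, and each gripper target pose (e.g. position + quaternion)
    # is stored on a keypress.
    target_poses = []
    for _ in range(num_keyframes):
        input("Move the arm to the next target pose, then press Enter...")
        target_poses.append(robot.get_ee_pose())  # placeholder call

    # Phase 2: reset and replay -- move to each recorded pose in order
    # while logging the RGB-D stream and proprioception for training.
    robot.reset_to_initial_pose()                 # placeholder call
    episode = []
    for pose in target_poses:
        robot.move_to_pose(pose)                  # placeholder call
        rgb, depth = camera.read_rgbd()           # placeholder call
        episode.append({"rgb": rgb, "depth": depth,
                        "ee_pose": robot.get_ee_pose()})
    return target_poses, episode
```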
Regarding the question of how the RVT output action is projected to the robot world frame: RVT's predicted action is expressed in the space of the input point cloud. This is the same as pred_wpt.
The point cloud given as input to RVT is the perceived point cloud transformed into the robot base frame, i.e., the origin of this frame is at the robot base. Therefore, RVT's predicted action is already expressed in the robot base frame.
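As a concrete illustration, here is a minimal sketch of the transform step implied above, assuming you already have the hand-eye calibration result as a 4x4 extrinsic matrix `T_base_cam` (obtaining it is outside the RVT code and is assumed here). The `rvt_model` call at the end is pseudocode; refer to the released code for the actual inference interface.

```python
import numpy as np

def camera_to_base(points_cam, T_base_cam):
    """Transform an (N, 3) camera-frame point cloud into the robot base frame.

    T_base_cam: 4x4 extrinsic from hand-eye calibration, mapping camera
    coordinates to robot-base coordinates (assumed to be known already).
    """
    points_h = np.concatenate(
        [points_cam, np.ones((points_cam.shape[0], 1))], axis=1)  # (N, 4)
    return (T_base_cam @ points_h.T).T[:, :3]

# Because the point cloud fed to RVT is already in the base frame, the
# predicted waypoint (pred_wpt) comes back in that same frame and needs
# no further projection before being sent to the robot controller.
# pred_wpt = rvt_model(pcd_base, ...)   # pseudo-call; see the released code
```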
The released model code should be sufficient to train on the real-world data.
Please let us know if you have any specific questions.
Closing because of inactivity. Please feel free to reopen if the issue persists.
Hi,
I appreciate all the hard work being done on this project! I am currently trying to train the network on a real Franka Panda arm and am hoping for some guidance. I have a few questions that I'd appreciate your help with:
Could you please elaborate on the data collection process that was followed?
I'd like to understand more about the eye-hand calibration. Specifically, I'm curious how the RVT output action is projected into the robot world frame.
Are there any specific suggestions or best practices I should keep in mind when training on real-world data?
Thank you in advance for your support!