I really like your work and have a few questions I'd like to ask you.
Firstly, if the keypoint tracking method is trained on a single task with 100 human demonstrations and 10 robot teleoperation demonstrations, can it yield a well-performing tracking model?
Does the keypoint information passed into the policy consist only of 2D pixel coordinates, or does it also include some form of rotation?
The paper mentions using 4 A100 GPUs for training. Is it possible to train a good single-task policy using a 4090 GPU instead?
Hey! I want to follow up with some questions about training and deployment in the real world.
Is there an example of the data that is required for providing real-world demonstrations with a robot?
Will you provide code for the real-world experiments with the UR5?
How are you sending commands to the UR5? Is it via ROS, RTDE with movep and movej, or something different?
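For context, this is roughly what I have in mind on the command side, using the ur_rtde Python bindings; the IP address, poses, speeds, and accelerations below are placeholders I made up, so it may look nothing like your actual setup:

```python
# Rough guess at the UR5 command side, using the ur_rtde Python bindings.
# Everything here (IP, poses, speeds) is a placeholder, not the authors' setup.
from rtde_control import RTDEControlInterface
from rtde_receive import RTDEReceiveInterface

ROBOT_IP = "192.168.1.10"  # placeholder address

rtde_c = RTDEControlInterface(ROBOT_IP)
rtde_r = RTDEReceiveInterface(ROBOT_IP)

# Current TCP pose as [x, y, z, rx, ry, rz] (meters, axis-angle rotation).
tcp_pose = rtde_r.getActualTCPPose()

# Joint-space move to a guessed home configuration (radians); speed/accel are guesses.
home_q = [0.0, -1.57, 1.57, -1.57, -1.57, 0.0]
rtde_c.moveJ(home_q, 0.5, 0.5)

# Cartesian move: lift the TCP by 5 cm from wherever it currently is.
target = list(tcp_pose)
target[2] += 0.05
rtde_c.moveL(target, 0.1, 0.2)

rtde_c.stopScript()
```

Is it something like this, or do you stream commands at a fixed frequency (e.g. with servoL/speedL) instead?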
Related to 3: could you explain the full pipeline, i.e. running the ATM tracker and policy and then sending the resulting commands to the robot? (My rough guess at this loop is sketched after the next question.)
Could you also specify in more detail what is needed for doing this with real hardware: which inputs the policy needs, requirements on camera viewpoints, any required transformations of the observations before they are passed to the policy, calibration, limits on robot/human demonstration velocities, GPUs required for doing inference, etc.?
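To make the pipeline question more concrete, here is the rough inference loop I currently imagine. Every name in it (tracker.predict, policy.act, camera.read, robot.step) is a placeholder I made up, not your actual API, and the comments mark the points I am unsure about:

```python
# My rough mental model of the deployment loop. All class/method names here
# (tracker.predict, policy.act, camera.read, robot.step) are placeholders,
# not the actual ATM code.

def run_episode(tracker, policy, camera, robot, horizon=300):
    """One real-world rollout: camera -> tracker -> policy -> robot commands."""
    for t in range(horizon):
        rgb = camera.read()               # H x W x 3 image from a fixed camera?
        # 1) Track transformer predicts future 2D trajectories for sampled points.
        tracks = tracker.predict(rgb)     # e.g. (num_points, T, 2) pixel coordinates?
        # 2) Track-guided policy maps image + predicted tracks (+ proprioception?)
        #    to an action.
        proprio = robot.get_state()       # joint angles or TCP pose?
        action = policy.act(rgb, tracks, proprio)
        # 3) Send the action to the robot. Is it a delta end-effector pose,
        #    an absolute pose, or joint targets, and at what control frequency?
        robot.step(action)
```

Corrections to any step of this sketch would already answer most of my questions.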
Thank you.