MarkFzp / act-plus-plus

Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
https://mobile-aloha.github.io/
MIT License

Questions about inference performance on real robot #52

Open obito8065 opened 1 month ago

obito8065 commented 1 month ago
  Thanks for your contribution and work.
  We are seeing performance issues when evaluating models on our real robot, which uses exactly the same puppet and master arms as the paper.
  1. We trained the "sim_cube_transfer" task in simulation and achieved a success rate of over 60%. However, when we evaluated the trained model on our real robot, the performance was poor. Specifically, the robot always started with a jerk in each evaluation and then remained almost stationary, failing to complete any subsequent actions. We found that this is likely due to the difference between the visual input from the real world and the visual input from simulation: the robot worked well when we fed it visual input from the simulated environment instead of the real cameras. In my opinion, this suggests that the proposed method may not be as robust as anticipated. Have you encountered this issue before? If so, could you please share any advice on sim-to-real transfer?
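  One rough way to quantify the visual difference described above is to compare per-channel image statistics between simulated and real camera frames. The following is only a minimal sketch under stated assumptions: the function names `channel_stats` and `domain_gap` are hypothetical, frames are assumed to be `uint8` RGB arrays of shape `(N, H, W, 3)`, and the arrays below are synthetic stand-ins rather than data from this setup.

  ```python
  import numpy as np

  def channel_stats(frames: np.ndarray):
      """Per-channel mean and std for frames shaped (N, H, W, 3) with values in [0, 255]."""
      pixels = frames.reshape(-1, frames.shape[-1]).astype(np.float64) / 255.0
      return pixels.mean(axis=0), pixels.std(axis=0)

  def domain_gap(sim_frames: np.ndarray, real_frames: np.ndarray) -> np.ndarray:
      """Absolute shift of the real per-channel mean, in units of the sim per-channel std."""
      sim_mean, sim_std = channel_stats(sim_frames)
      real_mean, _ = channel_stats(real_frames)
      return np.abs(real_mean - sim_mean) / (sim_std + 1e-8)

  # Synthetic stand-ins (assumption): replace with frames from the sim renderer and the real cameras.
  rng = np.random.default_rng(0)
  sim = rng.integers(100, 156, size=(8, 48, 64, 3)).astype(np.uint8)
  real = rng.integers(0, 256, size=(8, 48, 64, 3)).astype(np.uint8)

  gap = domain_gap(sim, real)
  print("per-channel shift (in sim std units):", gap)
  ```

  A large shift would only indicate that the two image distributions differ in brightness or color; it would not by itself explain the policy's behavior.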

  2. We also introduced a simple new task by manually recording 50 episodes with our real robot. The task is to have the left arm grasp a straw that is attached to the right arm. However, after 500,000 training steps, the model's performance on the real robot still did not meet our expectations. The grippers failed to close completely when attempting to grasp the straw, even though they showed a tendency to close. We replayed the recorded episodes and confirmed that the gripper does close completely when grasping the straw in the demonstrations (see the uploaded video).

  I'm wondering what the possible reasons for the poor performance on the real robot task might be. Could you please provide any advice on tuning real robot tasks? Additionally, is there anything we should pay attention to when recording data and training for real-world tasks? Thanks!
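  The gripper observation above (a tendency to close without closing fully) could be checked numerically by comparing the most-closed gripper command in a recorded episode with the most-closed command the policy predicts. This is a minimal sketch under assumptions that are not confirmed in this thread: actions are 14-dimensional with the gripper commands at indices 6 (left) and 13 (right), smaller values mean a more closed gripper, and the helper names and arrays are hypothetical.

  ```python
  import numpy as np

  # Assumption: 14-dim action vector, gripper commands at indices 6 (left) and 13 (right).
  GRIPPER_IDX = [6, 13]

  def min_gripper_command(actions: np.ndarray) -> np.ndarray:
      """Smallest (assumed most-closed) gripper command per arm; actions shaped (T, 14)."""
      return actions[:, GRIPPER_IDX].min(axis=0)

  def closing_shortfall(recorded: np.ndarray, predicted: np.ndarray) -> np.ndarray:
      """How far the predicted commands stop short of the recorded closing, per arm."""
      return min_gripper_command(predicted) - min_gripper_command(recorded)

  # Synthetic stand-ins (assumption): replace with a recorded episode and policy rollouts.
  recorded = np.ones((50, 14))
  recorded[25:, 6] = 0.0   # demonstration closes the left gripper fully
  predicted = np.ones((50, 14))
  predicted[25:, 6] = 0.3  # policy output only closes the left gripper partially

  shortfall = closing_shortfall(recorded, predicted)
  print("shortfall [left, right]:", shortfall)
  ```

  A clearly positive shortfall on the grasping arm would suggest the policy output itself is under-closing, rather than the hardware failing to follow the command.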

https://github.com/MarkFzp/act-plus-plus/assets/143476448/f2f8b471-552c-4bee-8f9a-1f8d56d4ebb0