Aaditya-Prasad / consistency-policy

[RSS 2024] Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation
https://consistency-policy.github.io/
MIT License

How many epochs did you use for the real world insertion task with diffusion policy? #2

Closed · yolo01826 closed this issue 2 months ago

yolo01826 commented 3 months ago

Thank you for your amazing work!!!

If you don't mind, I'd like to ask how many epochs you used for the real-world insertion task with Diffusion Policy.

I encountered some problems while training the insertion task on my own robot setup. We collected 100 demonstrations and trained for 200 epochs, and the robot TCP position MSE train loss is about 0.001 (in meters). But when we deploy the policy, the robot just stops at a certain position and keeps shaking. We have checked every observation and the dataset, and both look good. Can you share some of your experience training the policy? We would be so grateful!!

kevin-thankyou-lin commented 3 months ago

Thanks for the kind words!

For the real-world insertion task, we used a checkpoint from between 200 and 300 epochs for DDIM and the checkpoint at 300 epochs for CP (after loading from a 300-epoch EDM checkpoint). We found that training for longer typically worked better (or at least no worse than training for fewer epochs), at least for the experiments in sim.

We used ~150 demos for the insertion task, so more demos might help. At one point, our third-person camera got moved, and when we deployed our policy on that setup, the policy didn't work at all. So I'd double-check that nothing has changed too much between data collection and deployment.
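One quick sanity check (just a sketch with hypothetical file names, not code from this repo) is to diff a freshly captured frame against a frame from your training data for the same camera:

```python
import numpy as np
from PIL import Image

# Hypothetical paths: a frame from your training dataset and a frame
# captured from the same camera right before deployment.
ref = np.asarray(Image.open("train_episode000_cam_3rd_frame000.png"), dtype=np.float32)
live = np.asarray(Image.open("deploy_cam_3rd_frame000.png"), dtype=np.float32)

# Mean absolute pixel difference on a 0-255 scale.
diff = np.abs(ref - live).mean()
print(f"mean |ref - live| = {diff:.1f}")

# If this is much larger than the typical frame-to-frame variation inside
# the training set, the camera pose, exposure, or scene has likely shifted
# since data collection.
```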

Good luck!

yolo01826 commented 3 months ago

Dear Author,

@kevin-thankyou-lin Thank you very much for your response on GitHub. We are impressed by your Consistency Policy. However, we encountered a very strange phenomenon while using DDIM for the plug-insertion task. Initially, the robotic arm moves normally from the starting point to above the plug, and the trajectory prediction is quite stable. But as it approaches the charger, the trajectory predicted by the model becomes completely chaotic, not resembling a simple curve at all. (We replicated your task setup: the inputs are two RGB images, translations, quaternion poses, and the gripper open/close state, and the outputs are the end-effector pose and gripper state; we collected 200 demonstrations.) Could this issue be due to unclear visual features, or is it simply the model overfitting?

Additionally, I have carefully read your paper, which mentions that the robotic arm's rotation is represented in 6D. Could you explain this representation? Could it possibly be a reason for the poor performance? I am very much looking forward to your reply. 🥺

Aaditya-Prasad commented 3 months ago

Is your test distribution the same as your training distribution? Does your training dataset cover this distribution well? If yes to both, 'overfitting' is not a bad thing. I would simplify the task (if needed) to the point that the answer to both of those is yes and then overfit a lot (train a large enough model for a long enough time) as a sanity check. There's nothing obvious in your task description that makes me think it should be unlearnable.

It looks like you only have 120 epochs and your val loss is just barely trending up; I'd start by simply training for longer, especially if you're confident that your data is good. Diffusion models also often benefit from continued training even after the train loss converges.
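In rough pseudocode, "train longer" amounts to something like the sketch below. `make_policy`, `train_one_epoch`, and `evaluate_rollouts` are hypothetical placeholders for your own training utilities, not APIs from this repo; the point is to keep training past the loss plateau, save periodic checkpoints (including EMA weights), and pick a checkpoint by rollout success rather than by val MSE alone.

```python
import torch

# Hypothetical setup helper: returns the policy, its EMA copy,
# the optimizer, and the training dataloader.
policy, ema_policy, optimizer, loader = make_policy()

for epoch in range(300):  # e.g. 300 epochs instead of stopping at 120
    train_loss = train_one_epoch(policy, ema_policy, optimizer, loader)

    if (epoch + 1) % 50 == 0:
        # Keep several checkpoints instead of only the "best val loss" one.
        torch.save(
            {"epoch": epoch,
             "model": policy.state_dict(),
             "ema": ema_policy.state_dict()},
            f"ckpt_epoch{epoch + 1:04d}.pt",
        )
        # Evaluate with actual rollouts; val MSE alone can be misleading.
        success = evaluate_rollouts(ema_policy)
        print(f"epoch {epoch + 1}: train_loss={train_loss:.4f}, success={success:.2f}")
```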

We took the 6D rotation representation from Diffusion Policy for learning and never had any issues with it.
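For reference, this is the continuous 6D rotation parameterization from Zhou et al. (2019) that Diffusion Policy uses: the network regresses two columns of the rotation matrix, and a valid rotation is recovered with Gram-Schmidt at decode time (some implementations use rows instead of columns; either works as long as you are consistent). A minimal NumPy sketch, not code from this repo:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotmat_to_6d(R):
    """Drop the last column of a 3x3 rotation matrix (Zhou et al. 2019)."""
    return np.concatenate([R[:, 0], R[:, 1]])

def rot6d_to_rotmat(d6):
    """Recover a valid rotation matrix from a 6D vector via Gram-Schmidt."""
    a1, a2 = d6[:3], d6[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)

# Example: convert a quaternion pose label (x, y, z, w) to 6D for training.
q = np.array([0.0, 0.0, 0.0, 1.0])
R = Rotation.from_quat(q).as_matrix()
d6 = rotmat_to_6d(R)
assert np.allclose(rot6d_to_rotmat(d6), R)
```

Unlike quaternions or Euler angles, this representation has no discontinuities, which tends to make it easier for networks to regress.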

Please let us know if you have more questions.