Hello, thank you very much for the work you have done; the results are very impressive. I have some questions I would like to ask:
1. In the DP model learning paper, the authors mention that the combination of images and states performs better than using states alone. Why didn't you use images during the imitation policy training phase? Was there a specific reason for this?
2. During policy distillation, you distilled a state-based policy into an image-based policy. What was the reason behind this? Your paper suggests that the performance of state-based and image-based approaches is similar, so what led to the change in your policy approach?
3. Would including some real-world data in the training of the imitation policy and residual policy improve the results? Have you conducted any experiments to test this?
Thank you very much for your answers, and thank you again!
Which paper do you refer to? The Chi, Cheng, et al. "Diffusion policy: Visuomotor policy learning via action diffusion" one? I don't remember reading that specific statement there. But in our case, we used states only because, to make RL training of these long-horizon, sparse-reward tasks tractable, the simulation needs to run really fast, which is not feasible if we also have to render images in addition to running the physics.
After RL training in the low-dimensional state space, we want to obtain a policy that we can deploy on the physical hardware. Since we don't have access to the ground-truth state of the objects in the scene on the real robot, we opt for a policy that operates directly on image observations, which is why we need the distillation step.
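For intuition, here is a minimal, hypothetical sketch of what such state-to-vision distillation can look like: roll out the privileged state-based teacher in simulation, log rendered images paired with the teacher's actions, and fit an image-based student by plain behavior cloning. This is not our actual implementation (names like `ImageStudentPolicy` and `distill` are made up, and the student architecture and losses in the paper may differ); it only illustrates the general recipe.

```python
# Hypothetical sketch of state-to-vision policy distillation via behavior cloning.
# Assumes a DataLoader yielding (image, teacher_action) pairs collected by rolling
# out the state-based teacher in simulation while rendering camera frames.
import torch
import torch.nn as nn


class ImageStudentPolicy(nn.Module):
    """Maps an RGB observation to an action vector."""

    def __init__(self, action_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(            # small CNN over 64x64 RGB frames
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():                    # infer the flattened feature size
            feat_dim = self.encoder(torch.zeros(1, 3, 64, 64)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image))


def distill(student: nn.Module, dataloader, epochs: int = 10, lr: float = 3e-4):
    """Supervised distillation: regress the teacher's actions from images."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for images, teacher_actions in dataloader:   # pairs logged from teacher rollouts
            loss = nn.functional.mse_loss(student(images), teacher_actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```

A DAgger-style variant would additionally query the teacher on states visited by the student's own rollouts, but the core supervised objective stays the same.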
We included real data only in the distillation stage of the pipeline. I think adding more diverse data earlier in the pipeline could help as well, since it can provide some regularization and better generalization, but we didn't try this in this work.