sheffier opened this issue 2 years ago
Hello,
I wasn't able to reproduce results for the following tasks using PPO:
- ShadowHandCatchAbreast: the policy seems to learn the task, but the resulting rewards are much lower than the reported results.
- ShadowHandGraspAndPlace and ShadowHandKettle: none of the saved checkpoints produced a policy that can perform the task.
Thanks
Hello @sheffier,
Thank you for your interest in our work. I'm sorry for not updating in time; I've been very busy lately, but I will gradually update the dataset.
For the first issue (ShadowHandCatchAbreast), I would first check the parameter settings, especially the number of environments. We use 2048 environments for parallel sampling, which is very important for our tasks: high sampling throughput is the basis for control in such a high-dimensional action space (see the sketch below). The other settings can be found in our paper.
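For a rough sense of why the environment count matters so much for PPO here, this is a minimal back-of-the-envelope sketch of the per-update sample budget. The rollout horizon and the alternative environment count are illustrative assumptions, not values taken from the repo:

```python
# Back-of-the-envelope PPO sample budget per update.
# num_envs = 2048 is the setting recommended above; horizon is an
# assumed rollout length per environment per PPO iteration.
num_envs = 2048
horizon = 16

batch_size = num_envs * horizon
print(f"transitions per PPO update: {batch_size}")  # 32768

# With only 64 environments and the same horizon, each update sees
# just 64 * 16 = 1024 transitions -- usually far too little data for
# stable PPO updates on a high-dimensional bimanual action space.
small_batch = 64 * horizon
print(f"transitions with 64 envs:  {small_batch}")  # 1024
```

So if you trained with a much smaller number of parallel environments, lower final rewards would be expected even when the policy appears to learn the task.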
For the second issue, it has to be said that some of the tasks we release, including ShadowHandGraspAndPlace and ShadowHandKettle, cannot yet be fully solved with PPO alone; only partial behavior, like that shown in our demo, can be achieved. The environments that cannot simply be trained with PPO remain open challenges, and we hope this study provides a new platform for the RL and robotics communities.
I hope this helps.
Thanks, @cypypccpy, very helpful