buzzo123 closed this issue 6 years ago
This algorithm doesn't work! I left it running all night, but nothing happened...
I'll check the train_kuka_grasping.py script. We mainly train KUKA with the continuous version using PPO from TensorFlow (TF) Agents, and haven't tried the discrete version (DQN) recently.
In the meantime, did you try training the cartpole and racecar with DQN?
python -m pybullet_envs.baselines.train_pybullet_cartpole
python -m pybullet_envs.baselines.train_pybullet_racecar
Also, you may want to try the TF Agents training with the locomotion tasks (pybullet_pendulum, pybullet_doublependulum, pybullet_pendulumswingup, pybullet_cheetah, pybullet_ant, pybullet_racecar, pybullet_minitaur). See also the Reinforcement Learning section of http://pybullet.org.
I'm interested in the KUKA grasping algorithms. I found this py file: bullet3/examples/pybullet/gym/pybullet_envs/agents/train_ppo.py. Do I have to use this? Can you explain the usage to me? Thanks
The same issue happens to me too. Kuka grasp training using DQN is not converging. It would be greatly helpful if an example of training with PPO, or at least some hints on using PPO, were provided.
I'm working on fixing it and providing continuous action/PPO support. Will report here when it is done.
Make sure to upgrade to pybullet 1.6.3 (pip install -U pybullet):
I just uploaded a new version of the Kuka grasping environment, both discrete and continuous.
Let's first try to get the kukaGymEnv to train properly (cheat by providing object positions), then later we look into kukaCamGymEnv (from camera pixels)
Using the latest pybullet, you can run the environment manually using
python3 -m pybullet_envs.examples.kukaGymEnvTest
(this lets you control a few more action settings; it calls the gym 'step2' API)
and
python3 -m pybullet_envs.examples.kukaGymEnvTest2
(this calls the actual gym step API)
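For reference, the Gym interface these test scripts exercise (reset, step) can be driven with a plain loop. DummyGraspEnv below is a made-up stub standing in for KukaGymEnv, so this sketch runs without pybullet installed; the real environment returns richer observations and actual physics, and the "7 discrete actions" value is an assumption for illustration only.

```python
import random

class DummyGraspEnv:
    """Stand-in stub for KukaGymEnv: same reset/step shape, no physics."""
    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0]  # fake observation (e.g. object position)

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        reward = 1.0 if done else 0.0  # sparse reward only at episode end
        return [0.0, 0.0, 0.0], reward, done, {}

def run_episode(env, policy):
    """Standard Gym loop: reset, then step until done, summing rewards."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, r, done, _ = env.step(policy(obs))
        total += r
    return total

env = DummyGraspEnv()
ret = run_episode(env, lambda obs: random.choice(range(7)))
print(ret)  # 1.0 -- the stub pays its sparse reward at episode end
```

Any environment exposing this reset/step pair, including the KUKA ones, can be plugged into the same loop.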
Training using TensorFlow Agents PPO:
//install pybullet etc
pip install agents gym tensorflow pybullet
//train
python3 -m pybullet_envs.agents.train_ppo --config=pybullet_kuka_grasping --logdir=kuka
tensorboard --logdir=kuka/<timestamp>
(then open a browser and point it to localhost:6006, or another port if you pass the --port argument to tensorboard)
//evaluate
python3 -m pybullet_envs.agents.visualize_ppo --logdir=kuka/<timestampname> --outdir=kuka_video1
This Kuka grasping environment has a very sparse reward only at the end of the episode, so we may need to use learning from demonstration (VR), curriculum learning or GraspGAN (like my colleagues did).
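The curriculum idea mentioned above (start from states close to a successful grasp, lengthen the trajectories as the agent improves) can be sketched with a small scheduler. This is a generic sketch, not part of the KUKA environment's API; the "start distance" knob and all thresholds are hypothetical values you would wire into your own env reset.

```python
class CurriculumScheduler:
    """Gradually lengthen episodes: begin near a successful grasp and
    move the start state further away as the success rate improves."""
    def __init__(self, start_distance=0.05, max_distance=0.5,
                 step=0.05, threshold=0.8, window=100):
        self.distance = start_distance    # how far from the grasp we reset
        self.max_distance = max_distance
        self.step = step
        self.threshold = threshold        # success rate needed to advance
        self.window = window              # episodes per measurement
        self.results = []

    def record(self, success):
        """Call once per episode; bump difficulty when the agent is ready."""
        self.results.append(bool(success))
        recent = self.results[-self.window:]
        if (len(recent) == self.window and
                sum(recent) / self.window >= self.threshold and
                self.distance < self.max_distance):
            self.distance = min(self.distance + self.step, self.max_distance)
            self.results.clear()  # re-measure at the new difficulty

sched = CurriculumScheduler(window=10)
for _ in range(10):
    sched.record(True)   # pretend the agent succeeds every episode
print(sched.distance)    # 0.1 -- difficulty was bumped once
```

The training loop would read `sched.distance` when resetting the environment, e.g. to place the gripper that far from the object.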
Instead of using kukaGymEnv-v0, can I train my personal env using PPO?
In what format is your personal env? Is it MuJoCo or did you write some pybullet code?
It's in the same format used in kukaGymEnvTest.
You should be able to train many pybullet environments without problem using PPO or DQN, they use the Gym interface (reset, step). I flagged the issue with KUKA in the pybullet quickstart guide.
Here are pybullet environments that train well using DQN:
python -m pybullet_envs.baselines.train_pybullet_cartpole
python -m pybullet_envs.baselines.train_pybullet_racecar
python -m pybullet_envs.baselines.enjoy_pybullet_cartpole
python -m pybullet_envs.baselines.enjoy_pybullet_racecar
Using PPO: python -m pybullet_envs.agents.train_ppo --config=pybullet_pendulum --logdir=pendulum
The following environments are available as Agents configs: pybullet_pendulum, pybullet_doublependulum, pybullet_pendulumswingup, pybullet_cheetah, pybullet_ant, pybullet_racecar, pybullet_minitaur
See also the Reinforcement Learning section of http://pybullet.org
We cannot provide support for training personal environments here, in particular when you don't share them on github.
I think it would be good for future contributions if we give some support for training personal environments (Maybe not here but on the Bullet Physics forum). If the environment is written in pybullet and uses the Gym interface, there shouldn't be a big problem to train it. At least the contributions in my gymperium branch are allowing this.
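To hook a personal env into the Agents train_ppo script, one approach is to add a config function alongside the bundled ones. This is a hypothetical sketch: the config-as-a-function-returning-locals() pattern mirrors the bundled pybullet configs, but the exact required keys are assumptions, so check pybullet_envs/agents/configs.py for the real fields.

```python
# Hypothetical sketch of an Agents config for a personal environment.
# The key names below (env, max_length, steps) are assumptions modeled
# on the bundled pybullet configs -- verify against the real
# pybullet_envs/agents/configs.py before use.
def my_personal_env():
    """Agents-style config: the function returns its locals() as a dict."""
    env = 'MyPersonalEnv-v0'   # a Gym id you registered yourself
    max_length = 1000          # maximum steps per episode
    steps = 1e7                # total training steps
    return locals()

config = my_personal_env()
print(sorted(config))  # ['env', 'max_length', 'steps']
```

Training would then (hypothetically) be invoked the same way as for the bundled configs, e.g. `python3 -m pybullet_envs.agents.train_ppo --config=my_personal_env --logdir=mylog`.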
ok, thanks for the answers! I will try in the next few days
Actually, hardmaru could train this kuka grasping environment with his Evolution Strategies algorithms: https://twitter.com/hardmaru/status/926316139071799301 Hopefully his code will be released soon.
Thanks for the replies and continued updates. I will get back soon after trying.
I have tried:
python3 -m pybullet_envs.agents.train_ppo --config=pybullet_kuka_grasping --logdir=kuka
Then I evaluated the environment:
python3 -m pybullet_envs.agents.visualize_ppo --logdir=kuka/
However, in the resulting "kuka_video1", we find that the robot still cannot grasp the object.
The KUKA grasping environment has a very sparse reward, and the TF Agents PPO may not find a suitable policy. Using Evolution Strategies works OK, see http://blog.otoro.net/2017/11/12/evolving-stable-strategies/ The code is released, including a pretrained model for KUKA grasping: https://github.com/hardmaru/estool/
If you want to use PPO, you may need to extend it with a fancier exploration strategy or other ideas (create a curriculum: start with states close to a successful grasp, and gradually make the trajectories longer, starting further away).
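The core of the Evolution Strategies approach that worked here can be sketched in a few lines; this is a deliberately minimal loop in the spirit of estool's algorithms (not the actual estool code), shown on a toy quadratic fitness instead of the grasping environment.

```python
import random

def simple_es(fitness, dim, pop_size=50, sigma=0.1, lr=0.05,
              iters=300, seed=0):
    """Minimal evolution-strategies loop: perturb the parameter vector
    with Gaussian noise, score each perturbation, and move along the
    baseline-subtracted, fitness-weighted average of the noise."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iters):
        pop = [[rng.gauss(0, 1) for _ in range(dim)]
               for _ in range(pop_size)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, pop_k)])
                  for pop_k in pop]
        mean = sum(scores) / pop_size   # baseline reduces gradient variance
        for i in range(dim):
            g = sum((s - mean) * eps[i] for s, eps in zip(scores, pop))
            theta[i] += lr / (pop_size * sigma) * g
    return theta

# Toy check: maximize -sum((x - 3)^2); the optimum is x_i = 3.
best = simple_es(lambda x: -sum((xi - 3.0) ** 2 for xi in x), dim=2)
print(best)  # should end up near [3.0, 3.0]
```

For the KUKA task, `fitness` would run one (or an average of several) grasping episodes with the candidate policy parameters and return the episode return, which is why ES copes better with the sparse reward: it only needs whole-episode scores, not per-step gradients.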
Just to confirm, is the discrete version of KUKA working correctly now? In other words, is it possible to be trained with any algorithm, no matter if ES or RL (possibly assuming some advanced exploration for RL, demonstrations or curriculum)? Was not sure because it is still flagged in the documentation. Thanks!
I haven't tried solving the discrete version of the KUKA grasping (kukaGymEnv), the continuous version is solved with ES. Note that there is another newer environment, KukaDiverseObjectEnv that is more interesting. Again, I think it was only tested with continuous actions.
Please let us know if you experiment with the discrete version.
Thanks, I will try and report results!
train_kuka_cam_grasping.py seems to fail to converge as well. Will there be an update anytime soon?
I would recommend looking into this implementation: https://github.com/google-research/google-research/tree/master/dql_grasping It is derived from this Kuka grasping prototype. (https://twitter.com/ericjang11/status/1083805919698276352)
Thanks. That is really helpful. Taking a look at the environment proposed by Google research, I find there is not much difference between that environment and the original one, only some differences in parameters such as the allowed range of actual end effector positions. Also, it calls stepSimulation() 200 times after applying an action instead of calling stepSimulation() only once. As for the reward, it uses a binary reward only, which seems weaker. Maybe these small details really count.
> Maybe these small details really count.
Yes, small details matter a lot. Unfortunately, such small details are not mentioned in papers.
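The two differences discussed above (200 simulation sub-steps per action, and collapsing the shaped reward to a binary success signal) can be captured in a small wrapper. Everything here is a sketch on a stub environment, so it runs without pybullet; the 'grasp_success' info key is a hypothetical name, not an actual field of either environment.

```python
class ActionRepeatBinaryReward:
    """Wrap an env so each action is followed by many simulation
    sub-steps (200, as in the dql_grasping env) and the per-step shaped
    reward is replaced by a binary success signal at episode end."""
    def __init__(self, env, substeps=200):
        self.env = env
        self.substeps = substeps

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs = done = info = None
        for _ in range(self.substeps):
            obs, _, done, info = self.env.step(action)  # inner reward ignored
            if done:
                break
        # Binary reward: 1.0 only on a successful terminal grasp.
        binary = 1.0 if done and info.get('grasp_success') else 0.0
        return obs, binary, done, info

class StubEnv:
    """Counts sub-steps instead of running physics; 'succeeds' at 400."""
    def __init__(self):
        self.substeps_taken = 0
    def reset(self):
        self.substeps_taken = 0
        return 0
    def step(self, action):
        self.substeps_taken += 1
        done = self.substeps_taken >= 400
        return 0, 0.1, done, {'grasp_success': done}

env = ActionRepeatBinaryReward(StubEnv())
env.reset()
_, r1, d1, _ = env.step(0)   # 200 sub-steps: not done, binary reward 0.0
_, r2, d2, _ = env.step(0)   # 400 sub-steps: done, binary reward 1.0
print(r1, d1, r2, d2)
```

With the real environments, the wrapper's inner loop would correspond to letting the physics settle for many stepSimulation() calls per agent action.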
Would it be possible to train the diverse kuka environment with OpenAI's implementation of HER as given in Baselines?
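HER would need the environment to be goal-conditioned, which the KUKA grasping envs are not out of the box, so some adaptation would be required. For what it's worth, the core relabeling trick is small; below is a sketch of the 'final' strategy on a toy episode format (the dict keys here are illustrative, not Baselines' actual data layout).

```python
def her_final_relabel(episode, reward_fn):
    """Hindsight Experience Replay, 'final' strategy: replay the episode
    as if the goal had been the state actually achieved at the end, so a
    failed episode still yields at least one rewarding transition."""
    achieved_goal = episode[-1]['achieved_goal']
    relabeled = []
    for t in episode:
        relabeled.append({
            'obs': t['obs'],
            'action': t['action'],
            'goal': achieved_goal,   # substitute the achieved outcome
            'reward': reward_fn(t['achieved_goal'], achieved_goal),
        })
    return relabeled

# Toy check: goals are numbers; reward is 1.0 on a hit, 0.0 on a miss.
reward_fn = lambda achieved, goal: 1.0 if achieved == goal else 0.0
episode = [
    {'obs': 'o0', 'action': 'a0', 'achieved_goal': 1},
    {'obs': 'o1', 'action': 'a1', 'achieved_goal': 2},
]
out = her_final_relabel(episode, reward_fn)
print([t['reward'] for t in out])  # [0.0, 1.0]
```

Applied to grasping, the "achieved goal" might be the object's final pose, which turns the sparse success signal into something every episode can learn from.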
I have a working A2C for kuka grasping. Please contact me if interested. culurciello@gmail.com
Also have imitation learning for kuka - am looking for collaborators to build a decent curriculum please contact me culurciello@gmail.com
How long do I need to wait until train_kuka_grasping.py in bullet3/examples/pybullet/gym/pybullet_envs/baselines completes?