buzzo123 closed this issue 6 years ago
This algorithm doesn't work! I left it running all night, but nothing happened...
I'll check the train_kuka_grasping.py script. We mainly train KUKA with the continuous version using PPO from TensorFlow (TF) Agents, and haven't tried the discrete version (DQN) recently.
In the meantime, did you try training the cartpole and racecar with DQN?
python -m pybullet_envs.baselines.train_pybullet_cartpole
python -m pybullet_envs.baselines.train_pybullet_racecar
Also, you may want to try the TF Agents training with the locomotion tasks (pybullet_pendulum, pybullet_doublependulum, pybullet_pendulumswingup, pybullet_cheetah, pybullet_ant, pybullet_racecar, pybullet_minitaur). See also the Reinforcement Learning section of http://pybullet.org.
I'm interested in the KUKA grasping algorithms. I found this py file: bullet3/examples/pybullet/gym/pybullet_envs/agents/train_ppo.py. Do I have to use this? Can you explain the usage to me? Thanks
The same issue happens to me too. Kuka grasp training using DQN is not converging. It would be greatly helpful if an example of training with PPO, or at least some hints on using PPO, were provided.
I'm working on fixing it and providing continuous action/PPO support. Will report here when it is done.
Make sure to upgrade to pybullet 1.6.3 (pip install -U pybullet):
I just uploaded a new version of the Kuka grasping environment, both discrete and continuous.
Let's first try to get the kukaGymEnv to train properly (cheat by providing object positions), then later we look into kukaCamGymEnv (from camera pixels)
Using the latest pybullet, you can run the environment manually using
python3 -m pybullet_envs.examples.kukaGymEnvTest
(this lets you control a few more action settings; it calls the gym 'step2' API)
and
python3 -m pybullet_envs.examples.kukaGymEnvTest2
(this calls the actual gym step API)
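For reference, the Gym interface these test scripts exercise (reset, step) can be driven with a plain loop. DummyGraspEnv below is a made-up stub standing in for KukaGymEnv, so this sketch runs without pybullet installed; the real environment returns richer observations and actual physics, and the "7 discrete actions" value is an assumption for illustration only.

```python
import random

class DummyGraspEnv:
    """Stand-in stub for KukaGymEnv: same reset/step shape, no physics."""
    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0]  # fake observation (e.g. object position)

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        reward = 1.0 if done else 0.0  # sparse reward only at episode end
        return [0.0, 0.0, 0.0], reward, done, {}

def run_episode(env, policy):
    """Standard Gym loop: reset, then step until done, summing rewards."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, r, done, _ = env.step(policy(obs))
        total += r
    return total

env = DummyGraspEnv()
ret = run_episode(env, lambda obs: random.choice(range(7)))
print(ret)  # 1.0 -- the stub pays its sparse reward at episode end
```

Any environment exposing this reset/step pair, including the KUKA ones, can be plugged into the same loop.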
Training using TensorFlow Agents PPO:
//install pybullet etc
pip install agents gym tensorflow pybullet
//train
python3 -m pybullet_envs.agents.train_ppo --config=pybullet_kuka_grasping --logdir=kuka
tensorboard --logdir=kuka/<timestamp>
(then open a browser and point it to localhost:6006, or another port if you pass the --port argument to tensorboard)
//evaluate
python3 -m pybullet_envs.agents.visualize_ppo --logdir=kuka/<timestampname> --outdir=kuka_video1
This Kuka grasping environment has a very sparse reward only at the end of the episode, so we may need to use learning from demonstration (VR), curriculum learning or GraspGAN (like my colleagues did).
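The curriculum idea mentioned above (start from states close to a successful grasp, lengthen the trajectories as the agent improves) can be sketched with a small scheduler. This is a generic sketch, not part of the KUKA environment's API; the "start distance" knob and all thresholds are hypothetical values you would wire into your own env reset.

```python
class CurriculumScheduler:
    """Gradually lengthen episodes: begin near a successful grasp and
    move the start state further away as the success rate improves."""
    def __init__(self, start_distance=0.05, max_distance=0.5,
                 step=0.05, threshold=0.8, window=100):
        self.distance = start_distance    # how far from the grasp we reset
        self.max_distance = max_distance
        self.step = step
        self.threshold = threshold        # success rate needed to advance
        self.window = window              # episodes per measurement
        self.results = []

    def record(self, success):
        """Call once per episode; bump difficulty when the agent is ready."""
        self.results.append(bool(success))
        recent = self.results[-self.window:]
        if (len(recent) == self.window and
                sum(recent) / self.window >= self.threshold and
                self.distance < self.max_distance):
            self.distance = min(self.distance + self.step, self.max_distance)
            self.results.clear()  # re-measure at the new difficulty

sched = CurriculumScheduler(window=10)
for _ in range(10):
    sched.record(True)   # pretend the agent succeeds every episode
print(sched.distance)    # 0.1 -- difficulty was bumped once
```

The training loop would read `sched.distance` when resetting the environment, e.g. to place the gripper that far from the object.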
Instead of using kukaGymEnv-v0, can I train my personal env using PPO?
In what format is your personal env? Is it MuJoCo or did you write some pybullet code?
It's in the same format used in kukaGymEnvTest.
You should be able to train many pybullet environments without problem using PPO or DQN, they use the Gym interface (reset, step). I flagged the issue with KUKA in the pybullet quickstart guide.
Here are pybullet environments that train well using DQN:
python -m pybullet_envs.baselines.train_pybullet_cartpole
python -m pybullet_envs.baselines.train_pybullet_racecar
python -m pybullet_envs.baselines.enjoy_pybullet_cartpole
python -m pybullet_envs.baselines.enjoy_pybullet_racecar
Using PPO: python -m pybullet_envs.agents.train_ppo --config=pybullet_pendulum --logdir=pendulum
The following environments are available as Agents configs: pybullet_pendulum, pybullet_doublependulum, pybullet_pendulumswingup, pybullet_cheetah, pybullet_ant, pybullet_racecar, pybullet_minitaur
See also the Reinforcement Learning section of http://pybullet.org
We cannot provide support for training personal environments here, in particular when you don't share them on github.
I think it would be good for future contributions if we give some support for training personal environments (Maybe not here but on the Bullet Physics forum). If the environment is written in pybullet and uses the Gym interface, there shouldn't be a big problem to train it. At least the contributions in my gymperium branch are allowing this.
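To hook a personal env into the Agents train_ppo script, one approach is to add a config function alongside the bundled ones. This is a hypothetical sketch: the config-as-a-function-returning-locals() pattern mirrors the bundled pybullet configs, but the exact required keys are assumptions, so check pybullet_envs/agents/configs.py for the real fields.

```python
# Hypothetical sketch of an Agents config for a personal environment.
# The key names below (env, max_length, steps) are assumptions modeled
# on the bundled pybullet configs -- verify against the real
# pybullet_envs/agents/configs.py before use.
def my_personal_env():
    """Agents-style config: the function returns its locals() as a dict."""
    env = 'MyPersonalEnv-v0'   # a Gym id you registered yourself
    max_length = 1000          # maximum steps per episode
    steps = 1e7                # total training steps
    return locals()

config = my_personal_env()
print(sorted(config))  # ['env', 'max_length', 'steps']
```

Training would then (hypothetically) be invoked the same way as for the bundled configs, e.g. `python3 -m pybullet_envs.agents.train_ppo --config=my_personal_env --logdir=mylog`.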
ok, thanks for the answers! I will try in the next few days
Actually, hardmaru could train this kuka grasping environment with his Evolution Strategies algorithms: https://twitter.com/hardmaru/status/926316139071799301 Hopefully his code will be released soon.
Thanks for the replies and continued updates. I will get back soon after trying.
I have tried:
python3 -m pybullet_envs.agents.train_ppo --config=pybullet_kuka_grasping --logdir=kuka
Then I evaluated the environment:
python3 -m pybullet_envs.agents.visualize_ppo --logdir=kuka/
However, in the resulting "kuka_video1", we find that the robot still cannot grasp the object.
The KUKA grasping environment has a very sparse reward, and the TF Agents PPO may not find a suitable policy. Using Evolution Strategies works OK, see http://blog.otoro.net/2017/11/12/evolving-stable-strategies/ The code is released, including a pretrained model for KUKA grasping: https://github.com/hardmaru/estool/
If you want to use PPO, you may need to extend it with a fancier exploration strategy or other ideas (create a curriculum: start with states close to a successful grasp, and gradually make the trajectories longer, starting further away).
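The core of the Evolution Strategies approach that worked here can be sketched in a few lines; this is a deliberately minimal loop in the spirit of estool's algorithms (not the actual estool code), shown on a toy quadratic fitness instead of the grasping environment.

```python
import random

def simple_es(fitness, dim, pop_size=50, sigma=0.1, lr=0.05,
              iters=300, seed=0):
    """Minimal evolution-strategies loop: perturb the parameter vector
    with Gaussian noise, score each perturbation, and move along the
    baseline-subtracted, fitness-weighted average of the noise."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(iters):
        pop = [[rng.gauss(0, 1) for _ in range(dim)]
               for _ in range(pop_size)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, pop_k)])
                  for pop_k in pop]
        mean = sum(scores) / pop_size   # baseline reduces gradient variance
        for i in range(dim):
            g = sum((s - mean) * eps[i] for s, eps in zip(scores, pop))
            theta[i] += lr / (pop_size * sigma) * g
    return theta

# Toy check: maximize -sum((x - 3)^2); the optimum is x_i = 3.
best = simple_es(lambda x: -sum((xi - 3.0) ** 2 for xi in x), dim=2)
print(best)  # should end up near [3.0, 3.0]
```

For the KUKA task, `fitness` would run one (or an average of several) grasping episodes with the candidate policy parameters and return the episode return, which is why ES copes better with the sparse reward: it only needs whole-episode scores, not per-step gradients.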
Just to confirm, is the discrete version of KUKA working correctly now? In other words, is it possible to be trained with any algorithm, no matter if ES or RL (possibly assuming some advanced exploration for RL, demonstrations or curriculum)? Was not sure because it is still flagged in the documentation. Thanks!
I haven't tried solving the discrete version of the KUKA grasping (kukaGymEnv), the continuous version is solved with ES. Note that there is another newer environment, KukaDiverseObjectEnv that is more interesting. Again, I think it was only tested with continuous actions.
Please let us know if you experiment with the discrete version.
Thanks, I will try and report results!
train_kuka_cam_grasping.py seems to fail to converge as well. Will there be an update anytime soon?
I would recommend looking into this implementation: https://github.com/google-research/google-research/tree/master/dql_grasping It is derived from this Kuka grasping prototype. (https://twitter.com/ericjang11/status/1083805919698276352)
Thanks. That is really helpful. Taking a look at the environment proposed by Google research, I find there is not much difference between that environment and the original one, only some differences in parameters such as the allowed range of actual end effector positions. Also, it calls stepSimulation() 200 times after applying an action instead of calling stepSimulation() only once. As for the reward, it uses a binary reward only, which seems weaker. Maybe these small details really count.
> Maybe these small details really count.
Yes, small details matter a lot. Unfortunately, such small details are not mentioned in papers.
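The two differences discussed above (200 simulation sub-steps per action, and collapsing the shaped reward to a binary success signal) can be captured in a small wrapper. Everything here is a sketch on a stub environment, so it runs without pybullet; the 'grasp_success' info key is a hypothetical name, not an actual field of either environment.

```python
class ActionRepeatBinaryReward:
    """Wrap an env so each action is followed by many simulation
    sub-steps (200, as in the dql_grasping env) and the per-step shaped
    reward is replaced by a binary success signal at episode end."""
    def __init__(self, env, substeps=200):
        self.env = env
        self.substeps = substeps

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs = done = info = None
        for _ in range(self.substeps):
            obs, _, done, info = self.env.step(action)  # inner reward ignored
            if done:
                break
        # Binary reward: 1.0 only on a successful terminal grasp.
        binary = 1.0 if done and info.get('grasp_success') else 0.0
        return obs, binary, done, info

class StubEnv:
    """Counts sub-steps instead of running physics; 'succeeds' at 400."""
    def __init__(self):
        self.substeps_taken = 0
    def reset(self):
        self.substeps_taken = 0
        return 0
    def step(self, action):
        self.substeps_taken += 1
        done = self.substeps_taken >= 400
        return 0, 0.1, done, {'grasp_success': done}

env = ActionRepeatBinaryReward(StubEnv())
env.reset()
_, r1, d1, _ = env.step(0)   # 200 sub-steps: not done, binary reward 0.0
_, r2, d2, _ = env.step(0)   # 400 sub-steps: done, binary reward 1.0
print(r1, d1, r2, d2)
```

With the real environments, the wrapper's inner loop would correspond to letting the physics settle for many stepSimulation() calls per agent action.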
Would it be possible to train the diverse kuka environment with OpenAI's implementation of HER as given in Baselines?
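HER would need the environment to be goal-conditioned, which the KUKA grasping envs are not out of the box, so some adaptation would be required. For what it's worth, the core relabeling trick is small; below is a sketch of the 'final' strategy on a toy episode format (the dict keys here are illustrative, not Baselines' actual data layout).

```python
def her_final_relabel(episode, reward_fn):
    """Hindsight Experience Replay, 'final' strategy: replay the episode
    as if the goal had been the state actually achieved at the end, so a
    failed episode still yields at least one rewarding transition."""
    achieved_goal = episode[-1]['achieved_goal']
    relabeled = []
    for t in episode:
        relabeled.append({
            'obs': t['obs'],
            'action': t['action'],
            'goal': achieved_goal,   # substitute the achieved outcome
            'reward': reward_fn(t['achieved_goal'], achieved_goal),
        })
    return relabeled

# Toy check: goals are numbers; reward is 1.0 on a hit, 0.0 on a miss.
reward_fn = lambda achieved, goal: 1.0 if achieved == goal else 0.0
episode = [
    {'obs': 'o0', 'action': 'a0', 'achieved_goal': 1},
    {'obs': 'o1', 'action': 'a1', 'achieved_goal': 2},
]
out = her_final_relabel(episode, reward_fn)
print([t['reward'] for t in out])  # [0.0, 1.0]
```

Applied to grasping, the "achieved goal" might be the object's final pose, which turns the sparse success signal into something every episode can learn from.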
I have a working A2C for kuka grasping. Please contact me if interested. culurciello@gmail.com
Also have imitation learning for kuka - am looking for collaborators to build a decent curriculum please contact me culurciello@gmail.com
How long do I need to wait until train_kuka_grasping.py in bullet3/examples/pybullet/gym/pybullet_envs/baselines completes?