OSUrobotics / KinovaGrasping

This repository contains a simulation of a Kinova robot and the code for collecting data and training both a grasp classifier and an RL agent.

Learning "near-contact" grasping strategy with Deep Reinforcement Learning

This is an implementation of Deep Deterministic Policy Gradient from Demonstration (DDPGfD) used to train a policy for "near-contact" grasping tasks, where the object's starting position is randomized within the graspable region. We took one "near-contact" strategy from this paper as the expert demonstration and trained an RL controller to handle a variety of objects with random starting positions.
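The core idea of DDPGfD is to seed the replay buffer with expert demonstration transitions and mix them with the agent's own experience in every training batch. A minimal sketch of that sampling scheme in plain Python (the class and field names here are illustrative, not the repository's actual API):

```python
import random

class MixedReplayBuffer:
    """Replay buffer that keeps expert demonstrations alongside agent
    transitions and samples a fixed fraction of demos per batch."""

    def __init__(self, demo_fraction=0.25):
        self.demos = []        # expert transitions, never evicted
        self.agent = []        # transitions collected by the RL agent
        self.demo_fraction = demo_fraction

    def add_demo(self, transition):
        self.demos.append(transition)

    def add_agent(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # Reserve a fraction of each batch for demonstration data,
        # falling back gracefully when either pool is small.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        n_agent = min(batch_size - n_demo, len(self.agent))
        return (random.sample(self.demos, n_demo)
                + random.sample(self.agent, n_agent))

# Example: one demonstration, ten agent transitions, batch of four
buf = MixedReplayBuffer()
buf.add_demo(("s0", "a0", 1.0, "s1"))
for i in range(10):
    buf.add_agent((f"s{i}", "a", 0.0, f"s{i + 1}"))
batch = buf.sample(4)
```

With `demo_fraction=0.25` and a batch of four, one slot per batch is guaranteed to come from the demonstration pool, which keeps the expert strategy influencing the critic throughout training.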

This environment runs on MuJoCo with an OpenAI Gym integration to facilitate the data collection and training process.
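Because the environment is wrapped for Gym, data collection follows the standard `reset()`/`step()` rollout loop. A minimal sketch of that loop, using a self-contained stub in place of the real MuJoCo environment (the stub class and its dynamics are illustrative only):

```python
class StubGripperEnv:
    """Stand-in for the Gym-wrapped Kinova environment: same
    reset()/step() contract, trivial dynamics."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0, 0.0]          # initial observation

    def step(self, action):
        self.t += 1
        obs = [float(self.t)] * 3
        reward = 1.0 if self.t == self.horizon else 0.0  # sparse "grasped" reward
        done = self.t >= self.horizon
        return obs, reward, done, {}    # Gym's (obs, reward, done, info)

# Standard Gym rollout loop, as used when collecting grasping data
env = StubGripperEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = [0.1, 0.1, 0.1]            # a trained policy would choose this
    obs, reward, done, info = env.step(action)
    total_reward += reward
```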

Requirements: PyTorch 1.2.0 and Python 3.7

Installation

MuJoCo v1.50 (Note: the Python package for this version works with Python 3+)

Mujoco-py

mujoco-py is the Python binding for MuJoCo, developed by OpenAI.

Do not install it with a plain pip install mujoco-py; that will not work for this setup. Instead, install from source:

  1. Download the source code from here: https://github.com/openai/mujoco-py/releases/tag/1.50.1.0

  2. Untar / unzip the package

  3. cd mujoco-py-1.50.1.0

  4. Run pip install -e . (use pip install --user -e . if you get a permission error, or pip3 if your default pip is Python 2)

Now you can use MuJoCo from Python with import mujoco_py.

After you have these installed, clone this repository. To check that everything is working, run /KinovaGrasping/gym-kinova-gripper/teleop.py. This should render the hand attempting to pick up a shape.

Instructions

There are seven experiments to run, distinguished by the order of training. The variables we modify are the shapes used, the sizes of those shapes, and the orientation of the hand during training. Six of the experiments change the order of training (i.e., we modify one of the variables, then the next, then the last), and the last experiment uses only a single stage of training (all variables changed at once).

In kinova_env_gripper.py, look at the randomize_all function. Change the arguments of self.experiment to the desired experiment and stage numbers. For example, to run experiment 1, stage 1, set line 581 to objects = self.experiment(1, 1) → the first argument is the experiment number and the second is the stage number. Then run the corresponding terminal command below.
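The experiment/stage selection described above amounts to a lookup from an (experiment, stage) pair to a set of training objects. A hypothetical sketch of that mapping (the object names and table contents here are made up for illustration; the real lists live inside self.experiment in kinova_env_gripper.py):

```python
# Hypothetical (experiment, stage) -> object-list table; the actual
# shapes, sizes, and orientations are defined in kinova_env_gripper.py.
EXPERIMENTS = {
    (1, 1): ["CubeS", "CubeM"],          # e.g. stage 1: vary sizes
    (1, 2): ["CubeS", "CylinderS"],      # e.g. stage 2: vary shapes
    (7, 1): ["CubeS", "CubeM", "CylinderS", "CylinderM"],  # single-stage exp.
}

def select_experiment(exp_num, stage_num):
    """Mirrors the self.experiment(exp_num, stage_num) call: return the
    object list for the requested experiment and stage."""
    key = (exp_num, stage_num)
    if key not in EXPERIMENTS:
        raise ValueError(f"unknown experiment/stage: {key}")
    return EXPERIMENTS[key]

objects = select_experiment(1, 1)   # as in: objects = self.experiment(1, 1)
```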

Commands on terminal

For experiments with multiple commands, run each command and wait for it to complete before running the next command. The label describes what the network is training on for that command.

Exp1:

Alternate Use

This repository may also be used for purposes beyond these experiments, as it contains the Kinova grasping environment, which is useful for grasp classification, regrasping, grasp training, and more. If you plan to use it for purposes other than running the experiments described above, what follows is a brief explanation of the key files and how they interact.