PaulDanielML / MuJoCo_RL_UR5

A MuJoCo/Gym environment for robot control using Reinforcement Learning. The task of agents in this environment is pixel-wise prediction of grasp success chances.
MIT License
computer-vision gym-environment mujoco pick-and-place reinforcement-learning robotics

Accompanying repository of a Master's thesis at TU Berlin / Aalborg University. No longer under active development. Developed in my earlier Python days; please forgive the unformatted spaghetti code.

Deep Reinforcement Learning for robotic pick and place applications using purely visual observations

Author: Paul Daniel (paudan22@gmail.com)

Traits of this environment: very large, multi-discrete action space; very high sample cost; visual observations; binary reward.
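To give a feel for what a large multi-discrete action space of this kind can look like, the sketch below decodes a flat action index into a pixel coordinate and a rotation index. The image size (200 x 200), the number of rotations (4), and the function name are assumptions chosen for illustration, not values taken from the environment.

```python
def decode_action(index, width=200, height=200, n_rotations=4):
    """Map a flat action index to (row, col, rotation).

    All default sizes are assumed values for illustration only.
    """
    n_actions = width * height * n_rotations
    if not 0 <= index < n_actions:
        raise ValueError(f"index must be in [0, {n_actions})")
    # The rotation varies fastest, then the column, then the row.
    pixel, rotation = divmod(index, n_rotations)
    row, col = divmod(pixel, width)
    return row, col, rotation


print(decode_action(0))
print(decode_action(200 * 200 * 4 - 1))
```

With these assumed sizes there would be 160,000 distinct actions per step, which is one way to read "very large" above.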

Figures: trained agent in action; example of predicted grasp chances.

Setup iteration: relevant changes

IT5
- Many more objects, randomly piled
- Action space now multi-discrete, with the second dimension being a rotation action
IT4
- Z-coordinate for grasping now calculated using depth data
- Objects now vary in size
IT3
- New two-finger gripper implemented
IT2
- Grasp success check now after moving to drop location (1000 steps)
IT1 (Baseline)
- Grasp success check after moving straight up (500 steps of trying to close the gripper)
- Fixed z-coordinate for grasping
- Objects of equal size
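The depth-based z-coordinate introduced in IT4 depends on turning rendered depth values into distances. MuJoCo renderers typically return depth as normalized values in [0, 1]; a commonly used conversion to metric distance is sketched below. The function name and the near/far values are assumptions for illustration. In a real model the clipping planes would come from the model's visual settings scaled by its extent, and this is not necessarily how the repository computes the grasp height.

```python
def depth_buffer_to_meters(d, near, far):
    """Convert a normalized depth-buffer value d in [0, 1] to a distance.

    near and far are the clipping plane distances (assumed inputs here).
    """
    return near / (1.0 - d * (1.0 - near / far))


near, far = 0.01, 50.0  # illustrative values only
for d in (0.0, 0.5, 1.0):
    print(d, depth_buffer_to_meters(d, near, far))
```

A buffer value of 0 maps to the near plane and a value of 1 to the far plane; values in between are strongly non-linear, which is why the raw buffer should not be used as a height directly.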

This repository provides several Python classes for control of robotic arms in MuJoCo:

The robot configuration used in this setup (Universal Robots UR5 + Robotiq S Model 3 Finger Gripper) is based on this resource. It has since been heavily modified. The most recent XML file is UR5gripper_2_finger.xml.
The Python bindings used come from mujoco_py.
The PID controllers implemented are based on simple_pid.
A simple inverse kinematics solver for translating end-effector positions into joint angles has been implemented using ikpy.
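As a rough illustration of the kind of joint-level PID control mentioned above, here is a minimal discrete PID controller in plain Python. It is a sketch under simplifying assumptions: the class, its gains, and the toy "joint" model are hypothetical, and it is not the simple_pid API nor the controller code used in this repository.

```python
class PID:
    """Minimal discrete PID controller (illustrative, not the simple_pid API)."""

    def __init__(self, kp, ki, kd, setpoint=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self._integral = 0.0
        self._prev_error = None

    def update(self, measurement, dt):
        """Return the control output for the current measurement."""
        error = self.setpoint - measurement
        self._integral += error * dt
        # No derivative term on the first call, since there is no previous error.
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


# Drive a toy "joint" whose angle changes at the commanded rate.
pid = PID(kp=2.0, ki=0.0, kd=0.0, setpoint=1.0)
angle, dt = 0.0, 0.01
for _ in range(1000):
    angle += pid.update(angle, dt) * dt
print(round(angle, 3))
```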

The required modules can be installed either manually or using the provided requirements.txt file.

Setup

Download and install MuJoCo from here. Set up a license and activate it here.

Then clone this repo:

git clone https://github.com/PaulDanielML/MuJoCo_RL_UR5.git

Then change into the newly created directory:

cd MuJoCo_RL_UR5/

If desired, activate a virtual environment, then run

pip install -r requirements.txt

This will install all required packages using pip. The first time you run a script that uses the Mujoco_UR5_controller class, some additional setup may run, which can take a few moments. This is all the setup required to use this repo.

Usage

GraspEnv - class:

Gym-environment for training agents to use RGB-D data for predicting pixel-wise grasp success chances.
The file example_agent.py demonstrates the use of a random agent for this environment.
The file Grasping_Agent.py gives an example of training a shortsighted DQN-agent in the environment to predict pixel-wise grasping success (PyTorch). The created environment has an associated controller object, which provides all the functionality of the MJ_Controller - class to it.
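Assuming "shortsighted" refers to a discount factor of zero (an interpretation, not something stated above), the learning target for the chosen action reduces to the immediate binary reward. The sketch below contrasts this with the usual one-step Q-learning target; the function name and the numbers are made up for illustration and are not taken from Grasping_Agent.py.

```python
def q_target(reward, next_q_values, gamma):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    return reward + gamma * max(next_q_values)


next_q = [0.2, 0.7, 0.4]  # hypothetical predictions for the next state

print(q_target(1.0, next_q, gamma=0.9))  # regular DQN target
print(q_target(1.0, next_q, gamma=0.0))  # shortsighted: target equals the reward
```

Under that assumption, and with a binary reward, the value predicted for each pixel can be read as an estimate of the grasp success chance at that pixel.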

The user gets a summary of each step performed in the console. It is recommended to train agents without rendering, as this will speed up training significantly.


The RGB part of the last captured observation will be shown and updated in an extra window.


MJ_Controller - class:

Example usage of some of the class methods is demonstrated in the file example.py.

The class MJ_Controller offers high and low level methods for controlling the robot in MuJoCo.


Updates

Trials for Offline RL: The folder Offline RL contains scripts for generating and learning from a dataset of (state, action, reward)-transitions. generate_data.py can be used to generate as many files as required, each file containing 12 transitions.

New gripper model available: A new, less bulky, 2-finger gripper was implemented in the model in training setup iteration 3.


Image normalization: Added script normalize.py, which samples 100 images from the environment and writes the mean values and standard deviations of all channels to a file.
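The statistics described above can be sketched as follows. This is a minimal pure-Python version operating on nested lists; the function name and the toy data are assumptions, and it does not reproduce normalize.py or its output file format.

```python
import math


def channel_stats(images):
    """Return per-channel (means, stds) for images given as H x W x C nested lists."""
    n_channels = len(images[0][0][0])
    sums = [0.0] * n_channels
    sq_sums = [0.0] * n_channels
    count = 0
    for image in images:
        for row in image:
            for pixel in row:
                count += 1
                for c, value in enumerate(pixel):
                    sums[c] += value
                    sq_sums[c] += value * value
    means = [s / count for s in sums]
    # Population standard deviation; clamp tiny negative values from rounding.
    stds = [math.sqrt(max(sq / count - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds


# Two tiny 1 x 2 "images" with 2 channels each (toy data).
images = [[[[0.0, 1.0], [2.0, 1.0]]], [[[0.0, 1.0], [2.0, 1.0]]]]
means, stds = channel_stats(images)
print(means, stds)
```

The resulting values would then be used to normalize each channel as (value - mean) / std before feeding observations to a network.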

Reset shuffle: Calling the environment's step method now rearranges all the pickable objects to random positions on the table.


Record grasps: The step method of the GraspingEnv now has the optional parameter record_grasps. If set to True, it will capture a side camera image every time a grasp is made that is deemed successful by the environment. This allows for "quality control" of the grasps, without having to watch all the failed attempts. The captured images can also be useful for fine-tuning grasping parameters.


Point clouds: The controller class was provided with new methods for image transformations.
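One common image transformation of this kind is back-projecting a depth image into a point cloud with a pinhole camera model. The sketch below is a generic version under stated assumptions (square pixels, principal point at the image center, depth already in meters); the function name and parameters are hypothetical and do not correspond to the controller's actual methods.

```python
import math


def depth_to_points(depth, fovy_deg):
    """Back-project an H x W depth image (meters) to camera-frame (x, y, z) points."""
    height, width = len(depth), len(depth[0])
    # Focal length in pixels from the vertical field of view (square pixels assumed).
    f = 0.5 * height / math.tan(math.radians(fovy_deg) / 2.0)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / f, (v - cy) * z / f, z))
    return points


depth = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]]  # toy 3 x 3 depth image
cloud = depth_to_points(depth, fovy_deg=90.0)
print(cloud[4])  # the center pixel lies on the optical axis
```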


Joint plots: All methods that move joints now have an optional plot parameter. If set to True, a .png-file will be created in the local directory. It will show plots for each joint involved in the trajectory, containing the joint angles over time, as well as the target values. This can be used to determine which joints overshoot, oscillate etc. and adjust the controller gains based on that.
The tolerances used for the trajectory are plotted in red, so it is easy to see how many steps each joint needs to reach a value within tolerance.
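The same question the plots answer visually (how many steps a joint needs to settle within tolerance) can also be checked numerically. The helper below is a hypothetical sketch working on a plain list of recorded joint angles; it is not part of the controller class.

```python
def steps_to_tolerance(trajectory, target, tolerance):
    """Return the first step from which the joint stays within tolerance, or None."""
    settled_at = None
    for step, angle in enumerate(trajectory):
        if abs(angle - target) <= tolerance:
            if settled_at is None:
                settled_at = step
        else:
            # Left the tolerance band again (overshoot or oscillation).
            settled_at = None
    return settled_at


trajectory = [0.0, 0.6, 1.2, 0.95, 1.05, 1.01, 1.0]  # toy joint angles with overshoot
print(steps_to_tolerance(trajectory, target=1.0, tolerance=0.1))
```

A joint that overshoots resets the count, so a large return value relative to the other joints hints that its gains may need adjusting.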
