AndrejOrsula / drl_grasping

Deep Reinforcement Learning for Robotic Grasping from Octrees
https://arxiv.org/pdf/2208.00818
BSD 3-Clause "New" or "Revised" License

This project focuses on applying deep reinforcement learning to acquire a robust policy that allows robots to grasp diverse objects from compact 3D observations in the form of octrees.

Evaluation of a trained policy on novel scenes (previously unseen camera poses, objects, terrain textures, ...).

Sim-to-Real transfer of a policy trained solely inside a simulation (zero-shot transfer). Credit: Aalborg University

Evaluation of a trained policy for grasping rocks on the Moon inside a simulation.

Sim-to-Real transfer in a Moon-analogue facility (zero-shot transfer). Credit: University of Luxembourg

Overview


This repository contains multiple RL environments for robotic manipulation, focusing on robotic grasping using continuous actions in Cartesian space. All environments have several observation variants that enable direct comparison (RGB images, depth maps, octrees, ...). Each task is coupled with a simulation environment that can be used to train RL agents. These agents can subsequently be evaluated on real robots that integrate ros2_control (or ros_control via ros1_bridge).

End-to-end model-free actor-critic algorithms have been tested on these environments (TD3, SAC and TQC | SB3 PyTorch implementation). A setup for experimenting with a model-based algorithm (DreamerV2 | original TensorFlow implementation) is also provided; however, it is currently limited to RGB image observations. Because the environments are compatible with the Gym API, they should interoperate with most other algorithms and their implementations.

List of Environments

Below is the list of implemented environments. Each environment (observation variant) has two alternatives, `Task-Obs-vX` and `Task-Obs-Gazebo-vX` (omitted from the table). Here, `Task-Obs-vX` implements the logic of the environment and can be used on real robots, whereas `Task-Obs-Gazebo-vX` combines this logic with the simulation environment inside Gazebo. Robots should be interchangeable for the most part, with some limitations (e.g. the `GraspPlanetary` task requires a mobile manipulator to randomize the environment fully). If you are interested in configuring these environments, first take a look at the list of their parameters inside [Gym registration](./drl_grasping/envs/__init__.py) and then at their individual source code.

| Reach the end-effector goal. | Grasp and lift a random object. | Grasp and lift a Moon rock. |
| --- | --- | --- |
| Reach-v0 (state obs) | Grasp-v0 (state obs) | GraspPlanetary-v0 (state obs) |
| | | GraspPlanetary-MonoImage-v0 |
| Reach-ColorImage-v0 | | GraspPlanetary-ColorImage-v0 |
| Reach-DepthImage-v0 | | GraspPlanetary-DepthImage-v0 |
| | | GraspPlanetary-DepthImageWithIntensity-v0 |
| | | GraspPlanetary-DepthImageWithColor-v0 |
| Reach-Octree-v0 | Grasp-Octree-v0 | GraspPlanetary-Octree-v0 |
| Reach-OctreeWithIntensity-v0 | Grasp-OctreeWithIntensity-v0 | GraspPlanetary-OctreeWithIntensity-v0 |
| Reach-OctreeWithColor-v0 | Grasp-OctreeWithColor-v0 | GraspPlanetary-OctreeWithColor-v0 |

By default, `Grasp` and `GraspPlanetary` tasks utilize [`GraspCurriculum`](./drl_grasping/envs/tasks/curriculums/grasp.py) that shapes their reward function and environment difficulty.
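
The `Task-Obs-vX` and `Task-Obs-Gazebo-vX` naming scheme of the environments can be sketched as a small helper. This is only an illustration: `make_env_id` is a hypothetical function that is not part of `drl_grasping`, and it assumes that state-observation variants simply omit the observation part (as in `Reach-v0`).

```python
from typing import Optional


def make_env_id(task: str, obs: Optional[str] = None, gazebo: bool = False, version: int = 0) -> str:
    """Compose an environment ID following the `Task-Obs-vX` / `Task-Obs-Gazebo-vX` pattern.

    NOTE: hypothetical helper for illustration; it is not part of drl_grasping.
    """
    parts = [task]
    if obs:  # State-observation variants have no observation part (e.g. `Reach-v0`)
        parts.append(obs)
    if gazebo:  # Simulation variant that couples the task logic with Gazebo
        parts.append("Gazebo")
    parts.append(f"v{version}")
    return "-".join(parts)


print(make_env_id("Grasp", "Octree"))               # Grasp-Octree-v0
print(make_env_id("Grasp", "Octree", gazebo=True))  # Grasp-Octree-Gazebo-v0
print(make_env_id("Reach"))                         # Reach-v0
```

Whether a composed ID is actually registered should still be checked against the Gym registration file linked above.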
Domain Randomization

To facilitate the sim-to-real transfer of trained agents, simulation environments introduce domain randomization with the aim of improving the generalization of learned policies. This randomization is accomplished via [`ManipulationGazeboEnvRandomizer`](./drl_grasping/envs/randomizers/manipulation.py) that populates the virtual world and enables randomizing of several properties at each reset of the environment. As this randomizer is configurable with numerous parameters, please take a look at the source code to see what environments you can create.
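
The idea of resampling scene properties at every reset can be illustrated with a minimal, simulator-free sketch. All class names, property names and value ranges below are hypothetical and do not reflect the actual interface of `ManipulationGazeboEnvRandomizer`.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneSample:
    """Hypothetical set of randomized scene properties (names are illustrative)."""

    camera_distance: float
    light_intensity: float
    object_count: int


class ResetRandomizer:
    """Draws a fresh SceneSample on every environment reset."""

    def __init__(self, seed: int):
        # A dedicated generator keeps the sampling independent of global state
        self._rng = random.Random(seed)

    def on_reset(self) -> SceneSample:
        return SceneSample(
            camera_distance=self._rng.uniform(0.5, 1.5),
            light_intensity=self._rng.uniform(0.2, 1.0),
            object_count=self._rng.randint(1, 4),
        )


randomizer = ResetRandomizer(seed=42)
for episode in range(3):
    print(episode, randomizer.on_reset())
```

Each episode therefore starts from a differently configured scene, which is what exposes the policy to the variation it needs for generalization.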

Examples of domain randomization for the Grasp task.

Examples of domain randomization for the GraspPlanetary task.

#### Model Datasets

Simulation environments in this repository can utilize datasets of any [SDF](http://sdformat.org) models, e.g. models from [Fuel](https://app.gazebosim.org). By default, the `Grasp` task uses the [Google Scanned Objects collection](https://app.gazebosim.org/GoogleResearch/fuel/collections/Scanned%20Objects%20by%20Google%20Research) together with a set of PBR textures pointed to by the `TEXTURE_DIRS` environment variable. In contrast, the `GraspPlanetary` task employs custom models that are procedurally generated via [Blender](https://blender.org). However, this can be adjusted if desired.

All external models can be automatically configured and randomized in several ways via [`ModelCollectionRandomizer`](./drl_grasping/envs/models/utils/model_collection_randomizer.py) before their insertion into the world, e.g. optimization of collision geometry, estimation of (randomized) inertial properties and randomization of parameters such as geometry scale or surface friction. When processing large collections, model filtering can also be enabled based on several aspects, such as the complexity of the geometry or the existence of disconnected components. A few scripts for managing datasets can be found under the [scripts/utils/](./scripts/utils/) directory.

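
Filtering a model collection and randomizing per-model parameters could look roughly like the sketch below. The data structure, field names, thresholds and sampling ranges are assumptions made for illustration; they are not taken from `ModelCollectionRandomizer`.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ModelInfo:
    """Hypothetical summary of a model's geometry (field names are illustrative)."""

    name: str
    face_count: int
    connected_components: int


def filter_models(models: List[ModelInfo], max_faces: int) -> List[ModelInfo]:
    """Keep models that are simple enough and consist of a single connected component."""
    return [m for m in models if m.face_count <= max_faces and m.connected_components == 1]


def randomize_properties(rng: random.Random) -> dict:
    """Sample a geometry scale and a surface friction coefficient (ranges are assumptions)."""
    return {"scale": rng.uniform(0.8, 1.2), "friction": rng.uniform(0.5, 1.5)}


models = [
    ModelInfo("mug", face_count=12_000, connected_components=1),
    ModelInfo("toy_set", face_count=8_000, connected_components=3),
    ModelInfo("statue", face_count=250_000, connected_components=1),
]
rng = random.Random(0)
for model in filter_models(models, max_faces=50_000):
    print(model.name, randomize_properties(rng))
```
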
End-to-End Learning from 3D Octree Observations

This project initially investigated how 3D visual observations can be leveraged to improve end-to-end learning of manipulation skills. Octrees were selected for this purpose due to their efficiently organized structure compared to other 3D representations. To enable the extraction of abstract features from 3D octree observations, an octree-based 3D CNN is employed. The network module that accomplishes such feature extraction is implemented in the form of [`OctreeCnnFeaturesExtractor`](./drl_grasping/drl_octree/features_extractor/octree_cnn.py) (PyTorch). This feature extractor is part of the `OctreeCnnPolicy` policy implemented for the TD3, SAC and TQC algorithms. Internally, the feature extractor uses the [O-CNN](https://github.com/microsoft/O-CNN) implementation to benefit from hardware acceleration on NVIDIA GPUs.
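
To make the octree representation more concrete, the following sketch builds a simple octree from a handful of 3D points by recursively subdividing only the occupied cubes. It is a plain-Python illustration of the data structure, not the O-CNN format or the observation pipeline used in this repository; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class OctreeNode:
    center: Point
    half_size: float
    points: List[Point] = field(default_factory=list)
    children: Optional[List["OctreeNode"]] = None  # None marks a leaf


def build_octree(points: List[Point], center: Point, half_size: float, max_depth: int) -> OctreeNode:
    """Recursively subdivide occupied cubes until `max_depth` or a single point is reached."""
    node = OctreeNode(center, half_size)
    if max_depth == 0 or len(points) <= 1:
        node.points = points
        return node
    # Sort points into the 8 octants using one bit per axis
    buckets: List[List[Point]] = [[] for _ in range(8)]
    for p in points:
        index = sum((p[axis] >= center[axis]) << axis for axis in range(3))
        buckets[index].append(p)
    quarter = half_size / 2.0
    node.children = []
    for index, bucket in enumerate(buckets):
        if not bucket:
            continue  # Empty octants are not stored, which keeps the structure compact
        child_center = tuple(
            center[axis] + (quarter if (index >> axis) & 1 else -quarter) for axis in range(3)
        )
        node.children.append(build_octree(bucket, child_center, quarter, max_depth - 1))
    return node


def count_leaves(node: OctreeNode) -> int:
    """Count the leaf nodes of the tree."""
    if node.children is None:
        return 1
    return sum(count_leaves(child) for child in node.children)


cloud = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.8, 0.9, 0.9)]
tree = build_octree(cloud, center=(0.5, 0.5, 0.5), half_size=0.5, max_depth=3)
print("Number of leaves:", count_leaves(tree))
```

Because empty octants are skipped, memory is spent only on occupied regions of the scene, which is the property that makes octrees attractive compared to dense voxel grids.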

Illustration of the end-to-end actor-critic network architecture with octree-based 3D CNN feature extractor.

Limitations

The known limitations of this repository are listed below for your convenience.

- **No parallel environments –** It is currently not possible to run multiple instances of the environment simultaneously.
- **Slow training –** The simulation environments are computationally complex (physics, rendering, underlying low-level control, ...). This significantly impacts the ability to train agents with time and computational constraints. The performance of some of these aspects can be improved at the cost of accuracy and realism (e.g. `physics_rate`/`step_size`).
- **Suboptimal hyperparameters –** Although a hyperparameter optimization framework was employed for some combinations of environments and algorithms, it is a prolonged process. This problem is exacerbated by the vast quantity of hyperparameters and their general brittleness. Therefore, the default hyperparameters provided in this repository might not be optimal.
- **Nondeterministic –** Experiments are not fully repeatable, and even the same seed of the pseudorandom generator can lead to different results. This is caused by several aspects, such as the nondeterministic nature of network-based communication and non-determinism in the underlying deep learning frameworks and hardware.

Instructions

There are two options for setting up this repository. Option A – Docker is recommended when trying out this repository due to its simplicity. Alternatively, Option B – Local Installation can be used if a local setup is preferred. Both options are equivalent in terms of usage; however, the pre-built Docker images come with all the required datasets and enable isolation of runs.

Option A – Docker

### Hardware Requirements

- **CUDA GPU –** CUDA-enabled GPU is required for hardware-accelerated processing of octree observations. Everything else should also be functional on the CPU.

### Install Docker

First, ensure your system has a setup for using Docker with NVIDIA GPUs. You can follow [`install_docker_with_nvidia.bash`](./.docker/host/install_docker_with_nvidia.bash) installation script for Debian-based distributions. Alternatively, consult the [NVIDIA Container Toolkit Installation Guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) for other Linux distributions.

```bash
# Execute script inside a cloned repository
.docker/host/install_docker_with_nvidia.bash
# (Alternative) Execute script from URL
bash -c "$(wget -qO - https://raw.githubusercontent.com/AndrejOrsula/drl_grasping/master/.docker/host/install_docker_with_nvidia.bash)"
```

### Clone a Prebuilt Docker Image

Prebuilt Docker images of `drl_grasping` can be pulled directly from [Docker Hub](https://hub.docker.com/repository/docker/andrejorsula/drl_grasping) without needing to build them locally. You can use the following command to manually pull the latest image or one of the previous tagged [Releases](https://github.com/AndrejOrsula/drl_grasping/releases). The average size of images is 25GB (including datasets).

```bash
docker pull andrejorsula/drl_grasping:${TAG:-latest}
```

### (Optional) Build a New Image

It is also possible to build the Docker image locally using the included [Dockerfile](./Dockerfile). To do this, [`build.bash`](./.docker/build.bash) script can be executed as shown below (arguments are optional). This script will always print the corresponding low-level `docker build ...` command for your reference.

```bash
.docker/build.bash ${TAG:-latest} ${BUILD_ARGS}
```

### Run a Docker Container

For simplicity, please run `drl_grasping` Docker containers using the included [`run.bash`](./.docker/run.bash) script shown below (arguments are optional). It enables NVIDIA GPUs and GUI interface while automatically mounting the necessary volumes (e.g. persistent logging) and setting environment variables (e.g. synchronization of middleware communication with the host). This script will always print the corresponding low-level `docker run ...` command for your reference.

```bash
# Execute script inside a cloned repository
.docker/run.bash ${TAG:-latest} ${CMD}
# (Alternative) Execute script from URL
bash -c "$(wget -qO - https://raw.githubusercontent.com/AndrejOrsula/drl_grasping/master/.docker/run.bash)" -- ${TAG:-latest} ${CMD}
```

The network communication of `drl_grasping` within this Docker container is configured based on the ROS 2 [`ROS_DOMAIN_ID`](https://docs.ros.org/en/galactic/Concepts/About-Domain-ID.html) environment variable, which can be set via `ROS_DOMAIN_ID={0...101} .docker/run.bash ${TAG:-latest} ${CMD}`. By default (`ROS_DOMAIN_ID=0`), external communication is restricted and multicast is disabled. With `ROS_DOMAIN_ID=42`, the communication remains restricted to `localhost` with multicast enabled, enabling monitoring of communication outside the container but within the same system. Using `ROS_DOMAIN_ID=69` will use the default network interface and multicast settings, which can enable monitoring of communication within the same LAN. All other `ROS_DOMAIN_ID`s share the default behaviour and can be employed to enable communication partitioning for running of multiple `drl_grasping` instances.

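
The `ROS_DOMAIN_ID` rules above can be restated as a small lookup. This is a hypothetical helper that only summarizes the text and does not configure anything; it assumes that "the default behaviour" shared by all other IDs refers to the behaviour of `ROS_DOMAIN_ID=0`.

```python
from typing import Dict


def describe_network_mode(ros_domain_id: int) -> Dict[str, object]:
    """Summarize the communication behaviour described above for a given ROS_DOMAIN_ID.

    NOTE: hypothetical helper that only restates the README; it does not configure anything.
    """
    if not 0 <= ros_domain_id <= 101:
        raise ValueError("ROS_DOMAIN_ID is expected to be within 0...101")
    if ros_domain_id == 42:
        # Visible outside the container, but only within the same system
        return {"scope": "localhost", "multicast": True}
    if ros_domain_id == 69:
        # Default interface and multicast settings; may be visible within the same LAN
        return {"scope": "default network interface", "multicast": "default"}
    # ID 0 and (by assumption) all remaining IDs: restricted, multicast disabled
    return {"scope": "restricted", "multicast": False}


for domain_id in (0, 42, 69, 7):
    print(domain_id, describe_network_mode(domain_id))
```
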
Option B – Local Installation

### Hardware Requirements

- **CUDA GPU –** CUDA-enabled GPU is required for hardware-accelerated processing of octree observations. Everything else should also be functional on the CPU.

### Dependencies

> Ubuntu 20.04 (Focal Fossa) is the recommended OS for local installation. Other Linux distributions might work but require most dependencies to be built from the source.

These are the primary dependencies required to use this project that must be installed on your system.

- [Python 3.8](https://python.org/downloads)
- ROS 2 [Galactic](https://docs.ros.org/en/galactic/Installation.html)
- Gazebo [Fortress](https://gazebosim.org/docs/fortress)
- [Gym-Ignition](https://github.com/robotology/gym-ignition)
  - Please use [AndrejOrsula/gym-ignition](https://github.com/AndrejOrsula/gym-ignition) fork in order to ensure compatibility (default branch – [`drl_grasping`](https://github.com/AndrejOrsula/gym-ignition/tree/drl_grasping)).
- [O-CNN](https://github.com/microsoft/O-CNN)
  - Please use [AndrejOrsula/O-CNN](https://github.com/AndrejOrsula/O-CNN) fork in order to ensure compatibility (default branch – [`master`](https://github.com/AndrejOrsula/O-CNN/tree/master)).

All additional dependencies are either pulled via [vcstool](https://wiki.ros.org/vcstool) ([drl_grasping.repos](./drl_grasping.repos)) or installed via [pip](https://pip.pypa.io/en/stable/installation) ([python_requirements.txt](./python_requirements.txt)) and [rosdep](https://wiki.ros.org/rosdep) during the building process below.

### Building

Clone this repository recursively and import VCS dependencies. Then install dependencies and build with [colcon](https://colcon.readthedocs.io).

```bash
# Clone this repository into your favourite ROS 2 workspace
git clone --recursive https://github.com/AndrejOrsula/drl_grasping.git
# Install Python requirements
pip3 install -r drl_grasping/python_requirements.txt
# Import dependencies
vcs import < drl_grasping/drl_grasping.repos
# Install dependencies
IGNITION_VERSION=fortress rosdep install -y -r -i --rosdistro ${ROS_DISTRO} --from-paths .
# Build
colcon build --merge-install --symlink-install --cmake-args "-DCMAKE_BUILD_TYPE=Release"
```

### Sourcing

Before utilizing this project via local installation, remember to source the ROS 2 workspace.

```bash
source install/local_setup.bash
```

This enables:

- Use of `drl_grasping` Python module
- Execution of binaries, scripts and examples via `ros2 run drl_grasping `
- Launching of setup scripts via `ros2 launch drl_grasping `
- Discoverability of shared resources

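
A quick way to confirm that the workspace was sourced in the current shell is to check whether the `drl_grasping` Python module can be located. The snippet below is a minimal sketch using only the standard library; `is_module_available` is a hypothetical helper name.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found on the current Python path."""
    return importlib.util.find_spec(name) is not None


if is_module_available("drl_grasping"):
    print("drl_grasping is importable")
else:
    print("drl_grasping not found; did you source install/local_setup.bash?")
```
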
Test Random Agents

A good starting point is to simulate some episodes using random agents where actions are sampled from the defined action space. This is also useful when modifying environments because it lets you analyze the consequences of actions and resulting observations without deep learning pipelines running in the background. To get started, run the following example. It should open RViz 2 and Gazebo client instances that provide you with visual feedback.

```bash
ros2 run drl_grasping ex_random_agent.bash
```

After running the example script, the underlying `ros2 launch drl_grasping random_agent.launch.py ...` command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

```bash
ros2 launch drl_grasping random_agent.launch.py seed:=42 robot_model:=lunalab_summit_xl_gen env:=GraspPlanetary-Octree-Gazebo-v0 check_env:=false render:=true enable_rviz:=true log_level:=warn
```

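
Conceptually, a random agent is just a loop that samples actions and steps the environment until the episode ends. The sketch below shows this loop against a stand-in environment with a classic Gym-style `reset()`/`step()` interface so that it runs without the simulator. `DummyReachEnv`, its reward and its action range are hypothetical placeholders, not the environments of this repository.

```python
import random
from typing import List, Tuple


class DummyReachEnv:
    """Stand-in environment with a Gym-style `reset()`/`step()` interface.

    NOTE: hypothetical placeholder so the loop runs without the simulator.
    """

    def __init__(self, horizon: int = 5, seed: int = 0):
        self._horizon = horizon
        self._rng = random.Random(seed)
        self._t = 0

    def sample_action(self) -> List[float]:
        # Continuous action in [-1, 1] per Cartesian axis (an assumption for this sketch)
        return [self._rng.uniform(-1.0, 1.0) for _ in range(3)]

    def reset(self) -> List[float]:
        self._t = 0
        return [0.0, 0.0, 0.0]

    def step(self, action: List[float]) -> Tuple[List[float], float, bool, dict]:
        self._t += 1
        observation = list(action)
        reward = -sum(abs(a) for a in action)  # Placeholder reward
        done = self._t >= self._horizon
        return observation, reward, done, {}


def run_random_episode(env: DummyReachEnv) -> Tuple[int, float]:
    """Roll out one episode with randomly sampled actions."""
    env.reset()
    steps, total_reward, done = 0, 0.0, False
    while not done:
        _, reward, done, _ = env.step(env.sample_action())
        steps += 1
        total_reward += reward
    return steps, total_reward


steps, total_reward = run_random_episode(DummyReachEnv())
print(f"Episode finished after {steps} steps with return {total_reward:.3f}")
```
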
Train New Agents

You can also train your agents from scratch. To begin the training, run the following example. By default, headless mode is used during the training to reduce computational load.

```bash
ros2 run drl_grasping ex_train.bash
```

After running the example script, the underlying `ros2 launch drl_grasping train.launch.py ...` command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

```bash
ros2 launch drl_grasping train.launch.py seed:=42 robot_model:=panda env:=Grasp-OctreeWithColor-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/train/Grasp-OctreeWithColor-Gazebo-v0/logs tensorboard_log:=/root/drl_grasping_training/train/Grasp-OctreeWithColor-Gazebo-v0/tensorboard_logs save_freq:=10000 save_replay_buffer:=true log_interval:=-1 eval_freq:=10000 eval_episodes:=20 enable_rviz:=false log_level:=fatal
```

#### Remote Visualization

To visualize the agent while training, separate RViz 2 and Gazebo client instances can be opened. For the Docker setup, these commands can be executed in a new `drl_grasping` container with the same `ROS_DOMAIN_ID`.

```bash
# RViz 2 (Note: Visualization of robot model will not be loaded using this approach)
rviz2 -d $(ros2 pkg prefix --share drl_grasping)/rviz/drl_grasping.rviz
# Gazebo client
ign gazebo -g
```

#### TensorBoard

TensorBoard logs will be generated during training in a directory specified by the `tensorboard_log:=${TENSORBOARD_LOG}` argument. You can open them in your web browser using the following command.

```bash
tensorboard --logdir ${TENSORBOARD_LOG}
```

#### (Experimental) Train with Dreamer V2

You can also try to train some agents using the model-based Dreamer V2 algorithm. To begin the training, run the following example. By default, headless mode is used during the training to reduce computational load.

```bash
ros2 run drl_grasping ex_train_dreamerv2.bash
```

After running the example script, the underlying `ros2 launch drl_grasping train_dreamerv2.launch.py ...` command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

```bash
ros2 launch drl_grasping train_dreamerv2.launch.py seed:=42 robot_model:=lunalab_summit_xl_gen env:=GraspPlanetary-ColorImage-Gazebo-v0 log_folder:=/root/drl_grasping_training/train/GraspPlanetary-ColorImage-Gazebo-v0/logs eval_freq:=10000 enable_rviz:=false log_level:=fatal
```

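
When launching many training runs with different arguments, it can be convenient to compose the `ros2 launch` command programmatically. The following sketch builds the `key:=value` argument list used by the commands above; `build_launch_command` is a hypothetical helper, and only argument names that already appear in the examples are used.

```python
import shlex
from typing import Dict, List


def build_launch_command(launch_file: str, args: Dict[str, object]) -> List[str]:
    """Compose a `ros2 launch drl_grasping ...` command using `key:=value` arguments."""
    command = ["ros2", "launch", "drl_grasping", launch_file]
    for key, value in args.items():
        if isinstance(value, bool):
            value = str(value).lower()  # The launch arguments above use `true`/`false`
        command.append(f"{key}:={value}")
    return command


command = build_launch_command(
    "train.launch.py",
    {
        "seed": 42,
        "robot_model": "panda",
        "env": "Grasp-OctreeWithColor-Gazebo-v0",
        "algo": "tqc",
        "enable_rviz": False,
    },
)
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run(command, check=True)` inside a sourced workspace.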
Evaluate New Agents

Once you train your agents, you can evaluate them. Start by looking at [ex_evaluate.bash](./examples/ex_evaluate.bash), which can be modified to fit your trained agent. It should open RViz 2 and Gazebo client instances that provide you with visual feedback, while the agent's performance will be logged and printed to `STDOUT`.

```bash
ros2 run drl_grasping ex_evaluate.bash
```

After running the example script, the underlying `ros2 launch drl_grasping evaluate.launch.py ...` command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments. For example, you can select a specific checkpoint with the `load_checkpoint:=${LOAD_CHECKPOINT}` argument instead of running the final model.

```bash
ros2 launch drl_grasping evaluate.launch.py seed:=77 robot_model:=panda env:=Grasp-Octree-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/train/Grasp-Octree-Gazebo-v0/logs reward_log:=/root/drl_grasping_training/evaluate/Grasp-Octree-Gazebo-v0 stochastic:=false n_episodes:=200 load_best:=false enable_rviz:=true log_level:=warn
```

Optimize Hyperparameters

The default hyperparameters for training agents with TD3, SAC and TQC can be found under the [hyperparams](./hyperparams) directory. [Optuna](https://optuna.org) can be employed to autotune some of these parameters. To get started, run the following example. By default, headless mode is used during hyperparameter optimization to reduce computational load.

```bash
ros2 run drl_grasping ex_optimize.bash
```

After running the example script, the underlying `ros2 launch drl_grasping optimize.launch.py ...` command with all arguments will always be printed for your reference (example shown below). If desired, you can launch this command directly with custom arguments.

```bash
ros2 launch drl_grasping optimize.launch.py seed:=69 robot_model:=panda env:=Grasp-Octree-Gazebo-v0 algo:=tqc log_folder:=/root/drl_grasping_training/optimize/Grasp-Octree-Gazebo-v0/logs tensorboard_log:=/root/drl_grasping_training/optimize/Grasp-Octree-Gazebo-v0/tensorboard_logs n_timesteps:=1000000 sampler:=tpe pruner:=median n_trials:=20 n_startup_trials:=5 n_evaluations:=4 eval_episodes:=20 log_interval:=-1 enable_rviz:=true log_level:=fatal
```
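
The `pruner:=median` and `n_startup_trials:=5` arguments refer to median pruning, where an unpromising trial is stopped early if its intermediate result is worse than the median of earlier trials at the same evaluation step. The sketch below is a simplified, standalone illustration of that rule for a maximized objective; it is not Optuna's implementation, and the function name and data layout are assumptions.

```python
from statistics import median
from typing import Dict, List


def should_prune(
    current_value: float,
    step: int,
    completed_trials: List[Dict[int, float]],
    n_startup_trials: int = 5,
) -> bool:
    """Median pruning rule for a maximized objective (e.g. mean evaluation reward).

    `completed_trials` maps evaluation step -> intermediate value for each finished trial.
    """
    if len(completed_trials) < n_startup_trials:
        return False  # Never prune before enough trials have completed
    previous = [trial[step] for trial in completed_trials if step in trial]
    if not previous:
        return False  # Nothing to compare against at this step
    return current_value < median(previous)


# Five finished trials with their mean reward at evaluation step 1
history = [{1: 10.0}, {1: 20.0}, {1: 30.0}, {1: 40.0}, {1: 50.0}]
print(should_prune(25.0, step=1, completed_trials=history))  # Below the median of 30
print(should_prune(35.0, step=1, completed_trials=history))  # Above the median of 30
```
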

Citation

Please use the following citation if you use drl_grasping in your work.

@inproceedings{orsula_learning_2022,
  author    = {Andrej Orsula and Simon B{\o}gh and Miguel Olivares-Mendez and Carol Martinez},
  title     = {{Learning} to {Grasp} on the {Moon} from {3D} {Octree} {Observations} with {Deep} {Reinforcement} {Learning}},
  year      = {2022},
  booktitle = {2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages     = {4112--4119},
  doi       = {10.1109/IROS47612.2022.9981661}
}

Directory Structure

.
├── drl_grasping/        # [dir] Primary Python module of this project
│   ├── drl_octree/      # [dir] Submodule for end-to-end learning from 3D octree observations
│   ├── envs/            # [dir] Submodule for environments
│   │   ├── control/     # [dir] Interfaces for the control of agents
│   │   ├── models/      # [dir] Functional models for simulation environments
│   │   ├── perception/  # [dir] Interfaces for the perception of agents
│   │   ├── randomizers/ # [dir] Domain randomization of the simulated environments
│   │   ├── runtimes/    # [dir] Runtime implementations of the task (sim/real)
│   │   ├── tasks/       # [dir] Implementation of tasks
│   │   ├── utils/       # [dir] Environment-specific utilities used across the submodule
│   │   └── worlds/      # [dir] Minimal templates of worlds for simulation environments
│   └── utils/           # [dir] Submodule for training and evaluation scripts boilerplate (using SB3)
├── examples/            # [dir] Examples for training and evaluating RL agents
├── hyperparams/         # [dir] Default hyperparameters for training RL agents
├── launch/              # [dir] ROS 2 launch scripts that can be used to interact with this repository
├── pretrained_agents/   # [dir] Collection of pre-trained agents
├── rviz/                # [dir] RViz2 config for visualization
├── scripts/             # [dir] Helpful scripts for training, evaluation and other utilities
├── CMakeLists.txt       # Colcon-enabled CMake recipe
└── package.xml          # ROS 2 package metadata