Farama-Foundation / Gymnasium-Robotics

A collection of robotics simulation environments for reinforcement learning
https://robotics.farama.org/
MIT License

This library contains a collection of robotics environments for reinforcement learning that use the Gymnasium API. The environments run on the MuJoCo physics engine through the actively maintained mujoco Python bindings.

The documentation website is at robotics.farama.org. We also have a public Discord server, which we use to coordinate development work, that you can join here: https://discord.gg/YymmHrvS

Installation

To install the Gymnasium-Robotics environments, run pip install gymnasium-robotics

These environments also require the MuJoCo engine from DeepMind to be installed. Instructions for installing the physics engine can be found on the MuJoCo website and in the MuJoCo GitHub repository.

Note that the latest environment versions use the latest mujoco Python bindings maintained by the MuJoCo team. If you wish to use the old versions of the environments that depend on mujoco-py, install this library with pip install gymnasium-robotics[mujoco-py]

We support and test for Linux and macOS. We will accept PRs related to Windows, but do not officially support it.

Environments

Gymnasium-Robotics includes the following groups of environments:

The D4RL environments are now available. These environments have been refactored and may not have the same action/observation spaces as the originals, so please read their documentation.

WIP: generate new D4RL environment datasets with Minari.

Multi-goal API

The robotic environments extend the core Gymnasium API by inheriting from the GoalEnv class. This API requires each environment to have a dictionary observation space with three keys: observation, desired_goal, and achieved_goal.

This API also exposes the reward function, as well as the functions that compute the terminated and truncated signals, so that their values can be re-computed with different goals. This functionality is useful for algorithms that use Hindsight Experience Replay (HER).

The following example demonstrates how the exposed reward, terminated, and truncated functions can be used to re-compute the values with substituted goals. The info dictionary can be used to store additional information that may be necessary to re-compute the reward, but that is independent of the goal, e.g. state derived from the simulation.

import gymnasium as gym
import gymnasium_robotics

# Register the Gymnasium-Robotics environments with Gymnasium.
gym.register_envs(gymnasium_robotics)

env = gym.make("FetchReach-v3")
env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# gym.make() returns a wrapped environment; the compute_* functions live on the
# underlying GoalEnv, so they are reached through env.unwrapped.
goal_env = env.unwrapped

# The following always has to hold:
assert reward == goal_env.compute_reward(obs["achieved_goal"], obs["desired_goal"], info)
assert truncated == goal_env.compute_truncated(obs["achieved_goal"], obs["desired_goal"], info)
assert terminated == goal_env.compute_terminated(obs["achieved_goal"], obs["desired_goal"], info)

# However goals can also be substituted:
substitute_goal = obs["achieved_goal"].copy()
substitute_reward = goal_env.compute_reward(obs["achieved_goal"], substitute_goal, info)
substitute_terminated = goal_env.compute_terminated(obs["achieved_goal"], substitute_goal, info)
substitute_truncated = goal_env.compute_truncated(obs["achieved_goal"], substitute_goal, info)
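
To illustrate why goal substitution matters for HER, the following is a minimal, library-free sketch of relabeling a failed episode with the goal it actually reached (the "final" relabeling strategy). Everything in it is an assumption made for illustration: sparse_reward, DISTANCE_THRESHOLD, Transition, and relabel_with_final_goal are hypothetical names, and the reward function is a stand-in rather than the library's implementation. With a real environment, the environment's own compute_reward would take the place of sparse_reward.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple

Goal = Tuple[float, ...]

# Hypothetical success threshold; a stand-in, not a value taken from the library.
DISTANCE_THRESHOLD = 0.05


def sparse_reward(achieved_goal: Sequence[float], desired_goal: Sequence[float]) -> float:
    """Hypothetical sparse reward: 0.0 within the threshold, otherwise -1.0."""
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance <= DISTANCE_THRESHOLD else -1.0


@dataclass(frozen=True)
class Transition:
    achieved_goal: Goal  # goal actually reached after the step
    desired_goal: Goal   # goal the agent was asked to reach
    reward: float


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Replace every desired goal with the last achieved goal and re-compute rewards."""
    final_goal = episode[-1].achieved_goal
    return [
        replace(
            t,
            desired_goal=final_goal,
            reward=sparse_reward(t.achieved_goal, final_goal),
        )
        for t in episode
    ]


# A failed episode: the agent never gets close to the desired goal (1.0, 1.0).
desired = (1.0, 1.0)
positions = [(0.1, 0.0), (0.2, 0.1), (0.3, 0.1)]
episode = [Transition(p, desired, sparse_reward(p, desired)) for p in positions]
relabeled = relabel_with_final_goal(episode)

print([t.reward for t in episode])    # [-1.0, -1.0, -1.0]
print([t.reward for t in relabeled])  # [-1.0, -1.0, 0.0]
```

After relabeling, the last transition counts as a success, so an episode that originally carried no learning signal now contains one. The sketch assumes a non-empty episode.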

The GoalEnv class can also be used for custom environments.

Project Maintainers

Main Contributors: Rodrigo Perez-Vicente, Kallinteris Andreas, Jet Tai

Maintenance for this project is also contributed by the broader Farama team: farama.org/team.

Citation

If you use this in your research, please cite:

@software{gymnasium_robotics2023github,
  author = {Rodrigo de Lazcano and Kallinteris Andreas and Jun Jet Tai and Seungjae Ryan Lee and Jordan Terry},
  title = {Gymnasium Robotics},
  url = {http://github.com/Farama-Foundation/Gymnasium-Robotics},
  version = {1.3.1},
  year = {2024},
}