DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[feature request] Adding Panda environments #123

Closed · qgallouedec closed this issue 3 years ago

qgallouedec commented 3 years ago

🚀 Feature

Add Panda environments to rl-baselines3-zoo.

Motivation

More and more people are testing their reinforcement learning algorithms on Panda environments. To encourage collaboration and facilitate reproducibility, I propose integrating these environments into rl-baselines3-zoo.

Pitch

I have developed panda-gym, a set of 5 OpenAI Gym environments for Franka Emika's Panda robot, which is widely used in robotics. These environments are simulated with PyBullet. I propose to integrate these 5 environments into rl-baselines3-zoo.
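
To give an idea of the intended usage, here is a minimal sketch. The environment id `PandaReach-v1`, the registration side effect of `import panda_gym`, and the classic Gym step API are assumptions on my part for illustration; only `PandaSlideDense-v1` is named explicitly later in this thread.

```python
import gym
import panda_gym  # noqa: F401  -- importing is assumed to register the environments with gym

# "PandaReach-v1" is an assumed id used for illustration
env = gym.make("PandaReach-v1")

obs = env.reset()
for _ in range(50):
    action = env.action_space.sample()  # random policy, just to exercise the API
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```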

Alternatives

None.

Additional context

This integration is already done and tested in my fork. I also have the benchmark results, the trained agents, and the README section. I am opening this issue before creating a pull request.

araffin commented 3 years ago

Hello,

> I have developed panda-gym, a set of 5 OpenAI Gym environments for Franka Emika's Panda robot, which is widely used in robotics.

I'm glad that something like that finally exists =)! (There were equivalents for locomotion tasks, but robotics ones were missing.)

Before accepting the PR (you will also need to open a PR to add the trained agents in https://github.com/DLR-RM/rl-trained-agents), I would like to know the similarities and differences between the MuJoCo tasks and these ones (in terms of setup/reward/action/observation/...).

qgallouedec commented 3 years ago

I hope you will find all the information you are looking for in my response. A technical report will soon be published on arXiv; I will put the link in the repo.

Physics

|                      | Panda Environments | Fetch Environments |
|----------------------|--------------------|--------------------|
| Engine               | PyBullet           | MuJoCo             |
| Length of an episode | 50 (100 for Stack) | 50                 |

Setup

Reach

|                              | Panda Environments | Fetch Environments |
|------------------------------|--------------------|--------------------|
| Goal horizontal range (x, y) | 0.3                | 0.3                |
| Goal vertical range (z)      | 0.3                | 0.3                |

Push

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | 0.3                |
| Goal horizontal range (x, y)   | 0.3                | 0.3                |

Slide

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | 0.2                |
| Goal horizontal offset         | 0.4                | 0.4                |
| Goal horizontal range (x, y)   | 0.3                | 0.6                |

Pick and place

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | 0.3                |
| Goal horizontal range (x, y)   | 0.3                | 0.3                |
| Prob. goal in the air          | 0.3                | 0.5                |
| Goal vertical range (z)        | 0.2                | 0.45               |

Stack

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | N/A                |
| Prob. object in the air        | 0.3                | N/A                |
| Object vertical range (z)      | 0.2                | N/A                |

Reward

|                     | Panda Environments | Fetch Environments |
|---------------------|--------------------|--------------------|
| Distance threshold  | 0.05               | 0.05               |
| Default reward type | sparse             | sparse             |
| Other reward type   | dense              | dense              |

Example for the dense reward: PandaSlideDense-v1.

For the stacking task, the distance threshold is in fact 0.10, applied to $\sqrt{{d_1}^2 + {d_2}^2}$, where $d_i$ is the distance between object $i$ and its target position.
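
As a rough sketch (not the exact panda-gym code), a sparse/dense goal-conditioned reward of this kind is typically computed like this:

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, reward_type="sparse",
                   distance_threshold=0.05):
    """Sketch of a goal-conditioned reward; not the exact panda-gym implementation."""
    # Euclidean distance between the achieved and the desired goal position
    d = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    if reward_type == "sparse":
        # -1 while the goal is not reached, 0 once within the threshold
        return -(d > distance_threshold).astype(np.float32)
    # Dense variant: negative distance to the goal
    return -d
```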

Observation

Observation type

| Panda Environments | Fetch Environments |
|--------------------|--------------------|
| `gym.Box`          | `gym.Box`          |

Observation size

The observation space is quite different between the Fetch and the Panda environments:

For the Fetch environments

In total, there are 25 coordinates for all environments.

For the Panda environments

Thus:

|                | Observation size |
|----------------|------------------|
| Reach          | 6                |
| Push           | 18               |
| Slide          | 18               |
| Pick and Place | 19               |
| Stack          | 31               |

Action

Action type

| Panda Environments | Fetch Environments |
|--------------------|--------------------|
| `gym.Box`          | `gym.Box`          |

Fingers blocked

|                | Panda Environments  | Fetch Environments  |
|----------------|---------------------|---------------------|
| Reach          | :heavy_check_mark:  | :heavy_check_mark:  |
| Push           | :heavy_check_mark:  | :heavy_check_mark:  |
| Slide          | :heavy_check_mark:  | :heavy_check_mark:  |
| Pick and Place | :x:                 | :x:                 |
| Stack          | :x:                 | N/A                 |

Action size

For the sake of consistency, the authors of the Fetch environments decided to use a 4-coordinate action space for all tasks, even those where the fingers are blocked. This is a choice I did not want to make.

|                | Panda Environments | Fetch Environments |
|----------------|--------------------|--------------------|
| Reach          | 3                  | 4                  |
| Push           | 3                  | 4                  |
| Slide          | 3                  | 4                  |
| Pick and Place | 4                  | 4                  |
| Stack          | 4                  | N/A                |
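
To cross-check the observation and action sizes above, one can instantiate each environment and print its spaces. The ids below are assumed to follow a `PandaReach-v1` naming scheme; only `PandaSlideDense-v1` is named explicitly in this thread.

```python
import gym
import panda_gym  # noqa: F401  -- assumed to register the environments with gym

# Assumed ids for the five tasks, for illustration only
for env_id in ["PandaReach-v1", "PandaPush-v1", "PandaSlide-v1",
               "PandaPickAndPlace-v1", "PandaStack-v1"]:
    env = gym.make(env_id)
    # Print the observation and action spaces to compare with the tables above
    print(f"{env_id}: obs={env.observation_space}, act={env.action_space}")
    env.close()
```
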
araffin commented 3 years ago

> I hope you will find all the information you are looking for in my response. A technical report will soon be published on arXiv; I will put the link in the repo.

Thanks for the detailed reply. One additional important piece of information would be a visualization of the two workspaces reachable by the Panda vs. the Fetch robot, no?

And do you command relative or absolute x, y, z positions for the action?

qgallouedec commented 3 years ago

> Thanks for the detailed reply. One additional important piece of information would be a visualization of the two workspaces reachable by the Panda vs. the Fetch robot, no?

I don't know how to easily represent the space reachable by these robots. I will instead compare (as soon as I renew my MuJoCo license) the maximum range in x, y and z. Of course, this won't be enough to describe the workspaces completely, but it will be informative enough in my opinion.

> And do you command relative or absolute x, y, z positions for the action?

It is relative movement for both:

| Action coordinate(s) | Panda Environments      | Fetch Environments      |
|----------------------|-------------------------|-------------------------|
| 0:2                  | displacement in x, y, z | displacement in x, y, z |
| 3 (when applicable)  | fingers opening         | fingers opening         |

For example, `action = [0.0, 0.0, 0.05, 0.03]` means: move 5 cm up and set the fingers opening to 3 cm.
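
Concretely, a step with that action would look like this (a minimal sketch, assuming a `PandaPickAndPlace-v1` id and the classic Gym step API):

```python
import numpy as np
import gym
import panda_gym  # noqa: F401  -- assumed to register the environments with gym

# Assumed id; a task with unblocked fingers, hence a 4-coordinate action
env = gym.make("PandaPickAndPlace-v1")
obs = env.reset()

# Relative displacement: no motion in x/y, move 5 cm up, fingers opening set to 3 cm
action = np.array([0.0, 0.0, 0.05, 0.03])
obs, reward, done, info = env.step(action)
env.close()
```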