qgallouedec closed this issue 3 years ago
Hello,
I have developed panda-gym, a set of 5 OpenAI Gym environments for Franka Emika's Panda robot, which is widely used in robotics.
I'm glad that something like that finally exists =)! (There were equivalents for locomotion tasks, but robotics ones were missing.)
Before accepting the PR (you will also need to open a PR to add the trained agents to https://github.com/DLR-RM/rl-trained-agents), I would like to know the similarities and differences between the MuJoCo tasks and these ones (in terms of setup/reward/action/observation/...).
I hope you will find in my response all the information you are looking for. A technical report will soon be published on arXiv; I will put the link in the repo.
 | Panda Environments | Fetch Environments |
---|---|---|
Engine | PyBullet | MuJoCo |
Length of an episode | 50 (100 for stack) | 50 |
 | Panda Environments | Fetch Environments |
---|---|---|
Goal horizontal range (x,y) | 0.3 | 0.3 |
Goal vertical range (z) | 0.3 | 0.3 |
 | Panda Environments | Fetch Environments |
---|---|---|
Object horizontal range (x,y) | 0.3 | 0.3 |
Goal horizontal range (x,y) | 0.3 | 0.3 |
 | Panda Environments | Fetch Environments |
---|---|---|
Object horizontal range (x,y) | 0.3 | 0.2 |
Goal horizontal offset | 0.4 | 0.4 |
Goal horizontal range (x,y) | 0.3 | 0.6 |
 | Panda Environments | Fetch Environments |
---|---|---|
Object horizontal range (x,y) | 0.3 | 0.3 |
Goal horizontal range (x,y) | 0.3 | 0.3 |
Prob. goal in the air | 0.3 | 0.5 |
Goal vertical range (z) | 0.2 | 0.45 |
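As a rough illustration of how a goal could be sampled from ranges like those above, here is a sketch. The function name is hypothetical, and I am assuming the horizontal range is a full width centred on the table; the actual panda-gym sampling code may differ.

```python
import random

def sample_goal(horizontal_range=0.3, vertical_range=0.2, p_in_air=0.3):
    """Sample a goal position (a sketch, not the actual panda-gym code):
    x and y uniform over the horizontal range, and with probability
    p_in_air the goal is lifted to a uniform height above the table."""
    x = random.uniform(-horizontal_range / 2, horizontal_range / 2)
    y = random.uniform(-horizontal_range / 2, horizontal_range / 2)
    z = 0.0
    if random.random() < p_in_air:
        z = random.uniform(0.0, vertical_range)
    return (x, y, z)
```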
 | Panda Environments | Fetch Environments |
---|---|---|
Object horizontal range (x,y) | 0.3 | N/A |
Prob. object in the air | 0.3 | N/A |
Object vertical range (z) | 0.2 | N/A |
 | Panda Environments | Fetch Environments |
---|---|---|
distance threshold | 0.05 | 0.05 |
default reward type | sparse | sparse |
other reward type | dense | dense |
Example for the dense reward: PandaSlideDense-v1.
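The two reward types can be illustrated roughly as follows. This is a sketch using the threshold from the table, not the exact panda-gym implementation, and the function name is hypothetical.

```python
import math

DISTANCE_THRESHOLD = 0.05  # from the table above

def compute_reward(achieved_goal, desired_goal, reward_type="sparse"):
    """Goal-distance reward in the style described above (a sketch)."""
    d = math.dist(achieved_goal, desired_goal)
    if reward_type == "sparse":
        # -1.0 while the goal is not reached, 0.0 once within the threshold
        return -float(d > DISTANCE_THRESHOLD)
    # dense variant: negative Euclidean distance to the goal
    return -d
```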
For the stacking task, the distance threshold is in fact 0.10, corresponding to $\sqrt{d_1^2 + d_2^2}$, where $d_i$ is the distance between object $i$ and its target position.
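The combined success test for stacking could be sketched like this (the function name and signature are hypothetical; only the formula comes from the description above):

```python
import math

COMBINED_THRESHOLD = 0.10  # stacking threshold described above

def stack_success(object_positions, target_positions):
    """Success test for the stacking task: sqrt(d1^2 + d2^2) < 0.10,
    where d_i is the distance between object i and its target (a sketch)."""
    combined = math.sqrt(sum(
        math.dist(obj, target) ** 2
        for obj, target in zip(object_positions, target_positions)
    ))
    return combined < COMBINED_THRESHOLD
```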
Panda Environments | Fetch Environments |
---|---|
gym.Box | gym.Box |
The observation space is quite different between the Fetch and the Panda environments: the Fetch environments use 25 coordinates in total for all tasks, whereas the Panda observation size depends on the task:
 | Observation size |
---|---|
Reach | 6 |
Push | 18 |
Slide | 18 |
Pick and Place | 19 |
Stack | 31 |
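These sizes concern the "observation" entry of the dict observation that these goal-conditioned environments return; here is a minimal sketch to make the shapes concrete (the helper is hypothetical, and the goal sizes are an assumption here):

```python
# Observation sizes from the table above (the "observation" entry of the
# goal-conditioned dict observation).
OBS_SIZES = {"Reach": 6, "Push": 18, "Slide": 18, "PickAndPlace": 19, "Stack": 31}

def make_dummy_obs(task, goal_size=3):
    """Build a zero-filled observation dict with the sizes from the table
    (a sketch; goal_size=3 is an assumption, e.g. a single x, y, z target)."""
    return {
        "observation": [0.0] * OBS_SIZES[task],
        "achieved_goal": [0.0] * goal_size,
        "desired_goal": [0.0] * goal_size,
    }
```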
Panda Environments | Fetch Environments |
---|---|
gym.Box | gym.Box |
 | Panda Environments | Fetch Environments |
---|---|---|
Reach | :heavy_check_mark: | :heavy_check_mark: |
Push | :heavy_check_mark: | :heavy_check_mark: |
Slide | :heavy_check_mark: | :heavy_check_mark: |
Pick and Place | :x: | :x: |
Stack | :x: | N/A |
For the sake of consistency, the Fetch environments use a 4-coordinate action space for all tasks, even those where the fingers are blocked. This is a choice I did not want to make.
 | Panda Environments | Fetch Environments |
---|---|---|
Reach | 3 | 4 |
Push | 3 | 4 |
Slide | 3 | 4 |
Pick and Place | 4 | 4 |
Stack | 4 | N/A |
thanks for the detailed reply, one additional important info would be a visualization of the two workspaces reachable by Panda vs Fetch robot, no?
And you command relative or absolute x,y,z pos for the action?
> thanks for the detailed reply, one additional important info would be a visualization of the two workspaces reachable by Panda vs Fetch robot, no?
I don't know how to easily represent the space reachable by these robots. I will instead compare (as soon as I renew my MuJoCo license) the maximum range in x, y and z. Of course, this won't be enough to describe the workspaces completely, but it will be informative enough in my opinion.
> And you command relative or absolute x,y,z pos for the action?
It is relative movement for both:
 | Panda Environments | Fetch Environments |
---|---|---|
0:2 | displacement in x, y, z | displacement in x, y, z |
3 | (when applied) fingers opening | fingers opening |
For example, `action = [0.0, 0.0, 0.05, 0.03]` means: move 5 cm up and set the fingers opening to 3 cm.
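The semantics of such an action can be sketched as follows (the helper is hypothetical; the real environments presumably also scale and clip the command to the workspace limits):

```python
def apply_action(ee_position, action):
    """Interpret an action as described above (a sketch): the first three
    coordinates are a relative x, y, z displacement of the end-effector;
    the fourth, when present, is the target fingers opening."""
    new_position = [p + d for p, d in zip(ee_position, action[:3])]
    fingers_opening = action[3] if len(action) > 3 else None
    return new_position, fingers_opening

# move 5 cm up and set the fingers opening to 3 cm
position, opening = apply_action([0.5, 0.0, 0.2], [0.0, 0.0, 0.05, 0.03])
```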
🚀 Feature
Add Panda environments to rl-baselines3-zoo.

Motivation

More and more people are testing their reinforcement learning algorithms on Panda environments. To encourage collaboration and facilitate reproducibility, I propose to integrate these environments into rl-baselines3-zoo.

Pitch

I have developed panda-gym, a set of 5 OpenAI Gym environments for Franka Emika's Panda robot, which is widely used in robotics. These environments are integrated with PyBullet. I propose to integrate these 5 environments into rl-baselines3-zoo.

Alternatives
None.
Additional context
This integration is already done and tested in my fork. I also have the benchmark results for the trained agents, and the README section. I open this issue to create a pull request.