DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[feature request] Adding Panda environments #123

Closed · qgallouedec closed this issue 3 years ago

qgallouedec commented 3 years ago

🚀 Feature

Add Panda environments to rl-baselines3-zoo.

Motivation

More and more people are testing their reinforcement learning algorithms on Panda environments. To encourage collaboration and facilitate reproducibility, I propose integrating these environments into rl-baselines3-zoo.

Pitch

I have developed panda-gym, a set of 5 OpenAI Gym environments for Franka Emika's Panda robot, which is widely used in robotics. These environments are simulated with PyBullet. I propose to integrate these 5 environments into rl-baselines3-zoo.
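
To give an idea of the intended usage, here is a minimal sketch. The environment id `PandaReach-v1`, the registration side effect of `import panda_gym`, and the classic Gym step API are assumptions on my part for illustration; only `PandaSlideDense-v1` is named explicitly later in this thread.

```python
import gym
import panda_gym  # noqa: F401  -- importing is assumed to register the environments with gym

# "PandaReach-v1" is an assumed id used for illustration
env = gym.make("PandaReach-v1")

obs = env.reset()
for _ in range(50):
    action = env.action_space.sample()  # random policy, just to exercise the API
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```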

Alternatives

None.

Additional context

This integration is already done and tested in my fork. I also have the benchmark results, the trained agents, and the README section. I am opening this issue before creating a pull request.

araffin commented 3 years ago

Hello,

> I have developed panda-gym, a set of 5 OpenAI Gym environments for Franka Emika's Panda robot, which is widely used in robotics.

I'm glad that something like that finally exists =)! (There were equivalents for locomotion tasks, but robotics ones were missing.)

Before accepting the PR (you will also need to open a PR to add the trained agents in https://github.com/DLR-RM/rl-trained-agents), I would like to know the similarities and differences between the MuJoCo tasks and these ones (in terms of setup/reward/action/observation/...).

qgallouedec commented 3 years ago

I hope you will find all the information you are looking for in my response. A technical report will soon be published on arXiv; I will put the link in the repo.

Physics

|                      | Panda Environments | Fetch Environments |
|----------------------|--------------------|--------------------|
| Engine               | PyBullet           | MuJoCo             |
| Length of an episode | 50 (100 for Stack) | 50                 |

Setup

Reach

|                              | Panda Environments | Fetch Environments |
|------------------------------|--------------------|--------------------|
| Goal horizontal range (x, y) | 0.3                | 0.3                |
| Goal vertical range (z)      | 0.3                | 0.3                |

Push

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | 0.3                |
| Goal horizontal range (x, y)   | 0.3                | 0.3                |

Slide

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | 0.2                |
| Goal horizontal offset         | 0.4                | 0.4                |
| Goal horizontal range (x, y)   | 0.3                | 0.6                |

Pick and place

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | 0.3                |
| Goal horizontal range (x, y)   | 0.3                | 0.3                |
| Prob. goal in the air          | 0.3                | 0.5                |
| Goal vertical range (z)        | 0.2                | 0.45               |

Stack

|                                | Panda Environments | Fetch Environments |
|--------------------------------|--------------------|--------------------|
| Object horizontal range (x, y) | 0.3                | N/A                |
| Prob. object in the air        | 0.3                | N/A                |
| Object vertical range (z)      | 0.2                | N/A                |

Reward

|                     | Panda Environments | Fetch Environments |
|---------------------|--------------------|--------------------|
| Distance threshold  | 0.05               | 0.05               |
| Default reward type | sparse             | sparse             |
| Other reward type   | dense              | dense              |

Example for the dense reward: PandaSlideDense-v1.

For the stacking task, the distance threshold is in fact 0.10, applied to $\sqrt{{d_1}^2 + {d_2}^2}$, where $d_i$ is the distance between object $i$ and its target position.
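
As a rough sketch (not the exact panda-gym code), a sparse/dense goal-conditioned reward of this kind is typically computed like this:

```python
import numpy as np

def compute_reward(achieved_goal, desired_goal, reward_type="sparse",
                   distance_threshold=0.05):
    """Sketch of a goal-conditioned reward; not the exact panda-gym implementation."""
    # Euclidean distance between the achieved and the desired goal position
    d = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    if reward_type == "sparse":
        # -1 while the goal is not reached, 0 once within the threshold
        return -(d > distance_threshold).astype(np.float32)
    # Dense variant: negative distance to the goal
    return -d
```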

Observation

Observation type

| Panda Environments | Fetch Environments |
|--------------------|--------------------|
| `gym.Box`          | `gym.Box`          |

Observation size

The observation space is quite different between the Fetch and the Panda environments:

For the Fetch environments

In total, there are 25 coordinates for all environments.

For the Panda environments

Thus:

|                | Observation size |
|----------------|------------------|
| Reach          | 6                |
| Push           | 18               |
| Slide          | 18               |
| Pick and Place | 19               |
| Stack          | 31               |

Action

Action type

| Panda Environments | Fetch Environments |
|--------------------|--------------------|
| `gym.Box`          | `gym.Box`          |

Fingers blocked

|                | Panda Environments  | Fetch Environments  |
|----------------|---------------------|---------------------|
| Reach          | :heavy_check_mark:  | :heavy_check_mark:  |
| Push           | :heavy_check_mark:  | :heavy_check_mark:  |
| Slide          | :heavy_check_mark:  | :heavy_check_mark:  |
| Pick and Place | :x:                 | :x:                 |
| Stack          | :x:                 | N/A                 |

Action size

For the sake of consistency, the authors of the Fetch environments decided to use a 4-coordinate action space for all tasks, even those where the fingers are blocked. This is a choice I did not want to make.

|                | Panda Environments | Fetch Environments |
|----------------|--------------------|--------------------|
| Reach          | 3                  | 4                  |
| Push           | 3                  | 4                  |
| Slide          | 3                  | 4                  |
| Pick and Place | 4                  | 4                  |
| Stack          | 4                  | N/A                |
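
To cross-check the observation and action sizes above, one can instantiate each environment and print its spaces. The ids below are assumed to follow a `PandaReach-v1` naming scheme; only `PandaSlideDense-v1` is named explicitly in this thread.

```python
import gym
import panda_gym  # noqa: F401  -- assumed to register the environments with gym

# Assumed ids for the five tasks, for illustration only
for env_id in ["PandaReach-v1", "PandaPush-v1", "PandaSlide-v1",
               "PandaPickAndPlace-v1", "PandaStack-v1"]:
    env = gym.make(env_id)
    # Print the observation and action spaces to compare with the tables above
    print(f"{env_id}: obs={env.observation_space}, act={env.action_space}")
    env.close()
```
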
araffin commented 3 years ago

> I hope you will find all the information you are looking for in my response. A technical report will soon be published on arXiv; I will put the link in the repo.

Thanks for the detailed reply. One additional important piece of information would be a visualization of the two workspaces reachable by the Panda vs. the Fetch robot, no?

And do you command relative or absolute x, y, z positions for the action?

qgallouedec commented 3 years ago

> Thanks for the detailed reply. One additional important piece of information would be a visualization of the two workspaces reachable by the Panda vs. the Fetch robot, no?

I don't know how to easily represent the space reachable by these robots. I will instead compare (as soon as I renew my MuJoCo license) the maximum range in x, y and z. Of course, this won't be enough to describe the workspaces completely, but it will be informative enough in my opinion.

> And do you command relative or absolute x, y, z positions for the action?

It is relative movement for both:

| Action coordinate(s) | Panda Environments      | Fetch Environments      |
|----------------------|-------------------------|-------------------------|
| 0:2                  | displacement in x, y, z | displacement in x, y, z |
| 3 (when applicable)  | fingers opening         | fingers opening         |

For example, `action = [0.0, 0.0, 0.05, 0.03]` means: move 5 cm up and set the fingers opening to 3 cm.
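
Concretely, a step with that action would look like this (a minimal sketch, assuming a `PandaPickAndPlace-v1` id and the classic Gym step API):

```python
import numpy as np
import gym
import panda_gym  # noqa: F401  -- assumed to register the environments with gym

# Assumed id; a task with unblocked fingers, hence a 4-coordinate action
env = gym.make("PandaPickAndPlace-v1")
obs = env.reset()

# Relative displacement: no motion in x/y, move 5 cm up, fingers opening set to 3 cm
action = np.array([0.0, 0.0, 0.05, 0.03])
obs, reward, done, info = env.step(action)
env.close()
```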