AndrejOrsula / drl_grasping

Deep Reinforcement Learning for Robotic Grasping from Octrees
https://arxiv.org/pdf/2208.00818
BSD 3-Clause "New" or "Revised" License

EPIC: Design and decisions #50

Closed: AndrejOrsula closed this issue 3 years ago

AndrejOrsula commented 3 years ago

This is the current design containing the major decisions for the project. Additional future work and improvements that are not part of this design are listed in https://github.com/AndrejOrsula/drl_grasping/issues/51 and may eventually be included in the implementation if time allows.


Setup

The setup is reproducible in simulation, with a corresponding test setup in real life. Ignition Gazebo was selected as the robotics simulator and is used for training the RL agent.


Task

Grasping in its simplest form can be conceptually decoupled into the sub-routines below. The RL agent should aspire to learn steps 1-3, with the 4th step being determined by the surrounding application, e.g. success or reaching the maximum number of steps. The agent can also learn additional behaviours through exploration, e.g. pushing and pulling objects in order to create better grasping conditions.

  1. Move end effector (gripper) to pre-grasp pose
    • Pose must be determined from sensory observations
  2. Close gripper
  3. Lift object above the supporting surface
    • Make sure the grasp is secure
  4. Terminate and allow other tasks/processes to execute
    • Outside the scope of the agent's policy, but it needs to be determined both in simulation and in real life (a minimal sketch of such a check is given below)

These stages loosely inspired the stages used for curriculum learning, see https://github.com/AndrejOrsula/drl_grasping/issues/62.
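
A minimal sketch of how the 4th step (termination) could be evaluated outside the agent's policy is given below. The helper arguments, thresholds and step limit are hypothetical and only illustrate the idea; they are not the values used in the implementation.

```python
# Hypothetical termination check evaluated by the surrounding application,
# both in simulation and in real life (not the actual drl_grasping code).

MAX_EPISODE_STEPS = 100        # illustrative step limit
LIFT_HEIGHT_THRESHOLD = 0.125  # metres above the supporting surface (illustrative)


def is_episode_done(object_height: float, grasp_is_secure: bool, step: int) -> bool:
    """Terminate on success (object lifted with a secure grasp) or on timeout."""
    success = object_height > LIFT_HEIGHT_THRESHOLD and grasp_is_secure
    timeout = step >= MAX_EPISODE_STEPS
    return success or timeout
```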


Control loop

  1. Get observations
  2. Predict actions
  3. Execute actions (simultaneously)
    • Move arm to the new configuration
    • Execute gripper action
      • This action might be executed much faster than arm movement
  4. Repeat until termination (success or max steps)

Another approach would be to decompose the task into sensing, planning and execution (e.g. the robot action would consist only of a grasp pose and everything else would be performed outside the agent's policy), or to remove the gripper action from the control loop and perform the grasp once the episode is terminated or a certain Z position is reached, e.g. https://arxiv.org/pdf/1802.10264.pdf. However, the 'dynamic' closed-loop control was selected because it more closely resembles how humans grasp.
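
A minimal sketch of this closed-loop control is shown below, assuming hypothetical helpers on a `robot` interface and an `agent.predict()` call; it only illustrates that arm and gripper commands are issued together and that the loop repeats until termination.

```python
# Sketch of the closed-loop control described above; all helpers are
# hypothetical placeholders, not the actual drl_grasping API.

def control_loop(agent, robot, max_steps: int = 100):
    for step in range(max_steps):
        # 1. Get observations (octree of the scene + proprioception)
        observation = robot.get_observation()
        # 2. Predict actions with the current policy
        arm_motion, gripper_action = agent.predict(observation)
        # 3. Execute actions simultaneously (non-blocking commands);
        #    the gripper typically finishes before the arm motion does
        robot.move_arm_relative(arm_motion, blocking=False)
        robot.command_gripper(gripper_action, blocking=False)
        robot.wait_for_arm()
        # 4. Repeat until termination (success or max steps)
        if robot.is_done():
            return True
    return False
```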


RL Algorithm


Actions

List of actions that the agent is allowed to take; together, these must make it possible to accomplish the task successfully. All of them are part of a single action-space vector.

End effector pose
  • Position (Relative)
  • Orientation (Relative)

Gripper
  • Gripper (Absolute)
  • Gripper (Relative)
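
A sketch of how these could be combined into a single action-space vector with gym.spaces is shown below. The parameterisation (XYZ translation, quaternion orientation, one gripper scalar) and the bounds are assumptions made for illustration and may differ from the actual implementation.

```python
import numpy as np
from gym import spaces

# Single flat action vector (hypothetical layout):
#   [0:3] relative end-effector position (dx, dy, dz)
#   [3:7] relative end-effector orientation (quaternion)
#   [7]   gripper action (absolute or relative, depending on configuration)
action_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
```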

Observations

Octree of the scene

End effector pose
  • Position
  • Orientation

Gripper state
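
A sketch of this observation structure as a gym.spaces.Dict is shown below. The octree is represented as a fixed-size byte buffer; the size, key names and the quaternion orientation are assumptions for illustration.

```python
import numpy as np
from gym import spaces

OCTREE_MAX_SIZE = 50000  # hypothetical upper bound on the serialised octree, in bytes

observation_space = spaces.Dict({
    # Serialised octree of the scene (padded to a fixed maximum size)
    "octree": spaces.Box(low=0, high=255, shape=(OCTREE_MAX_SIZE,), dtype=np.uint8),
    # End effector pose
    "ee_position": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
    "ee_orientation": spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    # Gripper state (e.g. normalised finger opening)
    "gripper_state": spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),
})
```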


Reward function

Ongoing epic: https://github.com/AndrejOrsula/drl_grasping/issues/41

Sparse (shaped)

reward multiplier r (currently r = 4.0)
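
Only the multiplier r = 4.0 is stated above; the staged scaling in the sketch below (one sparse reward per completed stage, each stage worth r times more than the previous one) is an assumption made purely for illustration.

```python
# Illustrative sparse, stage-shaped reward. Only the multiplier r = 4.0 comes
# from the design above; the stage names and per-stage scaling are assumptions.

REWARD_MULTIPLIER = 4.0  # r

STAGES = ("reach", "touch", "grasp", "lift")  # hypothetical stage names


def stage_reward(stage: str) -> float:
    """Sparse reward granted the first time a stage is completed."""
    return REWARD_MULTIPLIER ** STAGES.index(stage)
```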


Policy (network architecture)

Currently using depth=4 and full_depth=2 for the octree.

Feature Extractor (shared between actor and critics):
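
A skeleton of how such a shared feature extractor could be structured in the style of stable-baselines3 is sketched below. The octree convolution stack (depth=4 down to full_depth=2) is replaced by an identity placeholder, the observation is assumed to already be a flat tensor, and the layer sizes are illustrative only.

```python
import gym
import numpy as np
import torch as th
from torch import nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class OctreeFeaturesExtractor(BaseFeaturesExtractor):
    """Skeleton of a features extractor shared between the actor and critics.

    The octree convolutions are represented by a placeholder; the real network
    operates on octrees with depth=4 and full_depth=2.
    """

    def __init__(self, observation_space: gym.Space, features_dim: int = 256):
        super().__init__(observation_space, features_dim)
        self.octree_cnn = nn.Identity()  # placeholder for the octree convolution stack
        n_input = int(np.prod(observation_space.shape))
        self.linear = nn.Sequential(nn.Linear(n_input, features_dim), nn.ReLU())

    def forward(self, observations: th.Tensor) -> th.Tensor:
        features = self.octree_cnn(observations)
        return self.linear(th.flatten(features, start_dim=1))
```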


Domain randomisation

Currently, the following domain randomisation can be applied in the simulation
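
As one generic illustration of such randomisation, the sketch below randomises the object's spawn pose at every episode reset. The spawn_object() helper and the workspace bounds are hypothetical; this does not enumerate the randomisation options actually available in the project.

```python
import random

WORKSPACE_X = (0.3, 0.6)   # metres, illustrative bounds
WORKSPACE_Y = (-0.2, 0.2)


def randomise_object_pose(world):
    """Spawn the target object at a random planar pose inside the workspace."""
    x = random.uniform(*WORKSPACE_X)
    y = random.uniform(*WORKSPACE_Y)
    yaw = random.uniform(-3.14159, 3.14159)
    world.spawn_object(position=(x, y, 0.0), yaw=yaw)  # hypothetical helper
```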

AndrejOrsula commented 3 years ago

The design is finalized and this document has been updated accordingly.