This Python package integrates the V-REP robot simulation software and the base libraries for NAO robot control with reinforcement learning algorithms for solving custom or any OpenAI-Gym-based learning environments.
Custom learning environments for the NAO robot:
| 1. Balancing / learning a bipedal gait | 2. Object tracking |
|---|---|
| The goal is to keep an upright position without falling, or to learn how to move forward. | The goal is to keep the object within the visual field by moving the head motors. |
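To make the object-tracking objective concrete, a reward of this kind can be sketched as a function of the object's position in the camera image. This is a hypothetical illustration of the idea, not the package's actual reward function:

```python
def tracking_reward(x, y, width, height):
    """Hypothetical tracking reward: 1.0 when the object is at the image
    centre, falling off linearly to 0.0 at the image border."""
    # Normalised offsets from the image centre, each in [0, 1]
    dx = abs(x - width / 2.0) / (width / 2.0)
    dy = abs(y - height / 2.0) / (height / 2.0)
    return max(0.0, 1.0 - max(dx, dy))
```

Moving the head motors to re-centre the object then maximises this reward.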
- Choregraphe Suite v2.1.2 - for creating a virtual NAO (requires registration). By default installed in `/opt/Aldebaran/Choregraphe Suite 2.1`.
- Python NAOqi SDK v2.1.2 - libraries provided by SoftBank Robotics for NAO control (requires registration).
(Tested on Ubuntu 18.04)
1. Clone the repository:

```shell
git clone https://github.com/andriusbern/nao_rl
cd nao_rl
```
2. Create and activate the virtual environment:

```shell
virtualenv env
source env/bin/activate
```
3. Install the package and the required libraries:

```shell
python setup.py install
```

You will be prompted to enter the path to your V-REP installation directory.
To try out the environments (V-REP will be launched with the appropriate scene and agent loaded, and actions will be sampled randomly):

```python
import nao_rl
env = nao_rl.make('env_name')
env.run()
```
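The `env.run()` call above samples actions randomly. Assuming these environments follow the standard Gym interface (`reset`/`step`/`action_space` — an assumption, not confirmed by this README), an equivalent manual rollout loop would look like:

```python
def random_rollout(env, max_steps=200):
    """Run one episode with uniformly sampled actions on a Gym-style env."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()          # random exploration
        obs, reward, done, info = env.step(action)  # advance the simulation
        total_reward += reward
        if done:                                    # e.g. the robot fell over
            break
    return total_reward
```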
Where 'env_name' corresponds to one of the following available environments:
To train the agents in these environments you can use the built-in RL algorithms:

```shell
python train.py NaoTracking a3c 0
```
Live plotting of training results, sped up 40x (enabled with the '-p' flag).
To find out more about additional command-line arguments:

```shell
python train.py -h
```
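The positional arguments in the command above suggest a CLI of the form `train.py <environment> <algorithm> <render>`. A hypothetical `argparse` setup matching that shape might look like the following (the argument names are guesses; run `python train.py -h` for the real ones):

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring `train.py NaoTracking a3c 0`."""
    parser = argparse.ArgumentParser(description='Train an RL agent.')
    parser.add_argument('env', help='Environment name, e.g. NaoTracking')
    parser.add_argument('algorithm', help='RL algorithm to use, e.g. a3c')
    parser.add_argument('render', type=int,
                        help='Whether to render the simulation (0 or 1)')
    parser.add_argument('-p', action='store_true',
                        help='Enable live plotting of training results')
    return parser
```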
A training session can be interrupted at any time; the model will be saved and can be loaded later.
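This interrupt-and-resume behaviour is a common checkpointing pattern. A generic sketch of the idea (not the package's actual saving code, which presumably uses TensorFlow checkpoints given the `.cpkt` files) might be:

```python
import pickle

def save_checkpoint(path, params, step):
    """Persist model parameters and training progress to disk."""
    with open(path, 'wb') as f:
        pickle.dump({'params': params, 'step': step}, f)

def load_checkpoint(path):
    """Restore a previously saved training state."""
    with open(path, 'rb') as f:
        state = pickle.load(f)
    return state['params'], state['step']
```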
To test trained models:

```shell
python test.py trained_models/filename.cpkt
```
Add the `-r` flag to run the trained policy on the real NAO (this can be dangerous). It is recommended to set a low frame rate for the environment so that the robot performs actions slowly, e.g.:

```shell
python test.py trained_models/filename.cpkt -r -fps 2
```
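A low `-fps` value presumably caps how often actions are sent to the robot. That pacing idea can be sketched as a simple rate limiter (a hypothetical helper, not the package's implementation):

```python
import time

class RateLimiter:
    """Sleep between calls so `tick()` runs at most `fps` times per second."""

    def __init__(self, fps):
        self.period = 1.0 / fps
        self.last = None

    def tick(self):
        now = time.monotonic()
        if self.last is not None:
            remaining = self.period - (now - self.last)
            if remaining > 0:
                time.sleep(remaining)  # wait out the rest of the period
        self.last = time.monotonic()
```

Calling `limiter.tick()` before each `env.step(action)` would then hold the control loop to the requested rate.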