REinforcement Learning Algorithms, Autoscaling and eXchange (RELAAX)

RELAAX is a framework designed to:

  1. Simplify research and development of Reinforcement Learning applications by taking care of the underlying infrastructure

  2. Provide usable and scalable implementations of state-of-the-art Reinforcement Learning algorithms

  3. Ease deployment of Agents and Environments for training and exploitation of the trained Agents at scale on popular cloud platforms

Samples demonstrating RELAAX's capabilities can be found in the relaax_sample_apps repository.

The components of RELAAX include:

  1. The RELAAX Agent Proxy, which Environments use to communicate with RL Agents

  2. The RELAAX Servers (RLX Server and Parameter Server), which run and scale the Agents

  3. The Reinforcement Learning eXchange (RLX) protocol, which connects Environments to Agents

Each component is described in its own section below.

Quick start

We recommend using an isolated Python environment, such as Virtualenv or Anaconda, to run RELAAX. If you use the system's Python environment, you may need to run pip install commands with sudo, and you must also make sure python-pip is installed.

The relaax command-line tool creates and runs applications (the Windows walkthrough below shows a full example). You can see what other options are available by running:

relaax new --help

Running on Windows

To run RELAAX on Windows we recommend using the WinPython distribution. We have tested the 64-bit WinPython 3.6.1.0Qt5 release on Windows 8.1 and Windows 10. You will also need Microsoft Visual C++ Build Tools 14.00 if you want to use OpenAI Gym environments; the build tools can be downloaded from Microsoft.

Once you have installed these distributions you need to run "WinPython Command Prompt.exe", which is located in your WinPython installation directory. This will open a command prompt window configured to use WinPython as its Python distribution. After that you can install relaax using pip:

C:\WinPython-64bit-3.6.1.0Qt5>cd ..
C:\>git clone git@github.com:deeplearninc/relaax.git
C:\>cd relaax
C:\relaax>pip install -e .

This command will install relaax and its dependencies under WinPython's Python environment. You will need to run relaax from the WinPython Command Prompt, or create a custom shortcut for it.

To add OpenAI Gym support, run the following commands (again, from the WinPython Command Prompt):

C:\WinPython-64bit-3.6.1.0Qt5>pip install gym
C:\WinPython-64bit-3.6.1.0Qt5>pip install git+https://github.com/Kojoley/atari-py.git

Now you can create a test application to check that everything works:

C:\WinPython-64bit-3.6.1.0Qt5>cd ..
C:\>relaax new bandit-test
C:\>cd bandit-test
C:\bandit-test>relaax run all

This will run a test environment with a multi-armed bandit. You should see its output in a separate console window.

To test OpenAI Gym environment:

C:\WinPython-64bit-3.6.1.0Qt5>cd ..
C:\>relaax new gym-test -e openai-gym
C:\>cd gym-test
C:\gym-test>relaax run all

This will run the CartPole-v0 environment.

System Architecture

(system architecture diagram)

Release Notes

1.1.0

RELAAX Agent Proxy

The Agent Proxy runs alongside your Environment during training and is used to communicate with RL Agents. At the moment the client is implemented in Python; later on we plan to implement client code in C/C++, Ruby, Go, etc. to simplify integration of other environments.

Python API:

Available via:

from relaax.environment.agent_proxy import AgentProxy, AgentProxyException
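
The exact client API surface is not documented in this README, so the sketch below is a minimal illustration only: it assumes AgentProxy exposes init/update/reset calls mirroring the RLX protocol commands defined later, and the constructor address and method signatures are assumptions rather than a documented contract.

from relaax.environment.agent_proxy import AgentProxy, AgentProxyException

# Hypothetical sketch: method names and signatures mirror the RLX
# protocol commands ('init', 'update', 'reset') and are assumptions.
proxy = AgentProxy('localhost:7001')  # RLX Server address is an assumption
try:
    proxy.init(exploit=False)  # expect a 'ready' response
    # First update carries no reward yet (reward may be TYPE_NULL).
    action = proxy.update(reward=None, state=[0.0, 1.0], terminal=False)
    # Subsequent updates report the reward for the previous action.
    action = proxy.update(reward=1.0, state=[0.5, 0.5], terminal=True)
    proxy.reset()  # expect a 'done' response
except AgentProxyException as e:
    print('agent communication failed:', e)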

The Agent Proxy is simple, but it requires a certain amount of code to initialize the Agent, connect and reconnect to Agents, handle exceptions, etc. To simplify this even further, you may use the TrainingBase class, which wraps all the details of Agent Proxy operation.

Python API:

Available via:

from relaax.environment.training import TrainingBase
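
TrainingBase's hooks are not documented in this README; the subclass below is a hypothetical sketch of how a wrapped training loop might look. The episode method and the self.agent attribute are assumptions, not the actual relaax contract.

from relaax.environment.training import TrainingBase

# Hypothetical sketch: episode() and self.agent are assumptions about
# what TrainingBase provides, not a documented interface.
class BanditTraining(TrainingBase):
    def episode(self, number):
        state = [0.0, 1.0]
        # self.agent is assumed to be a ready-to-use AgentProxy;
        # TrainingBase is assumed to handle connect/reconnect/errors.
        action = self.agent.update(reward=None, state=state, terminal=False)
        reward = 1.0 if action == 1 else 0.0
        self.agent.update(reward=reward, state=state, terminal=True)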

Reinforcement Learning eXchange protocol

The Reinforcement Learning eXchange protocol is a simple binary protocol implemented over TCP. It allows the Environment to send its State and Reward to the Server, and delivers the Agent's Action back to the Environment.

(RLX protocol message-exchange diagram)

Reinforcement Learning eXchange protocol definition

Message exchange:

{'command': 'init', 'exploit': False|True} -> {'response': 'ready'} OR {'response': 'error', 'message': 'can\'t initialize agent'}

{'command': 'update', 'terminal': False|True, 'state': [], 'reward': 1} -> {'response': 'action', 'data': 1} OR {'response': 'error', 'message': 'can\'t update state'}

{'command': 'reset'} -> {'response': 'done'} OR {'response': 'error', 'message': 'can\'t reset agent'}

{'command': 'update_metrics', 'name': name, 'y': y, 'x': x} -> {'response': 'done'} OR {'response': 'error', 'message': 'can\'t update metrics'}

'reward' can be TYPE_NULL, TYPE_DOUBLE or TYPE_LIST(TYPE_DOUBLE)
'state' can be TYPE_IMAGE, TYPE_LIST or TYPE_NDARRAY
'data' can be TYPE_INT4, TYPE_DOUBLE or TYPE_LIST
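
For illustration, here is one hypothetical episode expressed as the dict-level messages above (Python literals; the wire encoding is specified in the next section):

# One hypothetical episode at the dict level; each request expects the
# response noted in the comment. None encodes as TYPE_NULL, so the
# first 'update' carries no reward.
episode = [
    {'command': 'init', 'exploit': False},                                        # -> {'response': 'ready'}
    {'command': 'update', 'terminal': False, 'state': [0.2, 0.7], 'reward': None},  # -> {'response': 'action', 'data': 1}
    {'command': 'update', 'terminal': True, 'state': [0.9, 0.1], 'reward': 1.0},    # -> {'response': 'action', 'data': 0}
    {'command': 'reset'},                                                         # -> {'response': 'done'}
]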

Basic types:

Name              Value  Size in bytes
TYPE_NONE         0      0
TYPE_NULL         1      0
TYPE_INT4         2      4
TYPE_STRING_UTF8  3      variable
TYPE_DOUBLE       4      8
TYPE_BOOLEAN      5      1
TYPE_IMAGE        6      variable
TYPE_NDARRAY      7      variable
TYPE_LIST         8      variable
TYPE_UINT4        9      4
TYPE_INT64        10     8
TYPE_DICT         11     variable
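
For convenience, the same table as a Python enum; this is a helper for the sketches in this README, not part of the relaax package:

from enum import IntEnum

class WireType(IntEnum):
    # Tag values copied from the table above; fixed sizes noted per type.
    TYPE_NONE = 0          # 0 bytes
    TYPE_NULL = 1          # 0 bytes
    TYPE_INT4 = 2          # 4 bytes
    TYPE_STRING_UTF8 = 3   # variable
    TYPE_DOUBLE = 4        # 8 bytes
    TYPE_BOOLEAN = 5       # 1 byte
    TYPE_IMAGE = 6         # variable
    TYPE_NDARRAY = 7       # variable
    TYPE_LIST = 8          # variable
    TYPE_UINT4 = 9         # 4 bytes
    TYPE_INT64 = 10        # 8 bytes
    TYPE_DICT = 11         # variable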

Message format:

| "[size in bytes]:"|Version(TYPE_UINT4)| key value0 | ... | key valueN | ","

Key/Typed Value format:

| Key name(TYPE_STRING_UTF8)| Type(1 byte) | Value |

TYPE_BOOLEAN value:

| 0(False)/1(True) |

TYPE_STRING_UTF8 value:

| Length in bytes(TYPE_UINT4) | bytes UTF8 |

TYPE_IMAGE value:

| image type(TYPE_STRING_UTF8) | xdim(TYPE_UINT4) | ydim(TYPE_UINT4) | Length in bytes(TYPE_UINT4) | bytes |

image type values: see PIL image doc

TYPE_NDARRAY value:

| shapes count(TYPE_UINT4) | shape0 |...| shapeN | Length in bytes(TYPE_UINT4) | bytes |

TYPE_LIST value:

| number of items(TYPE_UINT4) | item0 |...| itemN |

TYPE_DICT value:

| number of items(TYPE_UINT4) | key value0 | ... | key valueN |
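
Below is a sketch of an encoder for this layout, covering a few of the basic types. Byte order and the version constant are assumptions (the text above does not state them), and the size prefix is assumed to count the payload between ':' and ','; treat this as an illustration of the layout, not the reference implementation.

import struct

# Illustrative encoder for the message format above. Assumptions not
# fixed by the spec text: little-endian byte order, protocol version 1,
# and the size prefix covering only the payload between ':' and ','.

def encode_utf8(s):
    # TYPE_STRING_UTF8 value: | Length in bytes(TYPE_UINT4) | bytes UTF8 |
    raw = s.encode('utf-8')
    return struct.pack('<I', len(raw)) + raw

def encode_key_value(key, value):
    # Key/Typed Value: | Key name(TYPE_STRING_UTF8) | Type(1 byte) | Value |
    if isinstance(value, bool):                      # check before int: bool is an int subclass
        return encode_utf8(key) + bytes([5]) + bytes([int(value)])       # TYPE_BOOLEAN
    if value is None:
        return encode_utf8(key) + bytes([1])                              # TYPE_NULL (0 bytes)
    if isinstance(value, int):
        return encode_utf8(key) + bytes([2]) + struct.pack('<i', value)   # TYPE_INT4
    if isinstance(value, float):
        return encode_utf8(key) + bytes([4]) + struct.pack('<d', value)   # TYPE_DOUBLE
    if isinstance(value, str):
        return encode_utf8(key) + bytes([3]) + encode_utf8(value)         # TYPE_STRING_UTF8
    raise TypeError('type not covered by this sketch')

def encode_message(message, version=1):
    # | "[size in bytes]:" | Version(TYPE_UINT4) | key value0 | ... | "," |
    body = struct.pack('<I', version)
    for key, value in message.items():
        body += encode_key_value(key, value)
    return str(len(body)).encode('ascii') + b':' + body + b','

# Example: the 'reset' command from the message exchange table.
print(encode_message({'command': 'reset'}))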

Examples

Update metrics message:

{
    'command': 'update_metrics',
    'data': [
        {'method': method, 'name': name, 'y': y, 'x': x},
        {'method': method1, 'name': name1, 'y': y1, 'x': x1}
    ]
}

RELAAX Servers

The RELAAX Servers implement dynamic loading of the algorithm implementation. By convention, every algorithm implementation exposes Agent (agent.py) and ParameterServer (parameter_server.py) classes. The RLX Server loads and instantiates an Agent model for every incoming connection from an Environment. There is usually a single Parameter Server (PS). The PS loads and instantiates the ParameterServer class. Every Agent gets an RPC connection to the PS and can remotely call methods exposed on the PS model.

Models on the PS should be registered with a Session, and their methods exposed to RPC using the Op method. See samples/simple-exchange-js/algorithm/ for a very basic sample of Agent, ParameterServer, and Model implementations and the data exchange between them.
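
The skeleton below is a hypothetical illustration of these conventions. Beyond the file names and the Agent/ParameterServer class names, everything here (the session object, the registration call, the method names) is an assumption rather than the actual relaax API; see the referenced sample for the real shape.

# parameter_server.py (hypothetical sketch)
class ParameterServer(object):
    def __init__(self, session):
        # Register the shared model with the session and expose a method
        # over RPC; session.op(...) is an assumed call, not the
        # documented relaax API.
        self.session = session
        self.weights = [0.0, 0.0]
        session.op(self.get_weights)

    def get_weights(self):
        return self.weights

# agent.py (hypothetical sketch)
class Agent(object):
    def __init__(self, parameter_server):
        # One Agent instance is created per Environment connection and
        # holds an RPC handle to the Parameter Server.
        self.ps = parameter_server

    def update(self, reward, state, terminal):
        weights = self.ps.get_weights()  # remote call to the PS model
        return 0  # trivial action, just for the sketch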

RLX Server

Agent API:

Parameter Server

ParameterServer API: