
Contextual Relative Entropy Policy Search

Python implementation of Contextual Relative Entropy Policy Search for Reinforcement Learning problems.

Relevant papers

Implementation

Contextual Policy Search generalizes policies to multiple contexts, where the context carries information about the Reinforcement Learning problem at hand, such as the objective of the agent or properties of the environment. CREPS follows a hierarchical approach to contextual policy search, in which there are two policies:

1. An upper policy which, given the context of the episode, selects the parameters of the lower policy.
2. A lower policy which, given those parameters, selects the actions taken in the environment.

The file CREPS.py implements the upper policy as a linear-Gaussian model which is updated using weighted maximum likelihood.
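As a rough sketch of what such a model involves, the class below samples lower-policy parameters w from N(a + A*s, Sigma) given a context s and refits (a, A, Sigma) by weighted maximum likelihood. It mirrors the sample/update interface used later in this README, but the class name and the internals are illustrative assumptions, not the actual contents of CREPS.py.

import numpy as np

class UpperPolicySketch:
    # Linear-Gaussian upper policy: w ~ N(a + A s, sigma)
    def __init__(self, context_dim, w_dim):
        self.a = np.zeros((1, w_dim))            # context-independent mean
        self.A = np.zeros((context_dim, w_dim))  # context-dependent gain
        self.sigma = np.eye(w_dim)               # exploration covariance

    def mean(self, S):
        return self.a + S.dot(self.A)            # one mean per context (row of S)

    def sample(self, S):
        # Draw one parameter vector per context in S
        return np.vstack([np.random.multivariate_normal(m, self.sigma)
                          for m in self.mean(S)])

    def update(self, W, F, p):
        # Weighted maximum-likelihood fit of (a, A), then of sigma
        S = np.hstack((np.ones((F.shape[0], 1)), F))  # prepend a bias column
        P = np.diag(p)
        B = np.linalg.solve(S.T.dot(P).dot(S), S.T.dot(P).dot(W))
        self.a, self.A = B[:1, :], B[1:, :]
        err = W - self.mean(F)
        self.sigma = err.T.dot(P).dot(err) / p.sum()  # weighted residual covariance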

All other elements of the Reinforcement Learning problem (environment dynamics, reward function and lower policy) must be implemented for your particular scenario as you see fit, with only a few considerations in mind to ensure compatibility with the upper policy and the policy update function. It is then very straightforward to put everything together, as illustrated in the next section.
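The lower policy can be as simple as a linear feedback controller whose gains are the parameters chosen by the upper policy. The class below is a hypothetical example that matches the pol.sample(w.T, x) call used further down; it is not code from this repository.

import numpy as np

class LinearLowerPolicy:
    # Hypothetical lower policy: u = K x, with the gains K supplied by the upper policy
    def __init__(self, state_dim, action_dim):
        self.state_dim = state_dim
        self.action_dim = action_dim

    def sample(self, w, x):
        # w: flattened gains of size state_dim * action_dim, x: current state
        K = np.asarray(w).reshape(self.action_dim, self.state_dim)
        return K.dot(x)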

Implementations of CREPS.py using PyTorch (CREPS_torch.py) and Theano (CREPS_theano.py) are also provided, but make sure that your application is computationally expensive enough to benefit from them; otherwise the overhead they introduce will outweigh the run-time improvements.
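Switching between the implementations can be as simple as choosing which module to import. The snippet below assumes the Torch and Theano variants expose the same functions as CREPS.py, which you should verify for your version of the code.

use_torch = False   # set to True to use the PyTorch implementation

if use_torch:
    from CREPS_torch import computeSampleWeighting
else:
    from CREPS import computeSampleWeighting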

How to set up your own scenario

The steps of each policy iteration are:

1. Run M episodes, storing for each episode the lower policy parameters sampled,
   the episode context and the episodic reward
2. Compute the sample weights for policy update
3. Update upper policy using the sample weights

In my examples I have implemented these steps as:

R, W, F = predictReward(env, M, hpol, pol) # 1
p = computeSampleWeighting(R, F, eps)      # 2
hpol.update(W, F, p)                       # 3

where env is a class implementing the environment dynamics, hpol the upper policy and pol the lower-level policy.
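For intuition about step 2: REPS weights each sample by an exponential transformation of its reward, with a temperature eta chosen so that the reweighted distribution stays within a KL bound eps of the sampling distribution. The sketch below shows a simplified, non-contextual version of this weighting for illustration only; the actual computeSampleWeighting in CREPS.py additionally uses the contexts F to fit a context-dependent baseline.

import numpy as np
from scipy.optimize import minimize

def simpleSampleWeighting(R, eps):
    # Simplified (non-contextual) REPS weighting, for illustration only
    R = R - R.max()                                     # numerical stability
    dual = lambda eta: eta * eps + eta * np.log(np.mean(np.exp(R / eta)))
    eta = minimize(lambda e: dual(e[0]), x0=[1.0], bounds=[(1e-6, None)]).x[0]
    p = np.exp(R / eta)
    return p / p.sum()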

The methods for steps 2 and 3 are implemented in CREPS.py, thus you only need to worry about step 1, which MUST return the following numpy arrays:

R: episodic reward of each episode, of length M
W: lower-policy parameters sampled at each episode, of shape (M, number of lower-policy parameters)
F: context of each episode, of shape (M, context dimension)

If your scenario is an OpenAI Gym environment, an intuitive implementation of predictReward would be:

import numpy as np

def predictReward(env, M, hipol, pol):
    R, W, F = None, None, None
    for rollout in range(M):
        s = env.reset()                    # Sample context
        w = hipol.sample(s.reshape(1, -1)) # Sample lower-policy weights

        if W is None:                      # Allocate storage on the first rollout
            R = np.zeros(M)
            W = np.zeros((M, w.size))
            F = np.zeros((M, s.size))

        W[rollout, :] = w
        F[rollout, :] = s

        done = False
        x = s
        while not done:
            u = pol.sample(w.T, x)         # Sample action from lower policy
            x, r, done, info = env.step(u)
            R[rollout] += r                # Accumulate the episodic reward

    return R, W, F
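Putting the three steps together, a complete learning run then looks roughly like the loop below; the number of iterations K, the number of episodes M and the KL bound eps are placeholder values you would tune for your own problem, and env, hpol and pol are assumed to have been constructed beforehand.

K, M, eps = 50, 100, 1.0                        # placeholder hyperparameters

for it in range(K):
    R, W, F = predictReward(env, M, hpol, pol)  # 1. run episodes
    p = computeSampleWeighting(R, F, eps)       # 2. compute sample weights
    hpol.update(W, F, p)                        # 3. weighted ML policy update
    print('Iteration %d, mean reward %.2f' % (it, R.mean()))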

Examples

In all the examples provided there are three files:

For a full example of CREPS being used to solve the Cart Pole OpenAI gym environment check /cartPole. This example includes the optional use of PyTorch and Theano through the global flags use_torch and use_theano. To run it use:

$ python cartPole/cartPole_learn.py

For a full example of CREPS being used to solve the Acrobot OpenAI gym environment check /acrobot. To run it use:

$ python acrobot/acrobot_learn.py

For a full example of how you could use CREPS for your own Reinforcement Learning problem check /customEnv, where a differential drive robot learns to follow a straight wall using a PID controller (here the context is the starting distance from the wall and initial angle with respect to the wall). To run it use:

$ python customEnv/robot_learn.py
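In that example the quantities being learned are essentially controller gains. The snippet below is a generic, hypothetical PID controller (not taken from /customEnv) showing how such gains could define the lower policy.

class PID:
    # Hypothetical PID controller; the gains (kp, ki, kd) would be the
    # lower-policy parameters chosen by the upper policy for each episode
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def control(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative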

Furthermore, CREPS can be easily extended to a more data-efficient model-based approach. /cartPole_GPREPS offers a quick example of this approach, using Gaussian Processes to learn the forward dynamics of the environment. To run it use:

$ python cartPole_GPREPS/cartPole_learn.py
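The core idea is to learn a model of the forward dynamics from observed transitions and then generate artificial rollouts from that model. The sketch below illustrates this with scikit-learn's GaussianProcessRegressor and made-up data; it is not the code used in /cartPole_GPREPS.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Placeholder transition data (state, action) -> next state, gathered from real rollouts
states      = np.random.randn(200, 4)
actions     = np.random.randn(200, 1)
next_states = states + 0.1 * np.random.randn(200, 4)

# Fit a GP model of the forward dynamics
dyn_model = GaussianProcessRegressor()
dyn_model.fit(np.hstack((states, actions)), next_states)

def predict_step(x, u):
    # Predict the next state from the learned model instead of stepping the real environment
    return dyn_model.predict(np.hstack((x, u)).reshape(1, -1))[0]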

Dependencies

NumPy (CREPS.py and the examples work with numpy arrays)

Optional:

PyTorch or Theano, for CREPS_torch.py and CREPS_theano.py
OpenAI Gym, for the cartPole and acrobot examples

Contributing

All enhancements are welcome. Feel free to make suggestions or raise issues.