NiMlr / High-Dim-ES-RL

Paper: Challenges in High-dimensional Reinforcement Learning with Evolution Strategies
MIT License

Challenges in High-dimensional Reinforcement Learning with Evolution Strategies
================================================================================

In the following we will provide a quick introduction to working with the code featured in our paper_ on the "Challenges in High-dimensional Reinforcement Learning with Evolution Strategies".

Although most of the following examples are based on relatively canonical choices of optimization problem and evolution strategy, the steps may differ slightly depending on the user's choices. Please feel free to check out the documented source code or contact us via the email addresses provided in the paper.

.. code:: bash

    git clone <repository-url>
    cd High-Dim-ES-RL

.. code:: bash

    pip3 install --upgrade matplotlib numpy
    # required only for the RL experiments
    pip3 install --upgrade tensorflow keras gym

| Contents

- `Running an evolution strategy on a benchmark`_
- `Training an Open-AI Gym controller`_


Running an evolution strategy on a benchmark
--------------------------------------------------

1. Within a Python file, import everything we need.

.. code:: python

    from optimizers import *
    from uhoptimizers import *
    from benchmarkfunctions import *

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    # recent Matplotlib releases name this style 'seaborn-v0_8'
    mpl.style.use('seaborn')

2. Pick a problem from the following table:

+--------------------------+-----------------------+------------------------------------------------------------+
| Function(object)         | Module                | Description                                                |
+==========================+=======================+============================================================+
| BenignEllipse            |                       | A moderately conditioned function.                         |
+--------------------------+-----------------------+------------------------------------------------------------+
| BenignEllipseNoisyThres  |                       | A moderately conditioned function with additive noise      |
|                          |                       | above a certain (function value) threshold.                |
+--------------------------+-----------------------+------------------------------------------------------------+
| BenignEllipseAddNoise    |                       | A moderately conditioned function with additive noise of   |
|                          |                       | a specified strength applied.                              |
+--------------------------+-----------------------+------------------------------------------------------------+
| BenignEllipseMultNoise   |                       | A moderately conditioned function with multiplicative      |
|                          |                       | noise of a specified strength applied.                     |
+--------------------------+-----------------------+------------------------------------------------------------+
| Ellipse                  |                       | A badly conditioned function.                              |
+--------------------------+-----------------------+------------------------------------------------------------+
| EllipseAddNoise          |                       | A badly conditioned function with additive noise of a      |
|                          |                       | specified strength applied.                                |
+--------------------------+-----------------------+------------------------------------------------------------+
| EllipseMultNoise         |                       | A badly conditioned function with multiplicative noise of  |
|                          |                       | a specified strength applied.                              |
+--------------------------+-----------------------+------------------------------------------------------------+
| sphere                   |                       | The standard spherical quadratic function.                 |
+--------------------------+-----------------------+------------------------------------------------------------+
| SphereAddNoise           |                       | The standard spherical quadratic function with additive    |
|                          |                       | noise of a specified strength applied.                     |
+--------------------------+-----------------------+------------------------------------------------------------+
| SphereMultNoise          |                       | The standard spherical quadratic function with             |
|                          |                       | multiplicative noise of a specified strength applied.      |
+--------------------------+-----------------------+------------------------------------------------------------+

and initialize relevant constants (in case the benchmark function requires these).

.. code:: python

    # problem dimension
    n = 40
    # noise amplitude for stochastic function
    noiseamp = 1
    # get function object
    el = EllipseMultNoise(n, noiseamp)
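To make the benchmark step more concrete, the following is a minimal sketch of what a badly conditioned ellipse with multiplicative noise could look like. The class name ``NoisyEllipseSketch``, the condition number of 1e6, and the Gaussian noise model are assumptions made for illustration; the repository's ``EllipseMultNoise`` may be defined differently.

```python
import random


class NoisyEllipseSketch:
    """Hypothetical stand-in for a noisy ellipse benchmark (not the repository's class)."""

    def __init__(self, n, noiseamp, condition=1e6, seed=None):
        self.n = n
        self.noiseamp = noiseamp
        # assumed: curvatures spread log-uniformly between 1 and `condition` (needs n > 1)
        self.weights = [condition ** (i / (n - 1)) for i in range(n)]
        self.rng = random.Random(seed)

    def noiseless(self, x):
        # weighted sum of squares, minimal at the origin
        return sum(w * xi * xi for w, xi in zip(self.weights, x))

    def __call__(self, x):
        # assumed noise model: the disturbance scales with the function value
        return self.noiseless(x) * (1.0 + self.noiseamp * self.rng.gauss(0.0, 1.0))


n = 40
f = NoisyEllipseSketch(n, noiseamp=1, seed=0)
x = [1.0] * n
samples = [f(x) for _ in range(5)]  # repeated calls at the same point differ
```

Under this noise model the disturbance vanishes at the optimum and grows with the function value, so repeated evaluations far from the optimum vary much more than those close to it.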

3. Grab some optimizer to test from this table:

+------------+-----------------+------------------------------------------------------------+
| Optimizer  | Module          | Description                                                |
+============+=================+============================================================+
| LMMAES     |                 | An ES for problems in dimensions >> 100.                   |
+------------+-----------------+------------------------------------------------------------+
| MAES       |                 | An ES for problems in dimensions > 100.                    |
+------------+-----------------+------------------------------------------------------------+
| ES         |                 | A stripped down version of the LMMAES implementation.      |
|            |                 | Featuring no CMA or approximation. ES is reasonable to use |
|            |                 | in extremely high dimensions.                              |
+------------+-----------------+------------------------------------------------------------+
| UHLMMAES   |                 | An ES for problems in dimensions >> 100 under uncertainty. |
+------------+-----------------+------------------------------------------------------------+
| UHES       |                 | A stripped down version of the UHLMMAES implementation.    |
|            |                 | Featuring no CMA or respective approximation.              |
|            |                 | UHES is reasonable to use in extremely high dimensions     |
|            |                 | under uncertainty.                                         |
+------------+-----------------+------------------------------------------------------------+


and initialize it with the required input parameters (see the respective optimizer's docstring for a detailed description).

.. code:: python

    # logging
    performance_log = []

    # set initial pop mean
    y0 = np.random.randn(n)/n
    # initial step size
    step_size = 1./6
    # initialize optimizer object
    esop = UHLMMAES(y0, step_size, el, function_budget=1e6, threads=8)


4. Now we can start the optimization

.. code:: python

    # the actual optimization routine
    termination = False
    while termination is False:
        # optimization step
        evals, solution, termination = esop.step()

        # save some useful values
        performance_log.append([evals, np.mean(esop.fd)])
        # print some useful values
        print('Appr. fit: %f  Sigma: %f   F-evals: %d' %
              (np.mean(esop.fd), esop.sigma, evals))

and plot the result when done.

.. code:: python

    plt.plot(np.array(performance_log)[:,0], np.log10(np.array(performance_log)[:,1]), linewidth=1)
    plt.title('UHLMMAES on ellipse with (multiplicative) noise')
    plt.xlabel('function evaluations')
    plt.ylabel('$log($population mean fitness$)$')

When sampling the performance of each of the algorithms on the ellipse with multiplicative noise you could end up with a plot like this.

.. image:: :width: 80%
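The optimization loop in the steps above relies only on a ``step()`` method that returns the number of evaluations, the current solution, and a termination flag. As a rough illustration of that interface, here is a deliberately simple self-adaptive (mu/mu, lambda)-ES in plain Python. ``SimpleESSketch`` and all of its parameters are hypothetical; it is neither the repository's ``ES`` nor ``UHLMMAES``, and it performs no uncertainty handling.

```python
import math
import random


class SimpleESSketch:
    """Hypothetical self-adaptive (mu/mu, lambda)-ES, for illustration only."""

    def __init__(self, y0, sigma, f, function_budget=10000, lam=20, seed=None):
        self.y = list(y0)            # population mean
        self.sigma = sigma           # global step size
        self.f = f                   # objective function (minimized)
        self.budget = function_budget
        self.lam = lam               # offspring per step
        self.mu = max(1, lam // 4)   # selected parents per step
        self.tau = 1.0 / math.sqrt(2.0 * len(self.y))  # step-size learning rate
        self.evals = 0
        self.fd = []                 # fitness values of the latest offspring
        self.rng = random.Random(seed)

    def step(self):
        offspring = []
        for _ in range(self.lam):
            # each offspring first mutates its step size, then the mean
            s = self.sigma * math.exp(self.tau * self.rng.gauss(0.0, 1.0))
            x = [yi + s * self.rng.gauss(0.0, 1.0) for yi in self.y]
            offspring.append((self.f(x), s, x))
        self.evals += self.lam
        self.fd = [fit for fit, _, _ in offspring]

        # keep the mu best and average their step sizes and positions
        best = sorted(offspring, key=lambda t: t[0])[:self.mu]
        self.sigma = sum(s for _, s, _ in best) / self.mu
        self.y = [sum(x[i] for _, _, x in best) / self.mu
                  for i in range(len(self.y))]
        return self.evals, self.y, self.evals >= self.budget


def sphere(x):
    return sum(xi * xi for xi in x)


es = SimpleESSketch([1.0] * 10, 0.3, sphere, function_budget=4000, seed=1)
initial = sphere(es.y)
termination = False
while termination is False:
    evals, solution, termination = es.step()
final = sphere(solution)
```

The sketch adapts only a single global step size. The optimizers listed in the table above additionally learn covariance information or handle noisy evaluations, which this example does not attempt.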

Training an Open-AI Gym controller
----------------------------------------

1. Within a Python file, import everything we need.

.. code:: python

    from optimizers import *
    from uhoptimizers import *
    from applications.control.gymcontrollers import Controller, Models

    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    # recent Matplotlib releases name this style 'seaborn-v0_8'
    mpl.style.use('seaborn')

2. Pick a neural network controller model from the following table:

+--------------------------+-----------------------+------------------------------------------------------------+
| Model                    | Module                | Description                                                |
+==========================+=======================+============================================================+
| Models.smallModel        |                       | Primarily used for testing. Neural Net with layers:        |
|                          |                       | {input, 10-elu, output-sigmoid}                            |
+--------------------------+-----------------------+------------------------------------------------------------+
| Models.bipedalModel      |                       | Primarily used in experiments of the bipedal walker.       |
|                          |                       | Neural Net with layers: {input, 30-elu,                    |
|                          |                       | 30-elu, 15-elu, 10-elu, output-sigmoid}                    |
+--------------------------+-----------------------+------------------------------------------------------------+
| Models.robopongModel     |                       | Primarily used in experiments of the robopong game.        |
|                          |                       | Neural Net with layers: {input, 30-elu, 30-elu, 15-elu,    |
|                          |                       | 10-elu, output-sigmoid}                                    |
+--------------------------+-----------------------+------------------------------------------------------------+
| Models.acrobotModel      |                       | Primarily used in experiments of the acrobot game.         |
|                          |                       | Neural Net with layers: {input, 30-elu, 30-elu,            |
|                          |                       | 10-elu, output-sigmoid}                                    |
+--------------------------+-----------------------+------------------------------------------------------------+

Alternatively, you can use your own model (verify that it is a valid implementation by following the steps below and by checking out the module).
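Because an evolution strategy optimizes a flat parameter vector, a controller has to map that vector onto the weights of the network. The sketch below shows one way this could work for a network shaped like ``Models.smallModel`` ({input, 10-elu, output-sigmoid}). The helper names, the presence of bias terms, the parameter ordering, and the choice of 6 inputs and 3 outputs (the observation and action sizes of ``Acrobot-v1``) are assumptions; the repository's models may be organized differently.

```python
import math


def elu(v):
    return v if v > 0 else math.exp(v) - 1.0


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def n_params(layer_sizes):
    # one weight per input plus one bias for every unit (biases are assumed)
    return sum((a + 1) * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def forward(params, layer_sizes, obs):
    """Evaluate a dense net with elu hidden layers and a sigmoid output layer."""
    x = list(obs)
    pos = 0
    last = len(layer_sizes) - 2  # index of the output layer
    for k, (a, b) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        out = []
        for _ in range(b):
            # consume `a` weights and one bias from the flat vector
            weights = params[pos:pos + a]
            bias = params[pos + a]
            pos += a + 1
            pre = sum(w * xi for w, xi in zip(weights, x)) + bias
            out.append(sigmoid(pre) if k == last else elu(pre))
        x = out
    return x


sizes = [6, 10, 3]          # input, 10-elu, output-sigmoid
n = n_params(sizes)         # dimension of the search space for the ES
outputs = forward([0.0] * n, sizes, [0.1] * 6)
```

In this layout the search-space dimension grows quickly with the layer widths, which is why the larger models in the table lead to high-dimensional optimization problems.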

3. Initialize the controller. The action space size cannot always be determined correctly, so be sure to supply it in these cases.

.. code:: python

    # gym environment name
    env = "Acrobot-v1"
    episode_length = 1500

    controller = Controller(Models.smallModel, env, episode_length, device='/cpu:0',
                            render=False, force_action_space=3)

To run controllers on new environments, you must implement an ActionTransformations method that transforms the neural net output into an action for the respective gym interface. In some cases this method may simply return its input. Additionally, a list of thresholds (which may be empty if no intervention is needed) can be supplied in the EarlyStop class to terminate an episode prematurely and save runtime. For the environments already implemented, none of this is necessary. For further inquiry: Check out
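As an illustration of what such an action transformation and early-stopping rule might look like, consider the sketch below. The function names, their signatures, and the ``(step, minimum_reward)`` threshold format are all hypothetical and are not taken from the repository's ``ActionTransformations`` or ``EarlyStop`` code.

```python
def to_discrete_action(net_output):
    """Pick the index of the largest output, e.g. for a discrete action space."""
    return max(range(len(net_output)), key=lambda i: net_output[i])


def to_continuous_action(net_output, low=-1.0, high=1.0):
    """Rescale sigmoid outputs from [0, 1] to the interval [low, high]."""
    return [low + (high - low) * o for o in net_output]


def should_stop_early(step, total_reward, thresholds):
    """Return True if some threshold's step has been reached without its minimum reward.

    `thresholds` is a list of (step, minimum_reward) pairs (hypothetical format);
    an empty list never stops an episode early.
    """
    return any(step >= s and total_reward < r for s, r in thresholds)


# a three-output net controlling an environment with three discrete actions
action = to_discrete_action([0.1, 0.7, 0.2])
# the same outputs mapped to torques in [-1, 1]
torques = to_continuous_action([0.0, 0.5, 1.0])
# give up on an episode that has not collected any reward after 100 steps
stop = should_stop_early(100, -50.0, [(100, 0.0)])
```

An identity transformation corresponds to the case mentioned above in which the method just returns its input.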

4. Run your favorite evolution strategy as introduced in the preceding section.

.. code:: python

    # logging
    performance_log = []

    # set initial pop mean
    y0 = np.abs(np.random.randn(controller.n))/controller.n
    # initial step size
    step_size = 0.3
    # initialize optimizer object
    esop = UHLMMAES(y0, step_size,, function_budget=1e4, threads=1)

    # the actual optimization routine
    termination = False
    while termination is False:
        # optimization step
        evals, solution, termination = esop.step()
        # save some useful values
        performance_log.append([evals, np.mean(esop.fd)])
        # print some useful values
        print('Appr. fit: %f  Sigma: %f   F-evals: %d' %
              (np.mean(esop.fd), esop.sigma, evals))

Note that threading is unlikely to work with the current implementation of the gym controllers, so set ``threads`` to 1.

5. Plot and render the result.

.. code:: python

    controller.render = True

    plt.plot(np.array(performance_log)[:,0], np.array(performance_log)[:,1], linewidth=1)
    plt.title('UHLMMAES on Acrobot')
    plt.xlabel('function evaluations')
    plt.ylabel('population mean fitness')

.. image:: :width: 80%

.. _paper: