NiMlr / High-Dim-ES-RL

Paper: Challenges in High-dimensional Reinforcement Learning with Evolution Strategies
MIT License

Weird behaviour using low dimensions #1

Closed: mfagerlund closed this issue 6 years ago

mfagerlund commented 6 years ago

Thanks for this source code. I was able to reproduce it in C#, which was really hard to do from the paper.

Anyway, I was fighting really hard to get LM-MA-ES to converge on Rosenbrock in 12 dimensions in my implementation, so I decided to try it in yours. Turns out it fails in exactly the same way ;)

Rosenbrock with anything less than 18 dimensions fails, but at 18 dimensions it converges promptly.

def rosenbrock(x): return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)
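As a quick sanity check of the function itself, independent of any optimizer: the global minimum of the Rosenbrock function is 0 at the all-ones vector, so a failure to converge should come from the optimizer, not from this definition. A minimal check (reusing the definition above, plus NumPy):

```python
import numpy as np

def rosenbrock(x): return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

# Global minimum is 0 at x = (1, ..., 1); at the origin the value is n-1.
print(rosenbrock(np.ones(12)))   # expected: 0.0
print(rosenbrock(np.zeros(12)))  # expected: 11.0
```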

MA-ES converges all the way down to 12 dimensions.

Maybe the settings could be tweaked somehow?

Anyway, thanks for the source!

mfagerlund commented 6 years ago

Here's my full test source:

```python
from optimizers import *
from uhoptimizers import *
from benchmarkfunctions import *

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

def rosenbrock(x): return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

mpl.style.use('seaborn')

result_log = []
for i in range(1):
    # problem dimension
    n = 12
    # noise amplitude for stochastic function
    # noiseamp = 1
    # get function object
    # el = sphere
    el = rosenbrock

    # logging
    performance_log = []

    # set initial pop mean
    y0 = np.random.randn(n)/n
    # y0 = np.ones(n)/n

    # initial step size
    step_size = 1/6.
    # initialize optimizer object
    esop = MAES(y0, step_size, el, function_budget=1000000, function_target=0.0000001, threads=8)

    # 1999, 277, Appr. fit: 0.000000  Sigma: 0.000000   F-evals: (2985.042000) 3336
    # the actual optimization routine
    termination = False
    while termination is False:
        # optimization step
        evals, solution, termination = esop.step()

        # save some useful values
        performance_log.append([evals, np.mean(esop.fd)])
        # print some useful values
        if termination or esop.t % 50 == 0 or esop.t == 0:
            result_log.append(evals)
            esop.report('%d, %d, Appr. fit: %f  Sigma: %f   F-evals: (%f) %d\n' %
                        (i, esop.t, np.mean(esop.fd), esop.sigma, np.mean(result_log), evals))

print(np.mean(result_log))
```
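For what it's worth, a small sweep over the problem dimension could narrow down where convergence starts to succeed. This is only a sketch built on the same `MAES` interface used in the script above; the repository's LM-MA-ES class (whatever its exact name is) should fit into the same loop:

```python
import numpy as np
from optimizers import *  # provides MAES, as used in the script above

def rosenbrock(x): return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

# Sweep the problem dimension to see where convergence starts to succeed.
for n in range(12, 20, 2):
    y0 = np.random.randn(n)/n
    esop = MAES(y0, 1/6., rosenbrock,
                function_budget=1000000, function_target=0.0000001, threads=8)
    termination = False
    while termination is False:
        evals, solution, termination = esop.step()
    print('n=%d: stopped after %d evaluations, mean fitness %e' % (n, evals, np.mean(esop.fd)))
```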

NiMlr commented 6 years ago

Hi, thanks for your interest in the code. Please excuse my late reply.

The LM-MA-ES and the MA-ES both approximate the candidate distribution of the CMA-ES [1], mainly for reasons of scalability to problems with a large number of parameters. The facets of the scalability issue are covered in these papers [2, 3, 4] as well as in the one this repository was made for. Typically, the design of these algorithms involves some tuning of constants relevant to the algorithm, which means the algorithm can fail in domains it was not tuned for. The reason the algorithms were not tuned for all domains is that the benefits of approximating structural components diminish in low dimensions. The domains where it is advisable to use the algorithms are indicated in the second table of the README.

Furthermore, the code is likely not as robust as it could be, as it is mainly intended for reproducing the results of the paper. In the future I am planning to implement some more ESs and open a separate repository for this, so you are welcome to stop by.
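As a rough illustration of how such constants depend on the dimension, the sketch below uses the standard CMA-ES default population size 4 + floor(3 ln n); the exact constants used by this repository and by LM-MA-ES may differ, so treat the numbers as indicative only:

```python
import math

# Standard CMA-ES default population size as a function of the dimension n
# (assumed here for illustration; MA-ES / LM-MA-ES defaults may differ).
def default_lambda(n):
    return 4 + int(3 * math.log(n))

for n in [12, 18, 100, 1000, 10000]:
    print('n=%5d  lambda=%d' % (n, default_lambda(n)))
```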

For now, you have two options: either switch to one of the established, well-tuned implementations [5, 6, 7], or tune the constants of this implementation for your low-dimensional setting yourself.

The first solution is reasonable if you want perfectly stable code and do not want to refine the algorithm more carefully by testing the robustness and optimality of the constants on a set of benchmark functions (or even proving this mathematically).
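For illustration, a minimal sketch of the first option, assuming the `cma` package from PyPI (a Python implementation of CMA-ES by Nikolaus Hansen, cf. [5]); the package and its defaults are not part of this repository:

```python
import numpy as np
import cma  # pip install cma; an established CMA-ES implementation, cf. [5]

def rosenbrock(x):
    x = np.asarray(x)
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

# 12-dimensional Rosenbrock with the same initial point and step size as the
# test script earlier in this thread.
es = cma.CMAEvolutionStrategy(np.random.randn(12)/12, 1/6.)
es.optimize(rosenbrock)
print(es.result.fbest)  # best objective value found
```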

I hope this helps.

[1] CMA-ES: https://en.wikipedia.org/wiki/CMA-ES
[2] MA-ES: http://www.honda-ri.de/pubs/pdf/1177.pdf
[3] LM-CMA-ES (similar to LM-MA-ES): https://arxiv.org/pdf/1511.00221
[4] LM-MA-ES: https://arxiv.org/pdf/1705.06693
[5] CMA-ES by Nikolaus Hansen: http://cma.gforge.inria.fr/cmaes_sourcecode_page.html
[6] Shark: http://image.diku.dk/shark/
[7] DEAP: https://github.com/DEAP/deap