initial parameters - Githubissues

nikohansen commented 10 years ago

The code does not demand initial values for the start point and for sigma. I consider this a bug. These values are not generic and cannot be generated generically; they must be demanded from the user. It is, however, to some extent possible to derive them from space boundaries.

beniz commented 10 years ago

Here is how parameter initialization works for now (see main constructor in https://github.com/beniz/libcmaes/blob/master/src/cmaparameters.h ):

- the dimension is mandatory;
- lambda and sigma are part of the constructor, but optional: when set to -1, they are set automatically;
- the start point x0 is set with one of the set_x0 methods, because the user can set either a value, a vector, or bounds (from which the initial starting point is sampled). If unspecified, the start point is sampled from [-4,4].
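To make the three ways of fixing the start point concrete, here is a minimal sketch in plain Python. It is not libcmaes code: `resolve_x0` is a hypothetical helper, and uniform sampling within the bounds is an assumption, since the comment above only says the point is "sampled".

```python
import random

def resolve_x0(dim, x0=None, bounds=None, rng=None):
    """Return a start point of length `dim` (hypothetical helper, not the libcmaes API).

    x0 may be a single value (used for every coordinate) or a full vector.
    If x0 is None, the point is sampled from `bounds`, or from [-4, 4]
    when no bounds are given. Uniform sampling is an assumption.
    """
    rng = rng or random.Random()
    if x0 is not None:
        if isinstance(x0, (int, float)):
            return [float(x0)] * dim
        if len(x0) != dim:
            raise ValueError("x0 must have one entry per dimension")
        return [float(v) for v in x0]
    lower, upper = bounds if bounds is not None else (-4.0, 4.0)
    return [rng.uniform(lower, upper) for _ in range(dim)]

same_value_everywhere = resolve_x0(3, x0=1.5)
explicit_vector = resolve_x0(2, x0=[0, 1])
sampled_default = resolve_x0(4, rng=random.Random(0))
```

The last call is the silent fallback that the issue objects to: nothing in the result tells the user that [-4, 4] was picked for them.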

We could do it the other way around and make sigma and x0 mandatory, unless the user sets a special flag for deriving the values automatically.

Yet another way would be to add a warning to the printed results when initial values have been derived automatically.

There may be other ways still, please let me know your thoughts.

For comparison, and not saying this is applicable here: lbfgs requires an initial x but gives every other parameter a default value, see the API here: https://github.com/chokkan/liblbfgs/blob/master/include/lbfgs.h. Minuit2 gets its initial values internally from ROOT. I haven't yet found how the default values are set, but my understanding is that there may be defaults depending on the type of 'likelihood' used (Chi2, log-likelihood, ...).

beniz commented 10 years ago

As a first 'fix', I can add a set_x0 example to sample-code.cc and similar examples in examples/

nikohansen commented 10 years ago

Initial x and sigma should be required. Alternatively, they could be deduced from an initial search interval. The latter should not be confused with domain boundaries, because the result might be quite undesirable if we deduce initial values using the entire feasible domain.
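A minimal sketch of deducing both values from an initial search interval [a, b] shared by all coordinates. The helper name `init_from_interval`, the choice of the midpoint as x0, and the factor 0.3 are assumptions made for illustration (0.3 times the interval width is a commonly quoted rule of thumb, not something stated in this thread).

```python
def init_from_interval(dim, a, b, fraction=0.3):
    """Derive (x0, sigma0) from an initial search interval [a, b].

    Hypothetical helper: x0 is the interval midpoint in every coordinate
    and sigma0 is a fixed fraction of the interval width. Both choices
    are assumptions, not the library's behaviour.
    """
    if not a < b:
        raise ValueError("need a < b")
    x0 = [(a + b) / 2.0] * dim
    sigma0 = fraction * (b - a)
    return x0, sigma0

# The interval is where the optimum is expected, not the feasible domain.
x0, sigma0 = init_from_interval(3, -2.0, 6.0)
```

Passing the entire feasible domain as [a, b] would inflate sigma0 accordingly, which is the undesirable case described above.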

beniz commented 10 years ago

OK, will proceed. Where can I find a general rule of thumb to guide the user through the selection of sigma?

beniz commented 10 years ago

Commit above makes specifying x0 and sigma mandatory.

For deducing them from an initial search interval, I could close this ticket and open a separate one where the discussion could continue.

nikohansen commented 10 years ago

In short: the optimum that we want to find should preferably not be far away from the interval [x0 - sigma0, x0 + sigma0] in each component, where distance is measured in units of sigma0. On a similar note, all variables should be rescaled such that they are likely to have similar sensitivity, see https://www.lri.fr/~hansen/cmaes_inmatlab.html#practical. If you point me to the place where this is (going to be) documented in the library, I will check it.
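The rule can be turned into a quick sanity check. The sketch below assumes the user has a rough guess of where the optimum might lie; `distance_in_sigmas` is a hypothetical name, and reading "distance" as the gap to the interval divided by sigma0 is an interpretation of the sentence above.

```python
def distance_in_sigmas(x_guess, x0, sigma0):
    """Per-component distance of x_guess from [x0 - sigma0, x0 + sigma0].

    The result is expressed in units of sigma0; 0.0 means the component
    lies inside the interval. Hypothetical helper for illustration only.
    """
    if sigma0 <= 0:
        raise ValueError("sigma0 must be positive")
    return [max(0.0, abs(g - c) - sigma0) / sigma0
            for g, c in zip(x_guess, x0)]

# First component is inside the interval, second is five sigma0 away.
gaps = distance_in_sigmas([0.3, 3.0], [0.0, 0.0], 0.5)
```

A large entry suggests that the start point or sigma0 is poorly chosen for that variable, or that the variable should be rescaled.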

beniz commented 10 years ago

Understood.

1. Linear (and log) automatic scaling can be implemented so that this becomes transparent to the user.
2. For the application within ROOT: Minuit uses a per-variable initial step-size. My understanding is therefore that, since we use the identity as the initial covariance matrix, the variables should definitely be rescaled beforehand when using CMA-ES in place of Minuit, so that sigma0 can apply to all components.
3. In terms of documentation, I can add a paragraph to the README.md with a link to the page you provided, and in addition I can write a wiki page here on GitHub that you will be able to modify / improve as needed.

I can open a ticket for the first point and fix the third one, which would close this ticket.
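A minimal sketch of the linear scaling mentioned above, assuming each variable comes with a typical range [lower_i, upper_i] that is mapped to [0, 1]. The names and the choice of the unit interval are assumptions; this is not how libcmaes implements it.

```python
def make_linear_scaling(lower, upper):
    """Build encode/decode functions mapping each [lower_i, upper_i] to [0, 1].

    The optimizer would work on the encoded vector, so that a single
    sigma0 is meaningful for every component. Hypothetical helper.
    """
    widths = [u - l for l, u in zip(lower, upper)]
    if any(w <= 0 for w in widths):
        raise ValueError("each upper bound must exceed its lower bound")

    def encode(x):
        return [(xi - l) / w for xi, l, w in zip(x, lower, widths)]

    def decode(y):
        return [l + yi * w for yi, l, w in zip(y, lower, widths)]

    return encode, decode

# Two variables whose natural scales differ by four orders of magnitude.
encode, decode = make_linear_scaling([0.0, 0.0], [1000.0, 0.1])
y = encode([250.0, 0.05])   # both components now live on a comparable scale
x = decode(y)               # back to the user's units for the objective call
```

The objective function would then be evaluated as `f(decode(y))`, which keeps the scaling invisible to the user.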

nikohansen commented 10 years ago

I slightly updated the comments at the link mentioned above (I know, it was the wrong order).

Choosing a different diagonal initial covariance matrix and applying a scaling with the identity as initial covariance matrix are equivalent up to numerics, but the latter indeed seems preferable in almost every respect.
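The equivalence can be illustrated for the initial sampling step. The sketch below draws one candidate in two ways from the same standard normal vector z: once with a diagonal initial covariance D^2, once with identity covariance in coordinates rescaled by D. The function names are hypothetical, and only a single sampling step is shown, not the full algorithm.

```python
import math
import random

def sample_with_diagonal_cov(mean, sigma, scales, z):
    """x = m + sigma * D z, i.e. initial covariance C = D^2."""
    return [m + sigma * d * zi for m, d, zi in zip(mean, scales, z)]

def sample_with_scaling(mean, sigma, scales, z):
    """Sample with identity covariance in y = x / d, then map back to x."""
    y_mean = [m / d for m, d in zip(mean, scales)]
    y = [ym + sigma * zi for ym, zi in zip(y_mean, z)]
    return [d * yi for d, yi in zip(scales, y)]

rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(2)]
mean, scales, sigma = [100.0, 0.01], [50.0, 0.005], 0.5

x_cov = sample_with_diagonal_cov(mean, sigma, scales, z)
x_scaled = sample_with_scaling(mean, sigma, scales, z)

# Equal up to floating-point rounding, hence the tolerance.
agree = all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-9)
            for a, b in zip(x_cov, x_scaled))
```

The comparison uses a tolerance rather than `==` because the two routes round differently, which is the "up to numerics" part.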

The caveat for log-encoding is that for very small values it might often not work, because changes very close to zero don't have an effect anymore.
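One reading of this caveat, as a small sketch: with a log-encoded variable x = 10^y, very negative y gives x so close to zero that further changes in y no longer move the objective in floating point. The objective and function names below are made up for illustration.

```python
def objective(x):
    """Toy objective with its optimum at x = 1 (illustrative only)."""
    return (x - 1.0) ** 2

def objective_log_encoded(y):
    """Same objective, with the variable encoded as x = 10**y."""
    return objective(10.0 ** y)

# Around x = 0.1 a step in y still changes the objective value ...
changes_near_one = objective_log_encoded(-1.0) != objective_log_encoded(-2.0)

# ... but for tiny x, ten orders of magnitude make no difference at all.
flat_near_zero = objective_log_encoded(-20.0) == objective_log_encoded(-30.0)
```

In the flat region the optimizer receives no ranking information from that variable, so the search can drift there without any selection pressure.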

nikohansen commented 10 years ago

...another point: for a specific application, like the likelihood fitting in ROOT, meaningful default values might be available and used when coupled with libcmaes. This should be carefully checked though.

beniz commented 10 years ago

Yes, I will suggest / run an analysis of likelihood functions in ROOT for CMA-ES, as the black-box functions are hidden away behind them. The optimizer Fumili, for instance, is specifically targeted at Chi2 likelihood and there may be something to take from it.

Will close this ticket in a moment, and open others for further enhancements.

CMA-ES / libcmaes

initial parameters #39