Optimising discrete variables

enajx commented 2 years ago

Is there an natural way of optimising parameters over a discrete space? eg. integer space = {-1,0,1}

The integer_variables option 'integer_variables' : [-1,0,1] seems to be ignored by the sampler and still suggests non-integer variables.

There seem to be some research on using CMAES for discrete optimisation but no implementation seems to be available: A discrete version of CMA-ES - arXiv

nikohansen commented 2 years ago

The option 'integer_variables' asks for an index list of variables which shall be "treated as" integer, it does not indicate the allowed values of the variables (which is by default all integers/values).

The current documentation is indeed poor: in the computation of the objective function (or in a wrapper around the objective function), the user still also has to make sure to "interpret" the variables as integers, like by taking the np.floor(x) ~int(x[i])~ before x enters the computation.

Then, using the bounds option will limit the value range (probably you want a range from -1 to 2-1e-9 when the floor is taken, to get the values -1, 0, 1 with similar "weight").

nikohansen commented 2 years ago

An example:

import cma
print(cma.CMAOptions('integer')
{'integer_variables': '[]  # index list, invokes basic integer handling: prevent std dev to become too small in the given variables'}

As mentioned above, the solutions themselves are still vectors of floats, but the rounded or truncated values of their integer variables can be used as the integer value.

import numpy as np
import cma

class F_int:
    def __init__(self, f):
        self.f = f
    def __call__(self, x):
        return self.f(np.floor(x))

x0 = 4 * [14]  # dimension 4
x, es = cma.fmin2(F_int(cma.ff.ellirot), x0, 1,
                  {'integer_variables': list(range(len(x0))),
                   'bounds': [-1, 15 - 1e-9],
                   'tolflatfitness':9})

(4_w,8)-aCMA-ES (mu_w=2.6,w_1=52%) in dimension 4 (seed=320893, Thu Oct 14 20:50:40 2021)
Iterat #Fevals   function value  axis ratio  sigma  min&max std  t[m:s]
    1      8 4.489159420396854e+08 1.0e+00 9.78e-01  9e-01  1e+00 0:00.0
    2     16 4.619426086011490e+08 1.3e+00 9.06e-01  8e-01  1e+00 0:00.0
    3     24 4.193969928086691e+08 1.4e+00 1.02e+00  9e-01  1e+00 0:00.0
   69    552 0.000000000000000e+00 8.2e+00 8.60e-01  1e-01  3e-01 0:00.1
termination on tolfunhist=1e-12 (Thu Oct 14 20:50:40 2021)
final/bestever f-value = 0.000000e+00 0.000000e+00
incumbent solution: [0.8099244721202998, 0.1288951875210707, 0.29402282715843187, 0.5151747591206997]
std deviation: [0.11343181819789307, 0.16606817977003005, 0.1111111111111111, 0.309852677460961]

finds ~10% of the time the global optimum (fellirot is randomized on import and the integer version is usually multimodal). The bounds are in this case not necessary. The 'tolflatfitness' is the number of iterations the same fitness value is tolerated before to stop. The default value may lead to early stopping when all variables are interpreted as integer.

enajx commented 2 years ago

It makes sense now, thank you!

nikohansen commented 2 weeks ago

FTR, a much better integer handling is now default since v4.0.0.

CMA-ES / pycma

Optimising discrete variables #177