automl / CARP-S

A Framework for Comparing N Hyperparameter Optimizers on M Benchmarks.
https://automl.github.io/CARP-S/latest/

Improve performance of SMAC3-2.0 #103

Closed benjamc closed 5 months ago

benjamc commented 5 months ago

TL;DR: ConfigSelector maximize is weird.

Description from Helena:

Situation

In the config selector we sample the next needed configuration(s) with the Acquisition Maximizer. If we retrain the surrogate after _retrain_after iterations, then we need that many configurations:

challengers = self._acquisition_maximizer.maximize(
    previous_configs,
    n_points=self._retrain_after,
    random_design=self._random_design,
)

E.g., for the standard blackbox setting with retrain_after = 8, eight challengers are returned and subsequently yielded by the ConfigSelector (if they have not been yielded before).
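
For illustration, the subsequent yielding roughly works like this (a sketch; the attribute name _processed_configs and the exact control flow are assumptions, not verbatim SMAC3 code):

def __iter__(self):
    challengers = self._acquisition_maximizer.maximize(
        previous_configs,
        n_points=self._retrain_after,  # e.g. 8 in the blackbox setting
        random_design=self._random_design,
    )
    for config in challengers:
        # Only yield configurations that have not been yielded before
        if config not in self._processed_configs:
            self._processed_configs.append(config)
            yield config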

Issue 1 Ambiguous Docstring

The docstring of maximize says that n_points is "Number of points to be sampled. If n_points is not specified, self._challengers is used." It is unclear whether this means the number of configurations that will be returned or the number of points sampled in the process of finding the maximum. This is especially confusing in combination with the docstring for challengers, which plays the same role (if n_points is not given) but reads "Number of configurations to sample from the configuration space to get the acquisition function value for, thus challenging the current incumbent and becoming a candidate for the next function evaluation."

Issue 2 Ambiguous Usage

If n_points is not specified, it is set to self._challengers in the Acquisition Maximizer. By default (if n_points were None), this is set to 5000 for all maximizers. This is quite a discrepancy from the actually used value of 8 and suggests that n_points was probably originally intended to be the number of points sampled in the process of finding the maximum.
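
In code, the fallback is essentially this (a sketch, not verbatim SMAC3):

def maximize(self, previous_configs, n_points=None, random_design=None):
    if n_points is None:
        # self._challengers defaults to 5000 for all maximizers, whereas the
        # ConfigSelector actually passes n_points=retrain_after (e.g. 8)
        n_points = self._challengers
    ...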

Next Step: Calling Maximize

In the maximize function of the Acquisition Maximizer, a function is defined that returns the next configs "by acquisition value".

def next_configs_by_acquisition_value() -> list[Configuration]:
    assert n_points is not None
    # since maximize returns a tuple of acquisition value and configuration,
    # and we only need the configuration, we return the second element of the tuple
    # for each element in the list
    return [t[1] for t in self._maximize(previous_configs, n_points)]

It essentially returns a list of the configurations produced by the _maximize function of the corresponding Acquisition Maximizer.

Next Step: Calling _maximize

In local and random search (default):

random search's _maximize is called with n_points=n_points, so 8 by default for blackbox optimization. Looking at the implementation of _maximize in random search, n_points is both the number of configs sampled in _maximize and the number of points returned.

local search is called with n_points=self._local_search_iterations, which is 10 by default (also e.g. in the blackbox facade; it would be 8 if local search were used directly as the maximizer). In the local search, this is then used as the number of initial points from which the search is iterated.

Finally, the configs from both searches are combined, sorted by acquisition function value, and returned, as sketched below.
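
Put together, the default maximizer behaves roughly like this (a sketch with assumed names; only sample_configuration(size=...) is the actual ConfigSpace API, everything else is illustrative):

def _maximize(self, previous_configs, n_points):
    # Random search part: exactly n_points configs are sampled, so the
    # sample count equals the return count (see Issue 3 below)
    rand_configs = self._configspace.sample_configuration(size=n_points)
    # Assume the acquisition function yields one float per configuration
    rand_results = list(zip(self._acquisition_function(rand_configs), rand_configs))

    # Local search part: n_points is the number of *initial* points here,
    # each of which is iterated towards a local optimum
    local_results = self._local_search(previous_configs, self._local_search_iterations)

    # Combine both candidate sets and sort by acquisition value (descending)
    return sorted(rand_results + local_results, key=lambda t: t[0], reverse=True)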

Issue 3

In random search, only as many points are sampled as are required to be returned. The number of sampled points could easily be much higher, e.g. the suggested 1000 or the possibly intended default of 5000. The same is true for local search: the number of initial points could be greater than the number of points actually needed in the end, though this is probably less relevant there due to the iterations performed.

Issue 4

In differential evolution, n_points is completely ignored.

Issue 5

More configs are generated in _maximize than required by the call: e.g., if retrain_after=8 and we use local and random search with a blackbox facade, one call generates 18 configs (10 from local search + 8 from random search), but we retrain after 8.

Next Step: ChallengerList

A ChallengerList is created, which is given the function next_configs_by_acquisition_value (this creates a list of challengers each time it is called, by calling _maximize):

challengers = ChallengerList(
    self._configspace,
    next_configs_by_acquisition_value,
    random_design,
)

The ChallengerList is consumed via next. On each call, a Configuration is returned in the following way:

If random configurations should be interleaved, it is checked whether a random configuration should be returned instead of one produced by _maximize. Otherwise, the first time an optimized configuration is needed, _maximize is called, generating the list of optimized configurations.
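
In pseudocode, next behaves roughly like this (assumed attribute names, not verbatim SMAC3):

def __next__(self):
    # If random configurations should be interleaved, check whether this call
    # should return a random config instead of an optimized one
    if self._random_design is not None and self._random_design.check(self._index):
        # NOTE: self._index is NOT advanced here -- see Issue 6 below
        return self._configspace.sample_configuration()

    # The first time an optimized config is needed, the callback
    # (next_configs_by_acquisition_value) is called, which runs _maximize
    if self._challengers is None:
        self._challengers = self._challengers_callback()
    if self._index >= len(self._challengers):
        raise StopIteration

    config = self._challengers[self._index]
    self._index += 1
    return config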

Issue 6

The index is not increased for returned random configurations, so the number of returned configurations is the number returned by _maximize plus the interleaved random configurations. The default is the Modulus Random Design with modulus=2, so in the case of the blackbox facade, 36 configurations are generated (18 optimized ones interleaved one-for-one with 18 random ones) when only 8 are needed (only the first 8 will be used anyway).

Issue 7

One purpose of the extra function next_configs_by_acquisition_value seems to be to avoid sampling all the next configs at once; however, this does not currently work in practice, as the function returns all configs at once. It would be nice if it worked lazily, though, since the optimization may stop before all configs are returned.
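
One way to get the intended behavior would be a generator (an illustrative sketch, not a proposed patch; for the full benefit, _maximize itself would also have to produce its candidates lazily):

def next_configs_by_acquisition_value():
    # Yield configurations one at a time so the caller can stop early.
    # (As long as _maximize still builds its whole result list eagerly,
    # this only defers the iteration, it does not avoid the work.)
    for _, config in self._maximize(previous_configs, n_points):
        yield config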

benjamc commented 5 months ago

Opinion Carolin

  1. API of maximize: n_points should be the number of next configurations to evaluate. That should equal retrain_after. --> rename n_points and adjust the docstring
  2. API of _maximize: n_points should be the number of source points. This should not be configurable or passed in; e.g. in random search it should be 1000 or 5000. --> rename n_points, adjust the docstring, and set sensible defaults
  3. After local/random search, all acquisition values should be (i) sorted and (ii) maybe interleaved with random configurations.
  4. Then, maximize should select the first retrain_after configs (see the sketch after this list).
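
A sketch of this proposed flow (illustrative names: n_configs is the renamed outer argument, self._n_samples the fixed internal sample budget, interleave_random a hypothetical helper):

def maximize(self, previous_configs, n_configs, random_design=None):
    # (2) _maximize draws from a fixed internal budget, e.g. 5000 samples
    results = self._maximize(previous_configs, n_points=self._n_samples)
    # (3i) sort by acquisition value, descending
    results = sorted(results, key=lambda t: t[0], reverse=True)
    configs = [config for _, config in results]
    # (3ii) maybe interleave random configurations (hypothetical helper)
    if random_design is not None:
        configs = interleave_random(configs, random_design)
    # (4) select only the first retrain_after (= n_configs) configurations
    return configs[:n_configs]
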
dengdifan commented 5 months ago

Opinion Difan

helegraf commented 5 months ago

Opinion Helena

I agree we probably do not need the number of returned points as an argument because, as Difan said, one can easily restrict the list to just the configurations one needs; the config selector even has an extra check for _retrain_after at the moment so that the surrogate is actually retrained after the correct number of iterations. However, n_points should then be very clearly documented as the number of sampled points, should actually function that way, and should be set to sensible values like 1000 or 5000 by default. The interleaving with random configurations is probably fine as of now, though a bit convoluted to change if you wanted different variants (e.g., a higher modulus) and a tad verbose with the extra function next_configs_by_acquisition_value.

Maybe we will also try to change it and find out why it was designed like this in the first place :D

helegraf commented 5 months ago

RandomAndLocalSearch: Ensure incumbent usage + 10 best of 5000 random (sorted)
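
Roughly, as an illustrative sketch of this proposal (assumed names, not an actual implementation):

def _maximize(self, previous_configs, incumbent):
    # Draw 5000 random configurations and keep the 10 best, sorted by
    # acquisition value (assume one float per configuration)
    rand_configs = self._configspace.sample_configuration(size=5000)
    scored = sorted(zip(self._acquisition_function(rand_configs), rand_configs),
                    key=lambda t: t[0], reverse=True)
    best_random = [config for _, config in scored[:10]]
    # Ensure the incumbent is always among the local search start points
    return self._local_search([incumbent] + best_random)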

helegraf commented 5 months ago

And set retrain_after to 1 per default (possibly with dynamic management so that retraining cost and config search cost are split 50/50).

benjamc commented 5 months ago

see PR