automl / neps

Neural Pipeline Search (NePS): Helps deep learning experts find the best neural pipeline.
https://automl.github.io/neps/
Apache License 2.0

refactor: BO and ifBO #134

Closed eddiebergman closed 1 month ago

eddiebergman commented 3 months ago

This PR simplifies, speeds up, and improves BO and ifBO. This list is by no means exhaustive, but it covers some of the major changes and new toys present.

How?


This was primarily done by using the SearchSpace only for its definitions, not its methods. When interacting with models that expect tensors, we encode directly to a tensor and act directly on this encoded space, instead of going back and forth between SearchSpace and the data format the surrogate models expect.

Before: pass around list[SearchSpace] and let each component encode as needed, often performing operations directly on the SearchSpace.

After: Encode the list[SearchSpace] into what's required and inform the components about the encoding.

This buys a lot of time to perform better acquisition optimization, and it avoids bloating the ever-growing list of methods in SearchSpace, which cannot provide a solution for every kind of model/optimizer we have.


As part of this, we now use botorch as a dependency, which is primarily built on top of gpytorch, which we already depended on. Some of the benefits include:


Also, I have removed a lot of the fake flexibility that was offered for BO and ifBO. The primary change is that our hand-rolled GP and the ftpfn model are no longer treated as if they were the same. They share very little in common, acquisition from them works in very different manners, and they use very different data encodings. With the removal of DeepGP, these are our only two surrogates, so we just treat them as two very different things. Maybe we try to unify them in the future, but I do not know what we would gain from that.

In reality, we as developers would be the only ones to use the more advanced options, and in general they would be confusing to users actually looking to configure them, let alone the fact that passing custom objects, or even some of our own classes/objects, would not work. Maybe we introduce the flexibility at some point, but it obfuscated the code and made it harder to maintain, test, and debug. As an example, both ifBO and BO now have only one method, ask(), which contains most of the logic you would expect to see when referencing a paper/description of the algorithm.

Here is the ask() of both BO and ifBO now, which removes most of the abstractions and is just direct function calls. It also removes the two-step load_configs() and get_next_config() that we had before.

The result of this is that using the models is now "stateless", and mostly accessible through a function call.
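To sketch the flavour of such a stateless step, here is a minimal example built directly on botorch (the function name, shapes, and hyperparameters are mine for illustration; this is not the PR's actual ask()):

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

def bo_step(x: torch.Tensor, y: torch.Tensor, bounds: torch.Tensor) -> torch.Tensor:
    """One stateless BO step on already-encoded configs: fit a GP, optimize
    the acquisition, return the winning candidate (still in encoded space)."""
    gp = SingleTaskGP(x, y.unsqueeze(-1))
    fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
    acq = ExpectedImprovement(gp, best_f=y.max())  # botorch maximizes by default
    candidate, _ = optimize_acqf(acq, bounds=bounds, q=1, num_restarts=8, raw_samples=64)
    return candidate  # decode back into a config dict via the encoder

Everything stays in the encoded tensor space until the very end, which is the point of the refactor.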

ifBO is fairly similar in terms of the function calls.


As representing configurations as a complex SearchSpace object is highly inefficient for some of the model routines, such as encoding/decoding/sampling/acquisition-function optimization, I avoid the use of the methods present in SearchSpace and treat it as just a definition of hyperparameters. Instead, we define an encoding and encode all configuration information into one big tensor. The encoder can translate back and forth:

Conceptually, list[SearchSpace] <-> list[dict] <-> Encoder <-> Tensor

Doing so means we go from "asking the search space to sample itself and then doing all transformations" to "sampling a tensor and doing tensor operations to match the encoding". No objects, little Python, just torch.

This required some new infrastructure that was aware of how configurations are encoded (ConfigEncoder).
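To make the idea concrete, here is a toy stand-in for such an encoder (purely illustrative; the hyperparameter names, choices, and ranges are made up, and this is not NePS's ConfigEncoder):

import torch

# Toy illustration of the encoding idea: categoricals become integer indices,
# floats are min-max normalized into [0, 1].
CHOICES = ["adam", "sgd", "rmsprop"]
LR_MIN, LR_MAX = 1e-4, 1e-1

def encode(configs: list[dict]) -> torch.Tensor:
    rows = [
        [CHOICES.index(c["optimizer"]), (c["lr"] - LR_MIN) / (LR_MAX - LR_MIN)]
        for c in configs
    ]
    return torch.tensor(rows, dtype=torch.float64)

def decode(x: torch.Tensor) -> list[dict]:
    return [
        {"optimizer": CHOICES[int(row[0])],
         "lr": float(row[1]) * (LR_MAX - LR_MIN) + LR_MIN}
        for row in x
    ]

configs = [{"optimizer": "adam", "lr": 1e-3}, {"optimizer": "sgd", "lr": 1e-2}]
print(decode(encode(configs)))  # round trip: list[dict] <-> Tensor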

The most important piece of new infrastructure is the Domain.

# A log-scaled floating domain from 10 to 10_000
domain1 = Domain.floating(10, 10_000, log=True)
x_in_domain_1 = torch.tensor([10, 100, 1_000, 10_000])

# Cast those values from domain1 into a binned unit-interval domain
domain2 = Domain.floating(0, 1, bins=18)
x_in_domain_2 = domain2.cast(x_in_domain_1, frm=domain1)

Anywhere we use a tensor, there is a Domain associated with it somehow. In summary, it contains information about what kind of numbers are in the tensor. We have them in quite a few places and put them to good use:


The primary method is pretty straightforward. The most important argument is really to=, which lets you say "in what domain(s) would you like your samples?" This means you can sample a big tensor of uniform values and convert it directly into the domain of encoded configs (i.e. integers for categoricals, min-max normalized floats/ints, etc.).

def sample(
    self,
    n: int | torch.Size,
    *,
    to: Domain | list[Domain],
    seed: torch.Generator | None = None,
    device: torch.device | None = None,
    dtype: torch.dtype | None = None,
) -> torch.Tensor:
    """Sample `n` points and convert them to the given domain.

    Args:
        n: The number of points to sample. If a torch.Size, an additional dimension
            will be added with [`.ncols`][neps.samplers.Sampler.ncols].
            For example, if `n = 5`, the output will be `(5, ncols)`. If
            `n = (5, 3)`, the output will be `(5, 3, ncols)`.
        to: If a single domain, `.ncols` columns will be produced from that one
            domain. If a list of domains, then it must have the same length as the
            number of columns, with each column being in the corresponding domain.
        seed: The seed generator.
        dtype: The dtype of the output tensor.
        device: The device to cast the samples to.

    Returns:
        A tensor of (n, ndim) points, sampled and cast to the given domain.
    """
    ...
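The mechanics of that to= conversion boil down to plain tensor operations. A self-contained sketch of the idea (toy code, not the NePS Sampler):

import torch

# Sample uniform values, then cast each column into its target domain:
# column 0 becomes a categorical index, column 1 a log-scaled float.
n, ncols = 5, 2
u = torch.rand(n, ncols)                         # uniform in [0, 1)

n_choices = 3
cat_col = (u[:, 0] * n_choices).long()           # integers in {0, 1, 2}

lo = torch.log(torch.tensor(10.0))
hi = torch.log(torch.tensor(10_000.0))
float_col = torch.exp(lo + u[:, 1] * (hi - lo))  # log-uniform in [10, 10_000]

print(cat_col, float_col)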

Most of the Priors are backed by torch distributions, via the aptly named TorchDistributionWithDomain, which encapsulates both a distribution and the domain over which it samples. The cast() method allows fluidly transforming between distribution domains, sample domains, and config-encoding domains.
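The pairing might look roughly like this (an illustrative sketch only; the class shape and names are assumed, not the PR's exact implementation):

import torch
from dataclasses import dataclass
from torch.distributions import Beta, Distribution

# Illustrative pairing of a distribution with the domain it samples in.
@dataclass
class DistWithDomain:
    dist: Distribution
    lo: float  # samples of `dist` live in [lo, hi]
    hi: float

# A Beta prior sampling in [0, 1] ...
prior = DistWithDomain(Beta(2.0, 5.0), 0.0, 1.0)
u = prior.dist.sample((5,))

# ... cast into the log domain of a learning-rate hyperparameter.
log_lo = torch.log(torch.tensor(1e-4))
log_hi = torch.log(torch.tensor(1e-1))
lr_samples = torch.exp(log_lo + u * (log_hi - log_lo))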


As future work, I believe many of the bandit prior methods could benefit from the Prior class, since it allows calculating priors over both uniform parameters and those with a prespecified default.

eddiebergman commented 3 months ago

Lol, tests pass. I guess none of our tests anywhere hit this, because this definitely shouldn't work right now.

karibbov commented 3 months ago

There are some minor updates on the mergeDyHPO branch which I'll go over with some comments to follow. I don't think NotPSDError needs any special addressing; it usually arises when the model is fed bad data (as in many repeating or similar data points, perhaps even many zero values), so it should occur fairly seldom.

eddiebergman commented 2 months ago

Going to have a major time trying to rebase this onto the changes of #115 ... pray

eddiebergman commented 1 month ago

Adding tests to this PR before merging