eddiebergman closed this 1 month ago
Lol tests pass, I guess none of our tests anywhere hit this because this definitely shouldn't work right now
There are some minor updates on the mergeDyHPO branch which I'll go over with some comments to follow. I don't think `NotPSDError` needs any special addressing; it usually arises when the model is fed bad data (as in many repeating or similar data points, perhaps even many zero values), so it should occur fairly seldom.
Going to have a rough time trying to rebase this onto the changes of #115 ... pray
Adding tests to this PR before merging
This PR simplifies and speeds up/improves BO and ifBO. By no means is this list of things exhaustive, but it covers some of the major changes and new toys present.
How?
This was primarily done by only using the `SearchSpace` for its definitions, not its methods. When interacting with models that expect tensors, we focus on encoding directly to a tensor and acting directly on this encoded space, instead of going back and forth between `SearchSpace` and the data format that the surrogate models expect.

- **Before:** pass around `list[SearchSpace]` and each component encodes as needed, often performing operations directly on the `SearchSpace`.
- **After:** encode the `list[SearchSpace]` into what's required and inform the components about the encoding.

This buys a lot of time to perform better acquisition optimization, as well as avoids bloating the ever-growing list of methods in `SearchSpace`, which cannot provide a solution for every kind of model/optimizer we have.
As part of this, we now use `botorch` as a dependency, which is primarily built on top of `gpytorch`, which we already depended on. Some of the benefits include `class WeightedAcquisition`, which can take in a botorch `AcquisitionFunction` and apply a custom weighting to the output. For example, here is a function that wraps an arbitrary acquisition function and applies a weight based on the pdf of the samples under a prior, i.e. PiBo. You can see it in use here.
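To give a flavour of the idea, here is a minimal sketch of a prior-weighted acquisition in the PiBo spirit. This is not the actual `WeightedAcquisition` from this PR; `prior_log_prob` is a hypothetical callable returning per-candidate-batch log-densities, and the class name is made up:

```python
import torch
from botorch.acquisition import AcquisitionFunction


class PriorWeightedAcquisition(AcquisitionFunction):
    """Sketch only: multiply an acquisition value by the prior density of the
    candidates, tempered by `beta`, i.e. roughly acq(x) * pi(x) ** beta."""

    def __init__(self, acq: AcquisitionFunction, prior_log_prob, beta: float = 1.0):
        super().__init__(model=acq.model)     # botorch acquisition functions carry their model
        self.acq = acq
        self.prior_log_prob = prior_log_prob  # hypothetical: maps candidates -> log pdf per batch
        self.beta = beta

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        base = self.acq(X)                                   # base acquisition value per batch
        weight = (self.beta * self.prior_log_prob(X)).exp()  # tempered prior density
        return base * weight
```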
Also, I have removed a lot of the fake flexibility that was offered for BO and ifBO. The primary thing removed is that our hand-rolled `GP` and the ftpfn model are no longer treated as if they were the same. They share very little in common, are acquired from in very different manners and have very different data encodings. With the removal of DeepGP, these are our only two surrogates and we just treat them as two very different things. Maybe we try to unify them in the future but I do not know what we gain from that.

In reality, we as developers would be the only ones to use the more advanced options and, in general, they would be confusing to users actually looking to configure them, let alone the fact that passing custom objects or even some of our own classes/objects would not work. Maybe we introduce the flexibility at some point, but it obfuscated the code and made it harder to maintain, test and debug. As an example, both ifBO and BO now have only one method, `ask()`, which contains most of the logic you would expect to see when referencing a paper/description of the algorithm.
Here is the `ask()` of both BO and ifBO now, which removes most of the abstractions and is now just direct function calls. It also removes the two-step `load_configs()` and `get_next_config()` that we had before.

The result of this is that using the models is now "stateless", and mostly accessible through a function call:
- `make_default_single_obj_gp(...)`
- `encode_trials_for_gp(...)`
- `fit_and_acquire_from_gp(...)`
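As a rough sketch of what that flow looks like end to end, assuming hypothetical argument names, return values and module path (the exact signatures live in the PR and may differ):

```python
# Rough sketch only: the module path, argument names and return values below are
# assumptions for illustration, not the exact signatures introduced in this PR.
from neps.optimizers.bayesian_optimization.models.gp import (  # hypothetical path
    encode_trials_for_gp,
    fit_and_acquire_from_gp,
    make_default_single_obj_gp,
)


def ask(trials, space, n: int = 1):
    # Encode the observed trials into a tensor plus the encoder that defines it
    data, encoder = encode_trials_for_gp(trials, space)

    # Build a GP over the encoded data; nothing is stored on an optimizer object
    gp = make_default_single_obj_gp(x=data.x, y=data.y, encoder=encoder)

    # Fit the GP and optimize an acquisition function directly in encoded space
    candidates = fit_and_acquire_from_gp(gp=gp, x=data.x, encoder=encoder, n=n)

    # Decode back into plain configuration dicts
    return encoder.decode(candidates)
```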
ifBO is fairly similar in terms of the function calls.
As representing configurations as a complex `SearchSpace` object is highly inefficient for some of the model routines, such as encoding/decoding/sampling/acquisition-function optimization, I avoid the use of the methods present in `SearchSpace` and treat it as just a definition of hyperparameters. Instead, we define an encoding and encode all configuration information into one big tensor. The encoder can translate back and forth. Conceptually:

`list[SearchSpace] <-> list[dict] <-> Encoder <-> Tensor`

Doing so means that we go from "asking the search space to sample itself and then do all transformations" to "sample a tensor and do tensor operations to match the encoding". No objects, little Python, just torch.
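A toy, self-contained illustration of that conceptual translation (the real encoder in this PR is far more general; every name below is made up for the example):

```python
import math

import torch

OPTIMIZERS = ["adam", "sgd", "rmsprop"]                 # categorical -> integer index
LOG_LR_LO, LOG_LR_HI = math.log(1e-4), math.log(1e-1)   # float -> log, then min-max to [0, 1]


def encode(configs: list[dict]) -> torch.Tensor:
    """Pack a list of config dicts into one (n_configs, n_columns) tensor."""
    return torch.tensor(
        [
            [
                (math.log(c["lr"]) - LOG_LR_LO) / (LOG_LR_HI - LOG_LR_LO),
                OPTIMIZERS.index(c["optimizer"]),
            ]
            for c in configs
        ],
        dtype=torch.float64,
    )


def decode(x: torch.Tensor) -> list[dict]:
    """Invert `encode`, recovering plain config dicts from the tensor."""
    return [
        {
            "lr": math.exp(LOG_LR_LO + lr_unit * (LOG_LR_HI - LOG_LR_LO)),
            "optimizer": OPTIMIZERS[int(round(cat))],
        }
        for lr_unit, cat in x.tolist()
    ]


X = encode([{"lr": 1e-3, "optimizer": "adam"}, {"lr": 3e-2, "optimizer": "sgd"}])
print(X)          # all sampling / acquisition optimization happens on tensors like this
print(decode(X))  # ...and results are only decoded back to dicts at the very end
```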
This required some new infrastructure that is aware of how configurations are encoded (`ConfigEncoder`). The most important piece of new infrastructure is the `Domain`.

`Domain`: a dataclass that represents a numeric range, its dtype, whether it's binned, log scale, etc. The most important method is `cast()`, which allows you to convert between domains, e.g. cast from `Domain.floating(10, 10_000, log=True)` to `Domain.floating(0, 1, bins=18)`.
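To make the `cast()` idea concrete, here is a self-contained stand-in for `Domain` (not the real implementation; names and signature are simplified for illustration):

```python
import math
from dataclasses import dataclass

import torch


@dataclass(frozen=True)
class MiniDomain:
    """Illustrative stand-in for the PR's `Domain`: a numeric range that may be
    log-scaled and/or binned. Not the real implementation."""

    lower: float
    upper: float
    log: bool = False
    bins: int | None = None

    def _bounds(self) -> tuple[float, float]:
        if self.log:
            return math.log(self.lower), math.log(self.upper)
        return self.lower, self.upper

    def to_unit(self, x: torch.Tensor) -> torch.Tensor:
        # Map values from this domain onto [0, 1]
        lo, hi = self._bounds()
        x = x.log() if self.log else x
        return (x - lo) / (hi - lo)

    def from_unit(self, x: torch.Tensor) -> torch.Tensor:
        # Map values from [0, 1] into this domain, snapping to bins if requested
        if self.bins is not None:
            x = (x * (self.bins - 1)).round() / (self.bins - 1)
        lo, hi = self._bounds()
        x = lo + x * (hi - lo)
        return x.exp() if self.log else x

    def cast(self, x: torch.Tensor, frm: "MiniDomain") -> torch.Tensor:
        # Convert values from the `frm` domain into this one via the unit interval
        return self.from_unit(frm.to_unit(x))


lr = MiniDomain(10, 10_000, log=True)
unit18 = MiniDomain(0.0, 1.0, bins=18)
print(unit18.cast(torch.tensor([10.0, 100.0, 10_000.0]), frm=lr))
# -> values in [0, 1], snapped onto 18 evenly spaced bins
```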
Anywhere we use a tensor, there is a `Domain` associated with it somehow. In summary, it contains information about what kind of numbers are in the tensor. We have them in quite a few places and put them to good use:

- `ConfigEncoder.domains: list[Domain]`, one domain for each column in the tensor representing encoded configs.
- `XXXParameter`, which gives information about the domain of the parameter's outputs.
- `TorchDistributionWithDomain`, as dumb as it sounds, combines a torch distribution with the domain over which it has support/samples.
- The `to=` argument of the samplers, a `Domain | list[Domain]` into which you'd like those samples transformed.
into which you'd like those samples transformed.Sampler(Protocol)
: A Protocol for something that can sample tensors. Related, is also the protocolclass Prior(Sampler)
, which extends aSampler
by being able to also calculatelog_probs
a tensor of configs, used in things like pibo acquisition and prior based sampling. The mains once currently there are:Sobol(Sampler)
Uniform(Prior)
CenteredPrior(Prior)
, which handle parameters with and without defaults jointly.WeightedPrior(Prior)
, which allows you to combine multiple priors by weights.WeightedSampler(Sampler)
, which is the same but for samplers which are not prior enabled.BorderSampler(Sampler)
, which efficiently generates border configurations.The primary method is pretty straight forward. The most important argument is really
to=
, which lets you say "in what domain(s) would you like your samples?" This means you can sample a big tensor of uniform and convert it directly into the domain of encoded configs, (i.e. integers for categoricals, min-max normalized floats/ints, etc...).Most of the
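To make the `to=` idea concrete, here is a tiny self-contained sketch. The real `Sobol(Sampler)` differs; `to` is simplified here to plain (lower, upper) ranges per column instead of `Domain`s:

```python
import torch
from torch.quasirandom import SobolEngine


# Minimal sketch of "sample in the unit cube, then cast via `to=`".
def sobol_sample(n: int, *, to: list[tuple[float, float]], seed: int = 0) -> torch.Tensor:
    unit = SobolEngine(dimension=len(to), scramble=True, seed=seed).draw(n)
    lows = torch.tensor([lo for lo, _ in to])
    highs = torch.tensor([hi for _, hi in to])
    return lows + unit * (highs - lows)  # rescale each column into its target range


# e.g. column 0 is a min-max normalized float, column 1 spans a categorical index
# range [0, 2] (which the encoding would round/bin to integers)
X = sobol_sample(8, to=[(0.0, 1.0), (0.0, 2.0)])
```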
Most of the `Prior`s are backed by torch distributions via the aptly named `TorchDistributionWithDomain`, which encapsulates both a distribution and the domain over which it samples. The `cast()` method allows fluidly transforming between distribution domains, sample domains, and config encoding domains.

For some future work, I believe many of the bandit prior methods could benefit from the `Prior` class, as it allows calculating priors over both uniform parameters and those with a prespecified `default`.