ARM-software / mango

Parallel Hyperparameter Tuning in Python
Apache License 2.0
335 stars 40 forks

Does it support multivariate distributions like scipy.stats.dirichlet in the search space? #115

Closed carusyte closed 1 week ago

carusyte commented 1 month ago

For example, I'm trying to use the dirichlet multivariate continuous distribution from scipy.stats to generate an array of random floats as one of my hyperparameters up for fine-tuning. However, if I preload previously searched hyperparameters and pass them via Tuner(initial_custom=searched_hp), the following error is thrown:

Traceback (most recent call last):

  .... (omitted for brevity) ...

  File "/home/kemove/git/marten/src/marten/models/hp_search.py", line 826, in _bayesopt_run
    results = tuner.minimize()
              ^^^^^^^^^^^^^^^^
  File "/home/kemove/.pyenv/versions/3.12.2/lib/python3.12/site-packages/mango/tuner.py", line 160, in minimize
    return self.run()
           ^^^^^^^^^^
  File "/home/kemove/.pyenv/versions/3.12.2/lib/python3.12/site-packages/mango/tuner.py", line 147, in run
    self.results = self.runBayesianOptimizer()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kemove/.pyenv/versions/3.12.2/lib/python3.12/site-packages/mango/tuner.py", line 213, in runBayesianOptimizer
    X_init = self.ds.convert_GP_space(X_list)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kemove/.pyenv/versions/3.12.2/lib/python3.12/site-packages/mango/domain/domain_space.py", line 154, in convert_GP_space
    X = np.array(X)
        ^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (13, 30) + inhomogeneous part
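The error comes from stacking heterogeneous parameter vectors: when one hyperparameter is itself an array (the Dirichlet sample) while the others are scalars, the list of sampled configurations is no longer rectangular, so np.array cannot build a 2-D float matrix for the GP. A minimal numpy-only reproduction (not mango code; the values are made up for illustration):

```python
import numpy as np

# Each "row" mixes scalar hyperparameters with a length-3 Dirichlet
# sample, so the rows cannot be coerced into a rectangular 2-D array.
rows = [
    [0.01, 64, np.array([0.2, 0.3, 0.5])],
    [0.02, 128, np.array([0.1, 0.6, 0.3])],
]
try:
    # dtype=float forces the same element-wise coercion that fails
    # inside convert_GP_space's np.array(X) call.
    np.array(rows, dtype=float)
except ValueError as e:
    print("ValueError:", e)
```

Flattening the Dirichlet vector into separate scalar dimensions would avoid this particular shape error, though the simplex constraint (weights summing to 1) would then be lost.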
tihom commented 1 month ago

Multivariate distributions are not supported currently, but it is something we will work to add in the next few days. It would be useful if you could share the use case in more detail, with a minimal example of the search space.

carusyte commented 1 month ago

So excited to learn that this feature is about to be available so soon!

I'm currently experimenting with a few time series models that can handle multiple covariates. To my limited knowledge at this moment, one of the key problems in this field is the choice of significant covariates that can increase the accuracy of the prediction model. Although there is a wealth of methods to help make better choices, I was trying to figure out whether I could treat the selection process itself as a hyperparameter search and leverage Bayesian optimization for it.

I could be too naive, but basically, besides the "ordinary" hyperparameters in the search space, I'm adding an np.array of floats generated by dirichlet to rank the top-k covariates (randomly at the beginning, of course). I'm expecting that Bayesian optimization can progressively learn the relationship between this distribution and the validation loss, and propose better distributions that reduce the loss, thus choosing better covariates...
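The idea described above can be sketched with scipy alone (outside mango): draw a weight vector over all candidate covariates from a symmetric Dirichlet, then keep the k covariates with the largest weights. The names max_covars, covars, and k below are illustrative, not part of any library API.

```python
import numpy as np
from scipy.stats import dirichlet

max_covars = 5                                   # hypothetical candidate count
covars = [f"covar_{i}" for i in range(max_covars)]

# One draw from a flat (alpha = 1) Dirichlet: a non-negative weight
# vector over the candidates that sums to 1.
weights = dirichlet.rvs([1.0] * max_covars, random_state=42)[0]

# Rank covariates by weight and keep the top k.
k = 3
topk = [covars[i] for i in np.argsort(weights)[::-1][:k]]
print(topk)
```

In a Bayesian-optimization loop, the surrogate would then have to model the loss as a function of this weight vector, which is exactly where per-dimension scalar search spaces fall short.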

This is how I specify the search space in my code:

dict(
    seq_len=range(5, 300+1),
    d_model=[2**w for w in range(6, 9+1)],
    d_core=[2**w for w in range(5, 10+1)],
    d_ff=[2**w for w in range(6, 10+1)],
    e_layers=range(4, 32+1),
    learning_rate=loguniform(0.0001, 0.002),
    lradj=["type1", "type2", "constant", "cosine"],
    patience=range(3, 10+1),
    batch_size=[2**w for w in range(5, 8+1)],
    dropout=uniform(0, 0.5),
    activation=["relu","gelu","relu6","elu","selu","celu","leaky_relu","prelu","rrelu","glu"],
    use_norm=[True, False],
    optimizer=["Adam", "AdamW", "SGD"],
    topk_covar=list(range(0, {max_covars}+1)),
    covar_dist=dirichlet([1.0]*{max_covars}),
)
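The scalar entries above are scipy.stats frozen distributions, which are sampled via .rvs(); the covar_dist entry is the only one whose single draw is a whole vector. A scipy-only sanity check of that difference (mango is not needed; max_covars is stood in by a fixed number here):

```python
import numpy as np
from scipy.stats import loguniform, uniform, dirichlet

max_covars = 4  # stand-in for the {max_covars} template value above

# Scalar draws, as used by the "ordinary" entries in the search space.
lr = loguniform(0.0001, 0.002).rvs(random_state=0)
drop = uniform(0, 0.5).rvs(random_state=0)       # uniform on [0, 0.5]

# A single Dirichlet draw is a length-max_covars vector summing to 1,
# not a scalar -- the shape mismatch behind the original error.
w = dirichlet([1.0] * max_covars).rvs(random_state=0)[0]

assert 0.0001 <= lr <= 0.002
assert 0.0 <= drop <= 0.5
assert w.shape == (max_covars,) and np.isclose(w.sum(), 1.0)
```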

https://github.com/agux/marten/blob/84ba0d486b0514f00f3defd46d5c3e4bdf878e73/src/marten/models/worker_func.py#L1746-L1762

And just to get things going, I made a few changes as a stopgap:

https://github.com/carusyte/mango/commit/bf2386c0d331a352ea3579b16eae984ba3eb0f92

tihom commented 1 week ago

Closed by #117.