Automatically run a hyperparameter tuning

chyalexcheng commented 5 months ago

Running tutorials/data_driven/LSTM/hyperbola_calibration_mixed_hypertuning.py fails with the following error:

400 response executing GraphQL.
{"errors":[{"message":"400 Bad Request: The browser (or proxy) sent a request that this server could not understand.","path":["upsertSweep"]}],"data":{"upsertSweep":null}}
wandb: ERROR Error while calling W&B API: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand. (<Response [400]>)

Functions relevant to this are my_training_function, hyper_train(), and lines 112

luisaforozco commented 5 months ago

Hello, @chyalexcheng

To reproduce this error I first had to add entity_name="grainlearning-escience" to the definition of hyper_tuner. By default (i.e. in the class definition of train_rnn.HyperTuning) entity_name='grainlearning', which raises an error for anyone who is not a member of that team. I suggest changing the default to entity_name=''.

Once that was solved, I got the same error. Here is the complete stack trace (the part that is pertinent to us):

Bayesian calibration iter No. 0
/opt/homebrew/Caskroom/miniforge/base/envs/grainlearning/lib/python3.9/site-packages/seaborn/axisgrid.py:123: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
Bayesian calibration iter No. 1
wandb: Currently logged in as: luisaforozco. Use `wandb login --relogin` to force relogin
400 response executing GraphQL.
{"errors":[{"message":"400 Bad Request: The browser (or proxy) sent a request that this server could not understand.","path":["upsertSweep"]}],"data":{"upsertSweep":null}}
wandb: ERROR Error while calling W&B API: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand. (<Response [400]>)
wandb: ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.

I browsed a bit but didn't find anything helpful. I tried commenting out some of the elements in sweep_config, but still got the same error.

Digging deeper, I found that the error originates in the section that adds 'parameters' to self.sweep_config: commenting out the for loop that adds those parameters lets the call pass:

def get_sweep_id(self):
    """
    Returns the sweep_id of a sweep created with the configuration specified in sweep_config and search_space.
    """
    # add default parameters from my_config into sweep_config, use 'values' as the key
    self.sweep_config['parameters'] = {}
    #for key, value in self.other_config.items():
    #    self.sweep_config['parameters'].update({key: {'values': [value]}})
    # update sweep_config with the parameters to be searched and their distributions
    self.sweep_config['parameters'].update(self.search_space)
    # create the sweep
    sweep_id = wandb.sweep(self.sweep_config, entity=self.entity_name, project=self.project_name)
    self.sweep_id = sweep_id

Most probably something is wrong with the format of the parameters being added. I'm posting this here for reference and will pick it up later (I have to run to a meeting now).

APJansen commented 5 months ago

I'm not sure, but it could be the update. I would try building the dictionary in a separate variable first and assigning it to self.sweep_config['parameters'] in one go, something like:

parameters = {}
for key, value in self.other_config.items():
    parameters.update({key: {'values': [value]}})
parameters.update(self.search_space)
self.sweep_config['parameters'] = parameters

If that doesn't fix it, there must be something wrong with one of the keys or values in this parameters dictionary, something wandb cannot deal with.
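One way to narrow down which key is responsible is to check each entry of the parameters dictionary separately. The sketch below assumes the request fails because a value cannot be encoded as JSON, which is not confirmed at this point in the thread. The helper name find_unserializable and the example keys are hypothetical, and array.array from the standard library stands in for a numpy array so the snippet runs without extra dependencies.

```python
import json
from array import array  # stdlib stand-in for a numpy array in this sketch


def find_unserializable(parameters):
    """Return the keys whose entries cannot be encoded as JSON."""
    bad_keys = []
    for key, spec in parameters.items():
        try:
            json.dumps(spec)
        except (TypeError, ValueError):
            bad_keys.append(key)
    return bad_keys


# Hypothetical parameters dictionary, shaped like the one built above.
parameters = {
    "hidden_units": {"values": [128]},
    "learning_rate": {"distribution": "uniform", "min": 1e-4, "max": 1e-2},
    "inputs": {"values": [array("d", [0.1, 0.2, 0.3])]},
}

bad_keys = find_unserializable(parameters)
print(bad_keys)
```

Any key reported here is a candidate to leave out of the sweep configuration or to convert to plain Python types before calling wandb.sweep.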

chyalexcheng commented 5 months ago

The problem was caused by three numpy arrays (input, params, and output) in self.other_config. It seems wandb cannot take them as part of the sweep configuration. However, another error occurs: somehow, these arrays get turned into strings when reloaded from get_best_run_from_sweep

best_run = sweep.best_run(order=order)
config = best_run.config

in module grainlearning.rnn.predict (line 46)

We can simply convert these strings back to arrays, but this behavior is weird. Any idea why this is happening?
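An alternative to parsing the strings is to keep the arrays out of the configuration altogether and put the originals back after the best run is loaded. The sketch below illustrates that idea only; it is not the GrainLearning implementation. The helpers split_config and restore_config, the example keys, and the loaded config are all hypothetical, and array.array stands in for a numpy array. A plausible but unconfirmed explanation for the strings is that values which cannot be stored as JSON are saved by their string representation.

```python
from array import array  # stdlib stand-in for a numpy array in this sketch


def split_config(other_config):
    """Separate array-like entries from values that are safe to put in a sweep."""
    arrays, scalars = {}, {}
    for key, value in other_config.items():
        # numpy arrays (and array.array) expose tolist();
        # plain Python ints, floats, and strings do not
        if hasattr(value, "tolist"):
            arrays[key] = value
        else:
            scalars[key] = value
    return arrays, scalars


def restore_config(run_config, arrays):
    """Return a copy of a loaded run config with the original arrays put back."""
    restored = dict(run_config)
    restored.update(arrays)
    return restored


other_config = {
    "inputs": array("d", [0.1, 0.2]),
    "window_size": 10,
}
arrays, scalars = split_config(other_config)

# Only the scalar entries would go into the sweep configuration.
sweep_parameters = {key: {"values": [value]} for key, value in scalars.items()}

# Hypothetical config as reloaded from the best run: the array came back as a string.
loaded_config = {"inputs": "array('d', [0.1, 0.2])", "window_size": 10}
config = restore_config(loaded_config, arrays)
```

This requires the original arrays to still be available when the best run is loaded, which may not hold if prediction runs in a separate session; in that case converting the stored strings, or saving the arrays to a file, would still be needed.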

GrainLearning / grainLearning

Automatically run a hyperparameter tuning #71