GrainLearning / grainLearning

A Bayesian uncertainty quantification toolbox for discrete and continuum numerical models of granular materials, developed by various projects of the University of Twente (NL), the Netherlands eScience Center (NL), University of Newcastle (AU), and Hiroshima University (JP).
https://grainlearning.readthedocs.io/
GNU General Public License v2.0

Automatically run a hyperparameter tuning #71

Open chyalexcheng opened 9 months ago

chyalexcheng commented 9 months ago

Running tutorials/data_driven/LSTM/hyperbola_calibration_mixed_hypertuning.py produces the following error.

400 response executing GraphQL.
{"errors":[{"message":"400 Bad Request: The browser (or proxy) sent a request that this server could not understand.","path":["upsertSweep"]}],"data":{"upsertSweep":null}}
wandb: ERROR Error while calling W&B API: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand. (<Response [400]>)

Functions relevant to this are my_training_function, hyper_train(), and lines 112

luisaforozco commented 9 months ago

Hello, @chyalexcheng

To reproduce this error I first had to add entity_name="grainlearning-escience" to the definition of hyper_tuner. The default in the class definition of train_rnn.HyperTuning is entity_name='grainlearning', which fails for anyone who is not a member of that team. I suggest changing the default to entity_name=''.

With that solved, I got the same error. Here is the part of the stack trace that is relevant to us:

Bayesian calibration iter No. 0
/opt/homebrew/Caskroom/miniforge/base/envs/grainlearning/lib/python3.9/site-packages/seaborn/axisgrid.py:123: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
Bayesian calibration iter No. 1
wandb: Currently logged in as: luisaforozco. Use `wandb login --relogin` to force relogin
400 response executing GraphQL.
{"errors":[{"message":"400 Bad Request: The browser (or proxy) sent a request that this server could not understand.","path":["upsertSweep"]}],"data":{"upsertSweep":null}}
wandb: ERROR Error while calling W&B API: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand. (<Response [400]>)
wandb: ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.

I searched a bit but didn't find anything helpful. I tried commenting out some of the elements in sweep_config, but still got the same error.

Digging deeper, I found that the error originates in the section that adds 'parameters' to self.sweep_config: commenting out the for loop that adds those parameters lets the sweep creation pass:

def get_sweep_id(self):
    """
    Returns the sweep_id of a sweep created with the configuration specified in sweep_config and search_space.
    """
    # add default parameters from my_config into sweep_config, use 'values' as the key
    self.sweep_config['parameters'] = {}
    #for key, value in self.other_config.items():
    #    self.sweep_config['parameters'].update({key: {'values': [value]}})
    # update sweep_config with the parameters to be searched and their distributions
    self.sweep_config['parameters'].update(self.search_space)
    # create the sweep
    sweep_id = wandb.sweep(self.sweep_config, entity=self.entity_name, project=self.project_name)
    self.sweep_id = sweep_id

Most probably something is wrong with the format of the parameters being added. I'm leaving this here for reference and will pick it up later (have to run to a meeting now).

APJansen commented 9 months ago

I'm not sure, but it could be the update. I would try creating the dictionary in a separate variable first and assigning it to self.sweep_config['parameters'] in one go, something like:

parameters = {}
for key, value in self.other_config.items():
    parameters.update({key: {'values': [value]}})
parameters.update(self.search_space)
self.sweep_config['parameters'] = parameters

If that doesn't fix it, one of the keys or values in this parameters dictionary must be something wandb cannot deal with.
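To narrow down which entry is the culprit, a quick heuristic is to check every value for JSON serializability before creating the sweep. This is only a sketch: the helper name find_unserializable and the example dictionary are hypothetical, and JSON is used as a stand-in check, not as a description of how wandb validates a sweep configuration.

```python
import json

import numpy as np


def find_unserializable(parameters):
    """Return the keys whose values cannot be encoded as JSON."""
    bad_keys = []
    for key, value in parameters.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad_keys.append(key)
    return bad_keys


# Hypothetical stand-in for the dictionary built from other_config and search_space.
parameters = {
    'hidden_units': {'values': [128]},
    'learning_rate': {'distribution': 'uniform', 'min': 1e-4, 'max': 1e-2},
    'inputs': {'values': [np.zeros((2, 3))]},
}
print(find_unserializable(parameters))  # ['inputs']
```

Any key reported here is a good first candidate to remove from the dictionary, or to convert to a plain Python type, before calling wandb.sweep.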

chyalexcheng commented 9 months ago

The problem was caused by three numpy arrays (input, params, and output) in self.other_config. It seems wandb cannot accept them as part of the sweep configuration. However, a different problem now appears: these arrays are somehow turned into strings when reloaded from get_best_run_from_sweep

best_run = sweep.best_run(order=order)
config = best_run.config

in module grainlearning.rnn.predict (line 46)

We can simply convert these strings back to arrays, but this behavior is odd. Any idea why this is happening?
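As an alternative to parsing strings back into arrays, the arrays could be stored as nested lists and rebuilt after loading. The sketch below is not taken from grainlearning: the helper names and the example config are hypothetical, a JSON round trip stands in for whatever storage sits between saving and reloading, and whether wandb keeps nested lists intact in a run config is an assumption that would need to be checked.

```python
import json

import numpy as np


def arrays_to_lists(config):
    """Copy config, replacing each numpy array with a nested list."""
    return {k: v.tolist() if isinstance(v, np.ndarray) else v for k, v in config.items()}


def lists_to_arrays(config, array_keys):
    """Copy config, turning the named entries back into numpy arrays."""
    return {k: np.asarray(v) if k in array_keys else v for k, v in config.items()}


# Hypothetical config; the key names are illustrative only.
config = {'hidden_units': 128, 'inputs': np.arange(6.0).reshape(2, 3)}

# The JSON round trip is an assumption standing in for the real storage layer.
stored = json.loads(json.dumps(arrays_to_lists(config)))
restored = lists_to_arrays(stored, array_keys={'inputs'})

print(type(restored['inputs']).__name__, restored['inputs'].shape)
```

Passing the array keys explicitly keeps the conversion predictable: ordinary list-valued settings are left alone, and only the entries known to be arrays are rebuilt.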