giuseppegrieco / keras-tuner-cv

Extension for keras tuner that adds a set of classes to implement cross validation techniques.
GNU General Public License v3.0

Hyperband tuner cannot load model weights of previous rounds #9

Closed VZoche-Golob closed 1 year ago

VZoche-Golob commented 1 year ago

In its second round, the Hyperband tuner loads model weights from the first round to continue the training. With keras-tuner-cv, this does not work:

Search: Running Trial #35

Value             |Best Value So Far |Hyperparameter
0.02              |0.02              |l2_regularization
0.99              |0.99              |bn_momentum
0.2               |0.2               |dropout_ti
0.2               |0.2               |dropout_context
0.2               |0.2               |dropout_trunk
cnn               |cnn               |layertype_ti
5                 |5                 |n_dim_emb_vvvo
0.01              |0.01              |lr
nadam             |nadam             |optimizer_type
2048              |2048              |batch_size
causal            |causal            |cnn_padding
20                |20                |cnn_filters
2                 |2                 |cnn_layers
9                 |3                 |tuner/epochs
3                 |0                 |tuner/initial_epoch
3                 |3                 |tuner/bracket
1                 |0                 |tuner/round
0025              |None              |tuner/trial_id

2023-09-04 16:01:33.885857: W tensorflow/core/util/tensor_slice_reader.cc:97] Could not open ./log/devpipeline_20230904144942/kt_log/devpipeline_20230904144942/trial_0025/checkpoint: DATA_LOSS: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/keras_tuner_cv/inner_cv.py", line 119, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/keras_tuner_cv/inner_cv.py", line 84, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/keras_tuner_cv/inner_cv.py", line 228, in run_trial
    history, model = self._build_and_fit_model(
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/keras_tuner_cv/inner_cv.py", line 348, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/keras_tuner/engine/tuner.py", line 155, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/keras_tuner/tuners/hyperband.py", line 432, in _build_hypermodel
    model.load_weights(self._get_checkpoint_fname(trial_id))
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/h5py/_hl/files.py", line 567, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/vzg/miniconda3/envs/tf/lib/python3.9/site-packages/h5py/_hl/files.py", line 231, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
Trial 35 Complete [00h 00m 01s]

Best val_mae So Far: 1.2140199661254882
Total elapsed time: 01h 11m 47s

Other tuners do not load weights during HPO, so this issue is specific to Hyperband. In tests with small hyperparameter spaces, the problem probably does not come up because Hyperband stops after the first round (see keras-tuner issue 676).
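The trial summary in the log shows what marks a second-round trial: it carries a `tuner/trial_id` (here `0025`) and a non-zero `tuner/initial_epoch`, while first-round trials have no `tuner/trial_id`. A minimal sketch of that check, using a plain dict in place of the tuner's hyperparameter object; the function name `resume_info` is hypothetical and not part of keras-tuner:

```python
def resume_info(hp_values):
    """Return (parent_trial_id, initial_epoch) if the trial continues an
    earlier one, else None. `hp_values` is a plain dict standing in for
    the hyperparameter values printed in the trial summary."""
    parent = hp_values.get("tuner/trial_id")
    if parent is None:
        # First-round trial: nothing to load, training starts from scratch.
        return None
    return parent, hp_values.get("tuner/initial_epoch", 0)


# Values copied from the "Value" column of trial #35 in the log.
trial_35 = {
    "tuner/epochs": 9,
    "tuner/initial_epoch": 3,
    "tuner/bracket": 3,
    "tuner/round": 1,
    "tuner/trial_id": "0025",
}

# Values from the "Best Value So Far" column (a first-round trial).
first_round = {
    "tuner/epochs": 3,
    "tuner/initial_epoch": 0,
    "tuner/bracket": 3,
    "tuner/round": 0,
}

print(resume_info(trial_35))
print(resume_info(first_round))
```

Only trials for which this check succeeds trigger the weight loading that fails in the traceback.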

With keras-tuner-cv, the weights of a specific split model would have to be loaded. At the moment, Hyperband._build_hypermodel() (v1.3.5) looks for the saved weights in the trial directory (trial_xxxx) and expects a single set of weights there, presumably under the default file name (see https://github.com/keras-team/keras-tuner/blob/v1.3.5/keras_tuner/tuners/hyperband.py#L432). With keras-tuner-cv, however, that directory contains several sets of weights, one per split.
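One possible direction, sketched under loud assumptions: resolve the checkpoint per split instead of per trial. The issue does not state how keras-tuner-cv names its per-split files, so the `checkpoint_split_<n>` pattern and both function names below are hypothetical; only the `trial_<id>` directory name is taken from the log.

```python
from pathlib import Path


def split_checkpoint_path(project_dir, trial_id, split):
    """Build the path of one split's weights inside a trial directory.

    The `checkpoint_split_<n>` file name is an assumption made for this
    sketch, not the layout keras-tuner-cv actually uses.
    """
    return Path(project_dir) / f"trial_{trial_id}" / f"checkpoint_split_{split}"


def checkpoint_to_resume(hp_values, project_dir, split):
    """Return the parent trial's checkpoint for this split, or None for a
    first-round trial that has no parent."""
    parent = hp_values.get("tuner/trial_id")
    if parent is None:
        return None
    return split_checkpoint_path(project_dir, parent, split)


# Trial #35 from the log continues trial 0025; look up the weights of split 2.
path = checkpoint_to_resume({"tuner/trial_id": "0025"}, "kt_log", split=2)
print(path)
```

A tuner subclass could then call `model.load_weights(...)` with this path when it builds the model for a given split, rather than relying on the single per-trial file name that Hyperband expects.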