automl / neps

Neural Pipeline Search (NePS): Helps deep learning experts find the best neural pipeline.
https://automl.github.io/neps/
Apache License 2.0

Error if I rerun an experiment that uses hyperband as a searcher #103

Closed by danrgll 4 weeks ago

danrgll commented 1 month ago

Example to reproduce the error:

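import logging

import numpy as np

import neps


def run_pipeline(**config):
    # The hyperparameters are read but not used; the loss is random on purpose,
    # since only the searcher's bookkeeping matters for this reproduction.
    epochs = config["epochs"]
    optimizer = config["optimizer"]
    eval_score = np.random.random(1)
    return {"loss": eval_score}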

pipeline_space = dict(
    epochs=neps.IntegerParameter(lower=1, upper=10, is_fidelity=True),
    batch_size=neps.IntegerParameter(lower=32, upper=128, log=False),
    optimizer=neps.CategoricalParameter(choices=["sgd", "adam"])
)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # neps.run(run_args="test_yaml_test.yaml")
    neps.run(run_pipeline=run_pipeline, pipeline_space=pipeline_space,
             max_evaluations_total=20, root_directory="results")

Error message:

INFO:neps:Running hyperband as the searcher
INFO:neps:Algorithm: hyperband
Traceback (most recent call last):
  File "/Users/daniel/PycharmProjects/neps/wrapper_test.py", line 30, in <module>
    neps.run(run_pipeline=run_pipeline, pipeline_space=pipeline_space,
  File "/Users/daniel/PycharmProjects/neps/neps/api.py", line 335, in run
    launch_runtime(
  File "/Users/daniel/PycharmProjects/neps/neps/runtime.py", line 883, in launch_runtime
    with shared_state.sync(lock=True):
  File "/Users/daniel/opt/miniconda3/envs/neps2/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/Users/daniel/PycharmProjects/neps/neps/runtime.py", line 724, in sync
    self.update_from_disk()
  File "/Users/daniel/PycharmProjects/neps/neps/runtime.py", line 626, in update_from_disk
    previous_report = self.evaluated_trials[previous_config_id]
KeyError: '10_1'
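
The last frame of the traceback is a plain dictionary lookup, so the failure mode can be illustrated without NePS. The following is only a sketch under stated assumptions: the dictionary contents, the ID `"10_0"`, and the helper `lookup_previous_report` are hypothetical stand-ins, not NePS internals. It shows that indexing with an ID missing from the loaded state raises `KeyError`, while `dict.get` returns `None` and lets the caller decide how to handle the gap.

```python
# Hypothetical stand-in for the trial state loaded from disk (not NePS code).
evaluated_trials = {"10_0": {"loss": 0.42}}


def lookup_previous_report(trials, previous_config_id):
    """Return the stored report for previous_config_id, or None if absent."""
    return trials.get(previous_config_id)


# Direct indexing mirrors the failing line in the traceback.
try:
    evaluated_trials["10_1"]
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

print(missing_key)
print(lookup_previous_report(evaluated_trials, "10_1"))
```

Whether a missing predecessor should be skipped or treated as a hard error depends on how the searcher uses the report, which this thread does not establish.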

danrgll commented 1 month ago

The error message looks similar to the one in #104, but the cause is different.

eddiebergman commented 1 month ago

Just to add to this: it only seems to fail on a re-run, when the results from a previous run are already present in the results folder. In other words, run the example twice in a row to get this error.
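
Since the error depends on results left over from an earlier run, one possible stopgap until a fix lands is to start from an empty results folder. This is a minimal sketch, not a fix for the underlying bug: `reset_results` is a hypothetical helper written for illustration, and it permanently deletes the previous results.

```python
import shutil
from pathlib import Path


def reset_results(root_directory):
    """Delete root_directory if it exists; return True if it was removed."""
    path = Path(root_directory)
    if path.is_dir():
        # Removes the folder and everything in it, including earlier results.
        shutil.rmtree(path)
        return True
    return False
```

In the example above, this would be called as `reset_results("results")` just before `neps.run(...)`. It gives up the ability to continue a previous run, so it is only useful when a clean re-run is acceptable.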

Neeratyoy commented 1 month ago

Is it possible to run this example from an older commit to see whether the same error occurs? (A lot of our experiments did have re-runs of Hyperband, which seemed to be fine.)

eddiebergman commented 1 month ago

I can reproduce this; working on the fix.