automl / neps

Neural Pipeline Search (NePS): Helps deep learning experts find the best neural pipeline.
https://automl.github.io/neps/
Apache License 2.0

Check restarting/handling of pending config when resuming a run #30

Open Neeratyoy opened 8 months ago

Neeratyoy commented 8 months ago

For potential reproducibility of the observed issue:

Some more observations:

Should a new worker re-evaluate pending configs as a priority? Also, in this scenario the generated config IDs range over [1, n+1] when max_evaluations_total=n.

karibbov commented 8 months ago

This happens when the process is force-killed during the evaluation of a config, and is reproducible with a single process.

To reproduce:

  1. Choose an algorithm that has very low overhead, e.g. Random Search.
  2. Write a run_pipeline(...) function that takes a relatively long time compared to the algorithm overhead, e.g. time.sleep(10).
  3. Run neps.api.run. The arguments don't matter; the issue should reproduce regardless.
  4. If you are watching the logs, terminate the process once the algorithm enters the evaluation phase, indicated by the log line Start evaluating config .... Otherwise, refine steps 1 and 2 to increase your chance of terminating during evaluation.
  5. If, after termination, there is a config folder with a missing result.yaml file, you have successfully interrupted an evaluation.
  6. Re-run the process to see the effect described.
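
Steps 2 and 5 above can be sketched roughly as follows. This is only an illustration under assumptions: the helper find_pending_configs, the constant SLEEP_SECONDS, and the idea that every config folder sits directly under one results directory are hypothetical and not taken from NePS. The return value of run_pipeline is likewise a placeholder. The call to neps.api.run is left out because its arguments are not specified in this thread.

```python
import time
from pathlib import Path

# Hypothetical knob: how long one evaluation should take, so that there is
# enough time to kill the process while a config is being evaluated.
SLEEP_SECONDS = 10


def run_pipeline(**config):
    """Deliberately slow objective (step 2); the returned value is a placeholder."""
    time.sleep(SLEEP_SECONDS)
    return 0.0


def find_pending_configs(results_dir, result_name="result.yaml"):
    """Return names of config folders under results_dir without a result file (step 5).

    Assumes one sub-folder per config directly under results_dir.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(
        folder.name
        for folder in root.iterdir()
        if folder.is_dir() and not (folder / result_name).exists()
    )
```

After killing the process, calling find_pending_configs on the directory that holds the config folders should list the interrupted config if the kill landed during evaluation; an empty list means the timing was missed and steps 1 and 2 need refining.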

Alternatively, you can skip steps 1-5 and manually delete a result.yaml file from any config folder. This makes NePS think there is a pending config that some mysterious other process is handling right now.
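
The manual shortcut could be scripted along these lines. It is a minimal sketch, not part of NePS: the function name simulate_pending_config is hypothetical, and it assumes every config folder is a direct sub-folder of a single results directory.

```python
from pathlib import Path


def simulate_pending_config(results_dir, result_name="result.yaml"):
    """Delete one result file so its config folder looks like a pending evaluation.

    Assumes one sub-folder per config directly under results_dir.
    Returns the modified folder, or None if no result file was found.
    """
    for result_file in sorted(Path(results_dir).glob(f"*/{result_name}")):
        result_file.unlink()  # the folder now has a config but no result
        return result_file.parent
    return None
```

Because this deletes a result, it is best run on a copy of the run directory or on a throwaway run.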