automl / neps

Neural Pipeline Search (NePS): Helps deep learning experts find the best neural pipeline.
https://automl.github.io/neps/
Apache License 2.0

refactor(runtime): Migrate metahyper to `neps.runtime` #71

Closed · eddiebergman closed this 2 months ago

eddiebergman commented 2 months ago

This is a semi-large PR, but the main change is to integrate metahyper so that it is simply part of neps. As such, I renamed it to `neps/runtime.py`.

Here's the synopsis that is included at the top of the file:


"""Module for the runtime of a single instance of NePS running.
An important advantage of NePS with a running instance per worker and no
multiprocessing is that we can reliably use globals to store information such
as the currently runnig configuraiton, without interfering with other
workers which have launched.
This allows us to have a global `Trial` object which can be accessed
using `import neps.runtime; neps.get_in_progress_trial()`.

---

This module primarily handles the worker loop, where the important concepts are:
* **State**: The state of optimization is all of the configurations, their results and
 the current state of the optimizer.
* **Shared State**: Whenever a worker wishes to read or write any state, it will _lock_ the
 shared state, declaring itself as operating on it. At this point, no other worker can
 access the shared state.
* **Optimizer Hydration**: This is the process through which an optimizer instance is _hydrated_
 with the Shared State so it can make a decision, i.e. for sampling. Equally, we _serialize_
 the optimizer when writing it back to the Shared State.
* **Trial Lock**: When evaluating a configuration, a worker must _lock_ it to declare itself
 as evaluating it. This communicates to other workers that this configuration is in progress.

### Loop
We mark steps with `+` while the worker holds the Shared State lock and with `~` while it
holds the Trial lock. Acquiring the Trial lock (`~`) is allowed to fail, in which case all
steps marked `~` are skipped and the loop continues.

1. + Check exit conditions
2. + Hydrate the optimizer
3. + Sample a new Trial
4. Unlock the Shared State
5. ~ Obtain a Trial Lock
6. ~ Set the global trial for this worker to the current trial
7. ~ Evaluate the trial
8. ~+ Lock the Shared State
9. ~+ Write the results of the config to disk
10. ~+ Update the optimizer if required (e.g. budget used by evaluating the trial)
11. ~ Unlock the Shared State
12. ~ Release the Trial Lock
"""

Some major points about `runtime.py`:

While doing so, quite a few minor changes were made beyond the code structure: