Neeratyoy opened 8 months ago
This happens when the process is force-killed during the evaluation of a config, and is reproducible with a single process.
To reproduce:

1. Use Random Search.
2. Use a `run_pipeline(...)` function which takes a relatively long time compared to the algorithm overhead, e.g. `time.sleep(10)`.
3. Run `neps.api.run`. Arguments don't matter; this should reproduce regardless (a minimal sketch follows this list).
4. Force-kill the process after it logs `Start evaluating config ...`.
5. If that config's folder has no `result.yaml` file, you have successfully interrupted an evaluation; otherwise, refine steps 1 and 2 to increase your chance of terminating during evaluation.
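A minimal sketch of steps 1-3, assuming the standard `neps.run` entry point and `FloatParameter` search-space API; the `searcher="random_search"` string and directory name are assumptions, adjust to your version:

```python
import time
import neps

def run_pipeline(x):
    # Evaluation is slow relative to the optimizer overhead, so a
    # force-kill is very likely to land mid-evaluation.
    time.sleep(10)
    return x  # dummy loss

pipeline_space = dict(
    x=neps.FloatParameter(lower=0.0, upper=1.0),
)

neps.run(
    run_pipeline=run_pipeline,
    pipeline_space=pipeline_space,
    root_directory="neps_root",   # hypothetical directory name
    max_evaluations_total=20,
    searcher="random_search",     # assumed way to select Random Search
)
```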
Alternatively, you can skip steps 1-5 and manually delete a `result.yaml` file from any config folder to make NePS think that there is a pending config some mysterious other process is handling right now.
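For that shortcut, a sketch under the assumption that completed configs live at `<root_directory>/results/config_<id>/result.yaml` (adjust the path to your run's layout):

```python
from pathlib import Path

# Hypothetical run layout: <root_directory>/results/config_<id>/result.yaml
result = Path("neps_root/results/config_16/result.yaml")
result.unlink()  # NePS now treats config_16 as pending
```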
For potential reproducibility of the observed issue: I ran 20 (`max_evaluations_total=20`) evaluations distributed across 4 workers (launch pattern sketched below).
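The launch pattern was simply the same script started several times against a shared `root_directory` (NePS's usual multi-worker setup); a sketch with a hypothetical script name:

```python
import subprocess

# Start 4 workers; "run_neps.py" is a hypothetical name for the script above.
workers = [subprocess.Popen(["python", "run_neps.py"]) for _ in range(4)]
for worker in workers:
    worker.wait()
```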
The overall run completed fine, but I noticed certain anomalies, as described below:
- Config `21` was generated while config ID `16` was not re-evaluated or completed and remains `pending` forever.

Some more observations:
- With `max_evaluations_total=20` we should have config IDs from 1-20, with each of them having their own `result.yaml`.
- `config_16` does not have a `result.yaml`, whereas `config_21` does.
- With `max_evaluations_total=21`, the run now satisfies that extra required evaluation by sampling a new config, `config_22`.
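A quick check that surfaces the anomaly, assuming the same `config_<id>`/`result.yaml` layout as in the sketches above:

```python
from pathlib import Path

root = Path("neps_root/results")  # hypothetical root_directory from the sketch above
for cfg in sorted(root.glob("config_*"), key=lambda p: int(p.name.split("_")[1])):
    status = "done" if (cfg / "result.yaml").exists() else "PENDING"
    print(f"{cfg.name}: {status}")
# Here this prints 21 folders, with config_16 as the only PENDING one.
```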
Should a new worker re-evaluate pending configs as a priority? Also, under this scenario the generated config IDs range over `[1, n+1]` if `max_evaluations_total=n`.