Closed Neeratyoy closed 5 months ago
Some preliminary investigation, where I reduced the search space to just tiny models and ran only 8 configurations in total:
As for the timing mismatch, I don't know why it shows up in your case in particular, but timing is quite noisy across configurations. For example, a configuration with many layers and neurons, if it gets repeated on all 4 workers, is going to be slower than 4 tiny configs on a single worker. Either way, all workers are getting work to do, so I don't think there's a real MP slowdown somewhere (other than syncing stages between workers).
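To make the timing argument concrete, here is a minimal sketch that models wall-clock time when configurations of different cost are handed to whichever worker frees up first. The cost values (10 for a large configuration, 1 for a tiny one) and the function name `wall_clock` are made up for illustration; this is not how NePS schedules or measures anything.

```python
def wall_clock(costs, n_workers):
    """Wall-clock time when each config goes to the worker that frees up first.

    `costs` are hypothetical per-config runtimes in arbitrary units.
    """
    busy_until = [0.0] * n_workers
    for cost in costs:
        # Hand the next config to the worker with the earliest finish time.
        idx = busy_until.index(min(busy_until))
        busy_until[idx] += cost
    return max(busy_until)


BIG, TINY = 10.0, 1.0  # assumed costs, not measured values

# Four workers that each draw a large config vs. one worker with four tiny ones.
four_workers_big = wall_clock([BIG] * 4, n_workers=4)
one_worker_tiny = wall_clock([TINY] * 4, n_workers=1)

# The same 8 configs (one large, seven tiny) on 1 vs. 4 workers.
mixed = [BIG] + [TINY] * 7
speedup = wall_clock(mixed, n_workers=1) / wall_clock(mixed, n_workers=4)
```

Under these assumed costs, the 4-worker run on large configs takes longer than the single worker on tiny ones, and the mixed batch speeds up by well under 4x because the one large config dominates.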
Will do a bit more debugging.
I'm investigating the timing issue, but I imagine it's a timing error, because from running both, multiprocessing is definitely faster. It might also have been conflated with the comment above this.
As for the weird thing with the orange bar, that's an artifact of evaluations returning out of order. All sampled configurations had the correct hyperparameters in the correct order.
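A small simulation can show how results come back out of order even though configurations are sampled in order. This is only a sketch: the durations and the helper `completion_order` are hypothetical, and it assumes a new configuration is sampled as soon as a worker becomes free.

```python
import heapq


def completion_order(durations, n_workers):
    """Return config ids in the order their results come back.

    Config i is sampled i-th and takes durations[i] time units (assumed values).
    """
    running = []   # min-heap of (finish_time, config_id) for busy workers
    finished = []  # config ids in the order their evaluations return
    clock = 0.0
    for config_id, duration in enumerate(durations):
        if len(running) == n_workers:
            # All workers are busy: wait for the earliest one to finish.
            clock, done_id = heapq.heappop(running)
            finished.append(done_id)
        heapq.heappush(running, (clock + duration, config_id))
    # Drain the evaluations that are still running.
    while running:
        _, done_id = heapq.heappop(running)
        finished.append(done_id)
    return finished


order_two_workers = completion_order([5, 1, 3, 2], n_workers=2)
order_one_worker = completion_order([5, 1, 3, 2], n_workers=1)
```

With two workers the slow first configuration returns after two later ones, so anything plotted in return order looks shuffled; with one worker the return order matches the sampling order.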
Timing manually, I found that 4 workers were faster, but of course not 4x faster. A single worker seemed to keep two full cores busy, with some sporadic bursts on my other cores. With 4 parallel workers, all 8 of my cores were kept full.
My guess is that the bottleneck has something to do with data loading, which is not optimized for use in a parallel setting.
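For anyone wanting to repeat the manual timing, a minimal wall-clock timer could look like the sketch below. The helper `wall_timer` and the placeholder workload are assumptions for illustration; in practice the body of the `with` block would be whatever launches the single-worker or 4-worker run.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_timer(label, results):
    """Store the elapsed wall-clock seconds of the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with wall_timer("placeholder_workload", timings):
    # Stand-in for the real run; replace with the actual launch code.
    sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

Wall-clock time is the relevant quantity here, since summed CPU time across cores would hide any parallel speedup.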
How the seed is set prior to the `neps.run()` call, and how the different `neps.run()` calls are then spawned, makes a difference in the seeding effect.

Issue example
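On the seeding point above, the sketch below shows in general terms why the way workers receive their seed matters. It models workers as independently seeded `random.Random` instances and does not reproduce NePS internals; the seed value, the sampled range, and the helper name are all hypothetical, and which case corresponds to each launch method in this issue is not established here.

```python
import random


def sample_learning_rates(seed, n):
    """Draw n hypothetical hyperparameter values from a freshly seeded RNG."""
    rng = random.Random(seed)
    return [rng.uniform(1e-4, 1e-1) for _ in range(n)]


SEED = 42      # assumed global seed set before launching workers
N_WORKERS = 4

# Case 1: every worker starts from the same seed, so all draw the same values.
same_seed = [sample_learning_rates(SEED, 2) for _ in range(N_WORKERS)]

# Case 2: each worker derives its own seed from the global one.
per_worker = [
    sample_learning_rates(SEED + worker_id, 2) for worker_id in range(N_WORKERS)
]

print("identical across workers:", all(s == same_seed[0] for s in same_seed))
print("distinct across workers:", len({tuple(s) for s in per_worker}) == N_WORKERS)
```

In the first case parallel workers would duplicate each other's samples and give no early speedup; in the second they explore different configurations, so two launch methods that differ only in this respect could produce very different curves.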
In this plot, `random_search` is a single-worker run, while the other two are the same NePS setting, run in different ways to create workers. The two show vastly different behaviour. `random_search_multiprocessing` shows an initial speedup, which should be the case for early budgets when parallelizing random search.

Desired setting
Both `random_search_*` lines should be exactly the same and provide early speedups over `random_search`.

Reproducibility steps
The following set of steps should lead to reproducing the issue:
To run the single worker baseline:
To run the same but with 4 workers:
Now running the same using multiprocessing:
NOTE: The only difference between `run_rs`, `run_rs_nohup`, and `run_rs_multiprocessing` is the output path; everything else is the same, i.e., the same NePS run.

To plot: