ARM-software / mango

Parallel Hyperparameter Tuning in Python
Apache License 2.0

Question about Serial scheduler and parallel scheduler #97

zoechoutw closed this issue 1 year ago

zoechoutw commented 1 year ago

Thank you for the amazing work! I have questions about the definition.

  1. Does the serial scheduler execute the original Bayesian optimization algorithm without parallelization?
  2. Can I assume that if I don't specify a scheduler, the serial scheduler is used by default?
  3. To perform Bayesian optimization in parallel, what are n_jobs and batch_size for? What happens if the numbers differ, e.g. n_jobs=m and batch_size=n? Does it run m or n evaluations simultaneously?

Thank you very much!

tihom commented 1 year ago

@zoechoutw thanks for the questions. Here are the answers:

  1. Does the serial scheduler execute the original Bayesian optimization algorithm without parallelization?

Yes, it runs with batch_size=1 and evaluates the objective function one sample at a time.
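That contract can be sketched in plain Python (a simplified stand-in, not mango's actual implementation; the objective and parameter name `x` are toy examples): a serial-style decorator wraps a one-sample objective so the tuner can still hand it a batch, which in this case always has length 1.

```python
def serial(objective):
    """Sketch of a serial scheduler: wrap a one-sample objective so the
    tuner can pass a batch of parameter dicts (here, always of size 1)."""
    def batched(params_batch):
        # Evaluate one parameter dict at a time, in order.
        return [objective(**params) for params in params_batch]
    return batched

@serial
def objective(x):
    # Toy objective: minimize x ** 2.
    return x ** 2

# The tuner hands the wrapped objective a list of param dicts;
# with the serial scheduler that list holds a single sample.
print(objective([{"x": 3}]))  # [9]
```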

  2. Can I assume that if I don't specify a scheduler, the serial scheduler is used by default?

No. In that case, batch_size=1 by default, but the user has to write the objective function so that it accepts a list of params and returns a list of results. See the notebook here for a working example.
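In other words, without a scheduler decorator the objective itself owns the batch interface: it takes a list of parameter dicts and returns a list of scores, one per dict. A minimal sketch (the parameter name `x` and the quadratic objective are illustrative):

```python
def objective(params_batch):
    # params_batch is a list of parameter dicts, e.g. [{"x": 1}, {"x": 4}].
    results = []
    for params in params_batch:
        # One score per parameter dict, returned in the same order.
        results.append(params["x"] ** 2)
    return results

print(objective([{"x": 1}, {"x": 4}]))  # [1, 16]
```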

  3. To perform Bayesian optimization in parallel, what are n_jobs and batch_size for? What happens if the numbers differ, e.g. n_jobs=m and batch_size=n? Does it run m or n evaluations simultaneously?

When @scheduler.parallel(n_jobs=xx) is used, batch_size is overwritten with the value provided for n_jobs, so m evaluations run simultaneously.
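A rough sketch of that behavior using only the standard library (this is a simplified stand-in, not mango's actual parallel scheduler, and the thread pool is an assumption for illustration): each batch of n_jobs candidates is evaluated concurrently, so batch size and worker count are the same number.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel(n_jobs):
    """Sketch of a parallel scheduler: wrap a one-sample objective so a
    batch of n_jobs candidates is evaluated concurrently."""
    def decorator(objective):
        def batched(params_batch):
            # batch_size == n_jobs: one worker per candidate in the batch.
            with ThreadPoolExecutor(max_workers=n_jobs) as pool:
                return list(pool.map(lambda p: objective(**p), params_batch))
        return batched
    return decorator

@parallel(n_jobs=4)
def objective(x):
    # Toy objective: minimize x ** 2.
    return x ** 2

# The tuner would propose n_jobs=4 candidates per iteration.
print(objective([{"x": 0}, {"x": 1}, {"x": 2}, {"x": 3}]))  # [0, 1, 4, 9]
```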

zoechoutw commented 1 year ago

Thank you for your clear explanation!