acerbilab / pybads

PyBADS: Bayesian Adaptive Direct Search optimization algorithm for model fitting in Python
https://acerbilab.github.io/pybads/
BSD 3-Clause "New" or "Revised" License

Paper review for JOSS submission #43

Closed · jungtaekkim closed this issue 1 year ago

jungtaekkim commented 1 year ago

I left a comment on your JOSS submission.

From the BADS repository:

> In the poll stage, points are evaluated on a mesh by taking steps in one direction at a time, until an improvement is found or all directions have been tried. The step size is doubled in case of success, halved otherwise.

And in your paper:

> In the poll stage, points are evaluated on a mesh by taking steps in one (non-orthogonal) direction at a time, until an improvement is found or all directions have been tried. The step size is doubled in case of success, halved otherwise.
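As an aside, the poll rule described in both passages can be made concrete with a minimal sketch, assuming simple orthogonal coordinate directions for readability (the paper notes that BADS itself polls along non-orthogonal directions):

```python
import numpy as np

def poll_step(f, x, fx, delta, directions):
    """One illustrative poll iteration: try mesh points x + delta * d
    for each poll direction d, stopping at the first improvement.
    Doubles the mesh size delta on success, halves it otherwise."""
    for d in directions:
        candidate = x + delta * d
        f_cand = f(candidate)
        if f_cand < fx:  # improvement found: accept and expand the mesh
            return candidate, f_cand, 2.0 * delta
    # no direction improved: keep the incumbent and shrink the mesh
    return x, fx, 0.5 * delta

# Toy usage on a quadratic, polling along +/- coordinate axes.
f = lambda x: float(np.sum(x**2))
x = np.array([1.0, -2.0])
fx, delta = f(x), 0.5
directions = [e for i in range(2) for e in (np.eye(2)[i], -np.eye(2)[i])]
x, fx, delta = poll_step(f, x, fx, delta, directions)
```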

Also,

From the BADS repository:

> In the search stage, a Gaussian process (GP) is fit to a (local) subset of the points evaluated so far. Then, we iteratively choose points to evaluate according to a lower confidence bound strategy that trades off between exploration of uncertain regions (high GP uncertainty) and exploitation of promising solutions (low GP mean).

And in your paper:

> In the search stage, a Gaussian process (GP) surrogate model (Rasmussen & Williams, 2006) of the target function is fit to a local subset of the points evaluated so far. New points to evaluate are quickly chosen according to a lower confidence bound strategy that trades off between exploration of uncertain regions (high GP uncertainty) and exploitation of promising solutions (low GP mean).
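Similarly, the quoted search rule amounts to fitting a GP to a local subset of evaluations and ranking candidates by the lower confidence bound mu(x) - sqrt(beta) * sigma(x); a self-contained sketch using scikit-learn, purely as an illustration and not PyBADS's actual internals:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def lcb_search(X_local, y_local, candidates, sqrt_beta=2.0):
    """Illustrative search step: fit a GP to a local subset of evaluated
    points, then rank candidates by the lower confidence bound
    mu(x) - sqrt_beta * sigma(x) (low GP mean = exploitation,
    high GP uncertainty = exploration)."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_local, y_local)
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[np.argmin(mu - sqrt_beta * sigma)]

# Toy usage: pick the next point among random candidates.
rng = np.random.default_rng(0)
X_local = rng.uniform(-1, 1, size=(20, 2))
y_local = np.sum(X_local**2, axis=1)
candidates = rng.uniform(-1, 1, size=(100, 2))
x_next = lcb_search(X_local, y_local, candidates)
```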

I think this near-verbatim duplication should be fixed, even if the author lists partially overlap.

Please take a look at my comment and potentially update your paper.

GurjeetSinghSangra commented 1 year ago

Dear reviewer,

Thank you for your detailed review of this project; it will be helpful for improving the existing work.
We would like to address some of your remarks.

  1. The paper suggests that PyBADS or BADS falls outside the field of Bayesian optimization, for example: "BayesOpt requires specific technical knowledge to be implemented or tuned beyond simple tasks." I think that this applies only to vanilla Bayesian optimization; moreover BADS, which is itself a Bayesian optimization algorithm, tries to improve on vanilla Bayesian optimization, which implies that BADS is in the line of research on Bayesian optimization.

In the paper, we present PyBADS as a hybrid approach that combines Bayesian optimization and adaptive direct search methods. These statements come just after we highlight the potential issues of (classical) BayesOpt and of variants of direct-search methods. See line 44: "PyBADS addresses all these problems as a fast hybrid algorithm that combines the strengths of BayesOpt and the Mesh Adaptive Direct Search method"; and line 83: "Differently to these algorithms, PyBADS comes with a unique hybrid, fast and robust combination of direct search (MADS) and Bayesian Optimization". In other words, we use BayesOpt to denote standard Bayesian optimization. As a hybrid method, BADS belongs to BayesOpt as much as it belongs to direct search. We can state this more precisely in the paper as follows: "However, classical BayesOpt requires specific technical knowledge to be implemented or tuned beyond simple tasks, ..."

  2. Do you have any references or evidence that function evaluations take more than 0.1 seconds (BADS) versus evaluation costs of hours or more (Bayesian optimization)? In my experience, that is not true, at least for Bayesian optimization.

Just to clarify, these are recommended or typical use cases. According to a widely cited tutorial on Bayesian optimization, Frazier (2018), the objective function typically has the property that "f is 'expensive to evaluate' in the sense that the number of evaluations that may be performed is limited, typically to a few hundred. This limitation typically arises because each evaluation takes a substantial amount of time (typically hours), [...]".

We agree that shortcuts such as optimizing (rather than marginalizing over) the GP hyperparameters, not retraining the GP at every iteration, or using a simple acquisition function can all reduce the cost -- together with the performance -- of vanilla BayesOpt. Nonetheless, even a minimal vanilla BayesOpt method left to run for several hundred or a few thousand iterations will eventually pay the cubic cost of GP training (Garnett, 2022; Lan et al., 2022). PyBADS does not have this problem by construction, and can easily be used for optimization runs of many hundreds or even thousands of evaluations with a fixed cost per function evaluation.

To quantify our statement, we ran several benchmarks from D=2 to D=10, on both noisy and noiseless targets, and obtained an average algorithmic cost for PyBADS from 15 ms up to about 50 ms on a standard laptop (i.e., the cost of PyBADS averaged over the total number of function calls). Notably, by construction, PyBADS keeps this cost fixed and bounded as the number of evaluations increases. This means that, for target functions whose cost is >100 ms, the overhead incurred by using PyBADS is a (small) fraction of the total cost of evaluating the target. Hence our statement that PyBADS is recommended when the target has a mild computational cost of 100 ms or more.
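For concreteness, this kind of overhead measurement can be reproduced with a rough timing sketch built on the documented PyBADS interface; the keyword arguments and the `func_count` result field below follow the current documentation, so treat them as assumptions to be checked against your installed version:

```python
import time
import numpy as np
from pybads import BADS

eval_time = 0.0

def rosenbrock(x):
    """Cheap noiseless target; its own cost is accumulated separately
    so it can be subtracted from the total wall-clock time."""
    global eval_time
    t0 = time.perf_counter()
    x = np.atleast_2d(x)
    val = float(np.sum(100.0 * (x[:, 1:] - x[:, :-1] ** 2) ** 2
                       + (1.0 - x[:, :-1]) ** 2))
    eval_time += time.perf_counter() - t0
    return val

x0 = np.zeros(2)
bads = BADS(rosenbrock, x0,
            lower_bounds=np.full(2, -5.0), upper_bounds=np.full(2, 5.0),
            plausible_lower_bounds=np.full(2, -2.0),
            plausible_upper_bounds=np.full(2, 2.0))

t0 = time.perf_counter()
result = bads.optimize()
total = time.perf_counter() - t0
# Algorithmic overhead per call = (total time - target time) / #calls.
# 'func_count' is taken from the docs; check your installed version.
print((total - eval_time) / result["func_count"])
```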

  3. Do you think acquisition functions such as EI or PI could be used in PyBADS instead of LCB?

(Py)BADS can also accommodate other acquisition functions, e.g., EI or PI as in standard BayesOpt, but LCB is the default setting since it performed best in our benchmarks.
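For reference, these acquisitions differ only in how they score the GP posterior mean mu and standard deviation sigma at a candidate point; a small illustrative sketch for minimization (with `y_best` the best value observed so far):

```python
import numpy as np
from scipy.stats import norm

def lcb(mu, sigma, sqrt_beta=2.0):
    # Lower confidence bound (to be minimized).
    return mu - sqrt_beta * sigma

def expected_improvement(mu, sigma, y_best):
    # EI for minimization (to be maximized).
    z = (y_best - mu) / np.maximum(sigma, 1e-12)
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def probability_of_improvement(mu, sigma, y_best):
    # PI for minimization (to be maximized).
    z = (y_best - mu) / np.maximum(sigma, 1e-12)
    return norm.cdf(z)
```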

  4. I think that this project heavily relies on BADS. If I am not mistaken, this is a re-implementation of BADS. The authors need to discuss this more thoroughly in the paper.

Thank you for highlighting this remark; we will emphasize in the paper that PyBADS is a port of the original MATLAB implementation of BADS, and amend the text accordingly.

  5. I think this issue is the most serious one: the Method section is copied and pasted from the BADS repository (though the same sentences are in this repository). For example....

We believe there may be a misunderstanding. The text in question, taken from the BADS repository, has not been formally published as a paper, nor has it appeared in any prior publication. We wrote this text ourselves, and since it serves as a suitable and accurate summary of how the (Py)BADS method works, rewriting it seems redundant. In the spirit of collaboration, we are open to paraphrasing the text if deemed necessary, though we believe this may not be the most productive use of our time and resources. Moreover, this request seems at odds with the fundamental principles of the Journal of Open Source Software (JOSS), where the primary emphasis is on documenting the software: it would be unusual if we were unable to reuse, for documentation purposes, text that we carefully crafted ourselves.


jungtaekkim commented 1 year ago

@GurjeetSinghSangra Thank you for your reply. My concerns have been resolved. In particular, I appreciate the answer to the question "Do you have any references or evidence that function evaluations take more than 0.1 seconds (BADS) versus evaluation costs of hours or more (Bayesian optimization)?"