Closed: jackliu333 closed this issue 2 years ago
This is a great question. Let's derive the given expression. Each iteration must:
The first two steps contribute a multiplicative factor of O(nq) to the running time. The cost of the subsequent branching is then O(nq + q(nq + q(nq + ...))): each instance of one-step lookahead contributes an additive O(nq) term, and each instance of averaging contributes a multiplicative O(q) factor leading into the next iteration. Multiplying, we have O(nq(nq + q(nq + q(nq + ...)))) = O(n²q² + n²q³ + ... + n²q^τ), and keeping only the highest-order term gives the claimed O(n²q^τ) complexity.
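One way to sanity-check that expansion is to evaluate the recurrence directly. Below is a small Python sketch; the function names and the reading of n as candidate locations and q as branches per stage are my own labels for illustration, not the book's notation:

```python
def branch_cost(n, q, h):
    """Cost of h remaining stages of rollout: each one-step lookahead is
    an additive O(nq) term, and each averaging step branches into q
    copies of the next stage (a multiplicative O(q) factor)."""
    if h == 0:
        return 0
    return n * q + q * branch_cost(n, q, h - 1)

def rollout_cost(n, q, tau):
    """Total cost over a horizon of tau stages: the first stage's O(nq)
    factor multiplies the cost of the remaining tau - 1 stages."""
    return n * q * branch_cost(n, q, tau - 1)

# For n = 5, q = 10, tau = 4 the exact count is
# n²q²(1 + q + q²) = 25 · 100 · 111 = 277500,
# and the leading term n²q^τ = 250000 dominates.
n, q, tau = 5, 10, 4
exact = rollout_cost(n, q, tau)
leading = n**2 * q**tau
```

The ratio exact/leading approaches 1 as q grows, which is exactly the "keep only the highest-order term" step in the derivation.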
I should add a footnote to this effect because it certainly isn't obvious.
Thank you for the illuminating explanation! Very helpful.
On the additive contribution part, is it because of the nature of the forward-looking rollout? Specifically, the original complexity is exponential in the lookahead horizon due to the forward-backward calculation in a DP formulation, while rollout only goes ahead with the current best one-step lookahead branch/candidate, thus truncating the other non-greedy branches and making it an additive contribution.
Is my understanding correct? I'm trying to wrap my head around the transition from multiplicative to additive.
The one-step lookahead in each stage of the proposed rollout scheme is a one-time event. Given (x, y), we first find x₂ via one-step lookahead, which takes time O(nq). At this point x₂ is fixed. Now we go on to estimate an expectation over y₂, which entails branching into the next stage. In pseudocode it would look something like:
```
[given x, y]
x₂ ← one-step-lookahead(D, x, y)                // O(nq) one-time work
for y₂ in {...}:                                // contributes O(q) multiplicative factor
    x₃ ← one-step-lookahead(D, x, y, x₂, y₂)    // O(nq) one-time work
    for y₃ in {...}:                            // contributes O(q) multiplicative factor
        ...                                     // ...
```
In the full dynamic programming approach, selecting x₂ would require recursively branching on y₂, x₃, y₃, etc. But here we only consider y₂ in the design of x₂ by solving a suboptimal subproblem that assumes we immediately terminate after observing (x₂, y₂) (and (x, y)).
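A toy, runnable rendering of that scheme might look like the following. The surrogate here (simulate_y, a seeded random draw) and the improvement-style score are stand-ins of my own, not the book's GP machinery, but the control flow mirrors the pseudocode above: a one-time O(nq) lookahead per stage followed by an O(q) branching over hypothetical observations.

```python
import random

Q = 3  # number of hypothetical observations per stage (the q factor)

def simulate_y(x, seed):
    """Stand-in for drawing a hypothetical observation from the surrogate."""
    rng = random.Random((hash(x) * 31 + seed) % (2**32))
    return rng.random()

def one_step_lookahead(data, candidates):
    """Pick the candidate with the best one-step score: O(n * q) work,
    scoring each of n candidates against Q hypothetical observations."""
    best_seen = max((y for _, y in data), default=0.0)
    def score(x):
        # toy expected improvement over Q hypothetical observations
        return sum(max(simulate_y(x, s) - best_seen, 0.0)
                   for s in range(Q)) / Q
    return max(candidates, key=score)

def rollout_value(data, candidates, horizon):
    """Estimated value of `horizon` more stages under the
    one-step-lookahead base policy."""
    if horizon == 0:
        return max(y for _, y in data)
    # the next location is fixed once, then we branch only on its outcome
    x_next = one_step_lookahead(data, candidates)   # one-time O(nq) work
    total = 0.0
    for s in range(Q):                              # O(q) branching factor
        y_next = simulate_y(x_next, s)
        total += rollout_value(data + [(x_next, y_next)],
                               candidates, horizon - 1)
    return total / Q

value = rollout_value([(0.0, 0.2)], [0.1, 0.4, 0.7], horizon=2)
```

Note that rollout_value never branches over candidate locations for x₃ onward; each stage commits to one location before averaging, which is why the per-stage location cost is additive rather than multiplicative.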
(Also I should stress this is not the only possible rollout scheme, just a simple, common, and often reasonably effective one.)
Thank you for the detailed reply! I think adding these details in the rollout section of the book would be very helpful for other interested readers, given it is a short but important section.
Based on my reading of several papers on rollout in BO, the idea seems to trace back to Bertsekas (1997), with properties such as sequential consistency and sequential improvement for a properly selected base/heuristic policy (e.g., the one-step-lookahead function in your pseudocode).
The rollout variants in BO, beyond the limited lookahead and batch mode mentioned in the rollout section of the book, seem to differ mainly in the base policy: rollout with constraints, multi-step lookahead, fortified rollout (ensembling multiple base policies), or even the "one-shot" optimization over all follow-up sampling locations used in the BoTorch framework. Covering these topics would likely take a separate chapter on its own, but it's a suggestion (and a vote) if you decide to expand on this in future work :)
Describe the erratum: The rollout time complexity discussed in the rollout section on page 102 is O(n²q^τ). I did not quite understand the n² part. Since the only location-wise optimization is over the first x, and all follow-up locations are obtained directly from the one-step-lookahead base policy, shouldn't this part of the complexity be n rather than n²?
Version and Location:
Proposed correction: Please explain the n² portion in the time complexity of rollout.