fix promotion bug

zhanglei1172 commented 2 years ago

In this code, self.iteration_counter represents the bracket counter, so direct promotion should only happen during the first bracket. I also think it would be better to rename self.iteration_counter to self.bracket_counter.

Neeratyoy commented 2 years ago

Hi @zhanglei1172, thanks for your interest in our work and the proactive feedback.

I'll address the two points you raised below:

1. L476 should be < self.max_SH_iter, since promotions (as designed in DEHB) can occur during the later SH brackets while still within the first HB bracket or, as written in the comments, the first set of SH brackets. Exactly when promotion ceases and evolution begins across all subpopulations depends on the min-max budget setting and eta. In DEHB, promotions happen until every subpopulation (including the one at the highest fidelity) has its full population size filled with evaluated configurations coming from a lower rung (or randomly sampled). Promotions do end, however, after the first HB bracket or, as we call it in the paper, the Initialization bracket.
2. self.iteration_counter was chosen to represent the main outer iteration unambiguously. Even in the paper, we call each SH bracket one iteration, and the count is incremented continually. Since our method overloads the term bracket to mean an SH bracket, an HB bracket, or a DEHB bracket, it made sense to name the main loop counter an iteration to avoid confusion.
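To illustrate how the number of SH brackets in one HB bracket follows from the min-max budget and eta, here is a minimal Python sketch. The helper names `budget_ladder` and `sh_bracket_rungs` are hypothetical and are not part of the DEHB code. The sketch assumes the usual Hyperband convention of geometrically spaced budgets counted down from the maximum budget, with SH bracket s starting at the s-th lowest budget.

```python
def budget_ladder(min_budget, max_budget, eta):
    """Budgets spaced by a factor of eta, counted down from max_budget."""
    n = 1
    b = max_budget
    # The small tolerance guards against floating-point drift when dividing by eta.
    while b / eta >= min_budget * (1 - 1e-9):
        b /= eta
        n += 1
    # Return the ladder in ascending order, ending exactly at max_budget.
    return [max_budget / eta ** k for k in range(n - 1, -1, -1)]


def sh_bracket_rungs(budgets):
    """SH bracket s starts at budgets[s] and climbs to the highest budget."""
    return [budgets[s:] for s in range(len(budgets))]


budgets = budget_ladder(min_budget=1, max_budget=27, eta=3)
max_SH_iter = len(budgets)  # number of SH brackets in one HB bracket
for s, rungs in enumerate(sh_bracket_rungs(budgets)):
    print(f"SH bracket {s}: rungs at budgets {rungs}")
```

Under these assumptions, a wider budget range or a smaller eta yields more SH brackets, and therefore a longer Initialization bracket before promotions stop.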

I would be happy to hear whether this answers your concerns. Alternatively, we could close the PR if this suffices. Cheers!

zhanglei1172 commented 2 years ago

Thanks a lot for your reply, but < self.max_SH_iter still confuses me.

I took a close look at the algorithm section in the DEHB paper. The algorithm distinguishes between an iteration and a bracket counter, and line 11 of the algorithm explicitly states that promotion occurs when bracket_counter == 0. I admit that self.iteration_counter in the repository represents the SH bracket. If self.iteration_counter == self.max_SH_iter, then a complete Hyperband process has finished. And all subpopulations have been evaluated once the first SH bracket (which has different rungs) has completed. Can you explain a little more about the difference between the algorithm and the code? Thanks again.

Neeratyoy commented 2 years ago

> Can you explain a little more about the difference between the algorithm and the code? thanks again.

First, I had to look it up again myself and ended up going through the paper and reading Appendix C.1 at the end of page 9. I think reading that alongside the pseudo-code makes sense, and it does what the code does. I must admit that the exact nomenclature might not have carried over to the code owing to implementation details. I hope you understand.

I'll try my best to briefly summarize the parallels between the code and the pseudo-code (algo).

The bracket_counter in the algo counts the outer iterations, i.e., the main DEHB brackets. In the code, self.iteration_counter instead counts SH brackets. That relates to the termination_condition check here. For self.iteration_counter, the increment happens at the same level as an SH bracket, which would be equivalent to putting bracket_counter += 1 in the inner scope at L21. In the algo, however, bracket_counter represents a DEHB/HB bracket, and bracket_counter == 0 denotes the Initialization bracket.
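The relationship between the two counters can be sketched in a few lines of Python. This is an illustration only: `locate` and `in_initialization_bracket` are hypothetical helpers, and `max_SH_iter = 4` is an assumed value. The sketch treats the code's counter as a flat count of SH brackets and recovers the pseudo-code's bracket_counter by integer division.

```python
def locate(iteration_counter, max_SH_iter):
    """Split the flat SH-bracket count into (bracket_counter, sh_bracket_index)."""
    return divmod(iteration_counter, max_SH_iter)


def in_initialization_bracket(iteration_counter, max_SH_iter):
    """The check discussed in the thread, written on the flat counter."""
    return iteration_counter < max_SH_iter


max_SH_iter = 4  # assumed value, e.g. budgets 1, 3, 9, 27 with eta = 3
rows = []
for iteration_counter in range(2 * max_SH_iter):
    bracket_counter, sh_bracket = locate(iteration_counter, max_SH_iter)
    rows.append((iteration_counter, bracket_counter, sh_bracket))
    # Both formulations agree on every iteration.
    assert (bracket_counter == 0) == in_initialization_bracket(
        iteration_counter, max_SH_iter
    )
    print(iteration_counter, bracket_counter, sh_bracket)
```

Read this way, bracket_counter == 0 in the pseudo-code and a flat counter below max_SH_iter describe the same set of iterations.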

In L7, i represents the rungs of the SH bracket. The way to read L11 is that, while in the Initialization bracket, for every SH bracket we do promotions from the second rung onwards, since the first rungs are either randomly sampled or obtained through vanilla-DE search.

To translate the same into the actual code, self.iteration_counter < self.max_SH_iter is the accurate representation of being within the Initialization bracket.
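The practical difference between the two readings can be sketched as a per-rung schedule. This is a simplified illustration, not the DEHB implementation. `rung_action` and `schedule` are hypothetical names, and the `strict` flag models a hypothetical check that promotes only when the flat counter equals 0. The sketch assumes that SH bracket s has max_SH_iter - s rungs, and that only the first rung of the very first SH bracket is randomly sampled.

```python
def rung_action(iteration_counter, rung, max_SH_iter, strict=False):
    """Decide how a rung obtains its configurations (simplified model)."""
    if strict:
        promoting = iteration_counter == 0  # only the very first SH bracket
    else:
        promoting = iteration_counter < max_SH_iter  # whole Initialization bracket
    if rung > 0 and promoting:
        return "promote"
    if rung == 0 and iteration_counter == 0:
        return "random"
    return "evolve"


def schedule(max_SH_iter, n_iterations, strict=False):
    """List the action of every rung for the first n_iterations SH brackets."""
    plan = []
    for it in range(n_iterations):
        sh_bracket = it % max_SH_iter
        n_rungs = max_SH_iter - sh_bracket  # later SH brackets skip the low budgets
        plan.append(
            [rung_action(it, r, max_SH_iter, strict) for r in range(n_rungs)]
        )
    return plan


def count_promotions(plan):
    return sum(row.count("promote") for row in plan)


lenient = schedule(max_SH_iter=4, n_iterations=5)
strict = schedule(max_SH_iter=4, n_iterations=5, strict=True)
for it, (a, b) in enumerate(zip(lenient, strict)):
    print(it, a, b)
```

In this model, the `< max_SH_iter` check keeps promoting on the second and later rungs of every SH bracket in the Initialization bracket, while the stricter check would switch those rungs to evolution after the first SH bracket.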

I must mention again that the looping shown in the pseudo-code may not correspond exactly to the code. For the parallel design, or for some optimizations, our class design, functions, and data structures will vary somewhat from the pseudo-code. Even so, the pseudo-code is always a good first reference. I would recommend reading the paper and relating it to the code, as the logic is the same.

Hope that I understood and addressed your concerns!

automl / DEHB

fix promotion bug #16