Open AwePhD opened 3 weeks ago
Another addition for these multi-fidelity settings: if I do not use `max_cost_total` and use the simpler `max_evaluations_total` instead, PriorBand only evaluates at the max fidelity (200).
```python
from pathlib import Path

from neps import IntegerParameter, FloatParameter, run

WORKDIR_PATH = "/tmp/debug_neps_fidelity"
MIN_EPOCH, MAX_EPOCH = 80, 200
TOTAL_EVALUATIONS = 20

pipeline_space = {
    "epoch": IntegerParameter(lower=MIN_EPOCH, upper=MAX_EPOCH, is_fidelity=True),
    "p1": IntegerParameter(lower=5, upper=15, default=10),
    "p2": FloatParameter(lower=0.5, upper=5, default=3),
}

def run_pipeline(epoch, p1, p2) -> dict | float:
    loss = (p1 + p2) / epoch
    return loss

def main():
    run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory=Path(WORKDIR_PATH),
        max_evaluations_total=TOTAL_EVALUATIONS,
    )

if __name__ == "__main__":
    main()
```
Once again, I probably misunderstand something about the choice of the fidelity. Maybe this behavior is OK.
@Neeratyoy
Hi @AwePhD, sorry for the late response, and thanks for the reproducible example.
Looking at the example and what you described, the behaviour you see is expected. The short answer is that when MAX_BUDGET is 200 and ETA is 3 (the default), the first approximated fidelity is roughly 200 // 3, which is < MIN_BUDGET (=80) in your first case above. Hence, it defaults to just one rung, that is, the MAX_BUDGET.
You can use this function to approximately check what will happen with HyperBand budgets given your fidelity bounds:
```python
import math

def check_budget_levels(MIN_EPOCH, MAX_EPOCH, ETA=3):
    # Highest rung index reachable between the fidelity bounds.
    fid_level = math.ceil(math.log(MAX_EPOCH / MIN_EPOCH) / math.log(ETA))
    _min = MAX_EPOCH
    while _min >= MIN_EPOCH:
        print(f"Level: {fid_level} -> {_min}")
        _min = _min // ETA
        fid_level -= 1
```
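Running the helper makes the rung collapse concrete. Below is a self-contained variant (an illustrative sketch of the same logic, returning the levels instead of printing them): with the bounds (80, 200) and the default ETA of 3, only a single rung survives, while wider bounds yield several.

```python
import math

def check_budget_levels(min_epoch, max_epoch, eta=3):
    # Approximate HyperBand-style rung schedule between the fidelity bounds.
    fid_level = math.ceil(math.log(max_epoch / min_epoch) / math.log(eta))
    levels = []
    fid = max_epoch
    while fid >= min_epoch:
        levels.append((fid_level, fid))
        fid //= eta          # next lower rung is a factor of eta smaller
        fid_level -= 1
    return levels

print(check_budget_levels(80, 200))  # [(1, 200)] -- a single rung
print(check_budget_levels(10, 200))  # [(3, 200), (2, 66), (1, 22)]
```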
Do you think it would be a better interface if we raised a warning and broke out when there is only one budget level available (like your case 1 with 80, 200) and we cannot really run HyperBand (multi-fidelity)?
Hi,
No problem about the delay at all.
Thanks for the function. I come from vision/language deep learning and have little familiarity with HPO research, though I have read some papers. I think it would be good to document the rung values implied by ETA for the multi-fidelity optimizers, namely the rung values derived from the min and max of the fidelity parameter. In my opinion, the fidelity levels should also be logged somewhere: in the console or in the .optimizer_info.yaml - I know the eta
parameter is in there, but, as you showed, it takes a bit* of math to get the fidelity levels.
Also, to answer more directly: when a multi-fidelity parameter is passed and there is only one fidelity level, I think a warning should be logged. When a user sets a parameter as a fidelity, chances are they do not want the number of rungs to be 1. I do not think a breakout is necessary.
*: I know this computation is written in the HyperBand paper and probably in other papers using multi-fidelity. But, in my opinion, a practical user of NePS who does not come from the HPO community should not have to read a paper to use the code. It's up to you, though; maybe you reasonably expect NePS users to have some knowledge of the HPO literature. Also, other multi-fidelity methods may (come to) compute the number of rungs differently. IMO, the first thing accessible to a practical HPO user should be the number of rungs and their values, followed by a place in the documentation explaining how they are computed from eta and the fidelity bounds (or anything else).
Hi,
PriorBand's multi-fidelity schedule is not applied as I expected, probably because I did not configure the optimizer. I would like to know whether this is normal or I misunderstood something about the multi-fidelity setup in NePS.
Here is a Python script that reproduces the behaviour with neps 0.12.2:
When I run it with various values for the boundaries, I get different numbers of fidelity levels (they are called rungs, right?):
The first example is important for debugging, when I take a tiny subset of my dataset. I need a lot of epochs to overfit the model (300M+ parameters) on a few samples; before 80 epochs the model is not learning at all. It seems that starting the fidelity at 80 is a problem? Is this the expected (multi-)fidelity strategy? Is it because the ratio used to compute the number of rungs is
max_fidelity/min_fidelity
? (I read that in the PriorBand paper, if I remember well.) Should I set the
eta
parameter? By the way, I am happy with the second example, which represents an actual training on the whole dataset.
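To illustrate how eta interacts with the max/min fidelity ratio, here is a small sketch of the geometric rung schedule (an approximation based on the ratio rule discussed in this thread, not NePS internals): a smaller eta fits more rungs between the same bounds.

```python
def count_rungs(min_fid, max_fid, eta):
    # Sketch of a geometric rung schedule: each lower rung is the
    # fidelity divided by eta, stopping below the minimum fidelity.
    rungs = []
    fid = max_fid
    while fid >= min_fid:
        rungs.append(fid)
        fid = int(fid / eta)
    return rungs

print(count_rungs(80, 200, 3))    # [200] -- only one rung with the default eta
print(count_rungs(80, 200, 1.5))  # [200, 133, 88] -- smaller eta, more rungs
```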