TheEimer opened 3 months ago
Some results on lcbench-3945. This is 10 seeds per degree of parallelization, with a logging bug at the start. I'm not 100% sure this is expected (the difference is pretty small imo) or whether lcbench is the best thing to look at in YAHPO: you said MF, but to me it looks like there isn't one canonical setting, just lots of different variations in YAHPO Gym? Anything specific that would be interesting to look at here? (If this is actually interesting we can also close this and talk a bit offline lol)
To chime in, one alternative is to fantasize the result of configurations that have already been asked for and then retrain the model with that result. It's a bit slow since it requires retraining just to get the prediction for one point, but in practice it prevents duplicate configurations.
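A minimal sketch of that fantasization loop, using a toy stand-in surrogate (the class and function names here are made up for illustration, not SMAC's or any library's API):

```python
import statistics


class MeanSurrogate:
    """Toy surrogate: predicts the mean of observed costs (stand-in for a real model)."""

    def fit(self, xs, ys):
        self.mean = statistics.fmean(ys) if ys else 0.0
        return self

    def predict(self, x):
        return self.mean


def ask_with_fantasies(surrogate, observed_x, observed_y, pending_x, candidates):
    # Fantasize: predict a cost for each pending config and treat it as observed.
    fantasy_y = [surrogate.predict(x) for x in pending_x]
    # Retrain on real + fantasized data so pending points look "done" to the model.
    surrogate.fit(observed_x + pending_x, observed_y + fantasy_y)
    # Propose the candidate with the lowest predicted cost that isn't already pending.
    fresh = [c for c in candidates if c not in pending_x]
    return min(fresh, key=surrogate.predict)
```

The retraining step is what makes this slow; the payoff is that the model stops re-proposing configs that are already in flight.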
In NePS, we moved to botorch and are now using this acquisition function for BO: https://botorch.org/api/_modules/botorch/acquisition/logei.html#qLogNoisyExpectedImprovement
I believe it handles the pending configs during the acquisition itself, saving a retraining. You could theoretically look there if it's an issue.
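For intuition on how pending configs can enter the acquisition without a retrain: as I understand it, the pending points are included in the q-batch, and the candidate is scored by Monte-Carlo sampling the joint outcome of candidate plus pending points. A toy numpy sketch under an (unrealistically) independent-Gaussian posterior, not botorch's actual implementation:

```python
import numpy as np


def mc_qei(cand_mu, cand_sigma, pend_mu, pend_sigma, best_f, n_samples=10_000, rng=None):
    """Toy Monte-Carlo qEI for one candidate plus pending points.
    Minimization convention: improvement = max(0, best_f - min over the batch).
    Assumes independent Gaussian posteriors per point (a simplification)."""
    rng = rng or np.random.default_rng(0)
    mu = np.array([cand_mu, *pend_mu])
    sigma = np.array([cand_sigma, *pend_sigma])
    # Draw joint batch outcomes: each row is one sampled result for the whole batch.
    samples = rng.normal(mu, sigma, size=(n_samples, mu.size))
    batch_min = samples.min(axis=1)
    return np.maximum(best_f - batch_min, 0.0).mean()
```

Because the pending points enter through the sampled batch, the acquisition already "knows" they will be evaluated, which discourages duplicates: the same effect fantasization buys, minus the retrain.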
I just saw that I mistakenly maximized because I wasn't aware that CARPS flips the sign internally. To check whether this is a hypersweeper issue, I reran and also added a non-hypersweeper SMAC ask-tell run (vanilla). It looks pretty much the same (not surprising, since the hypersweeper only calls ask-tell and doesn't do anything to the results). From that I'd conclude a) this is a SMAC problem, likely with ask-tell, and b) something is definitely broken here: these results say SMAC does about as well when everything is submitted upfront (i.e. random search) as when it runs completely sequentially (though anytime performance is still broken due to the logging).
I think this might be caused by #1148, though I'd suggest also looking at the non-MF case to make sure that at least functions as intended.
Small add-on: seeding was slightly different in that curve. I aligned it locally and now all curves look exactly the same :/ This is 2.0.2, btw.
Okay, maybe this is actually not related to #1148 after all: if I add the standard target function evaluation to the comparison instead of using ask-tell, the curve matches too. So I guess either this is a bad test setting (can you recommend a better one where you know SMAC beats random search?), parallelism doesn't do anything to SMAC (???), or there is a larger bug somewhere.
Using lcbench-7593, I now get different results for target-function execution vs ask-tell (no hypersweeper, parsed directly from the SMAC logs). Something is definitely not working as intended in ask-tell...
I stuck close to the ask-tell example, here's my code:
```python
import json
from pathlib import Path

import hydra
from carps.utils.running import make_problem
from carps.utils.trials import TrialInfo
from smac import Scenario
from smac.facade.multi_fidelity_facade import MultiFidelityFacade
from smac.runhistory.dataclasses import TrialValue


@hydra.main(config_path=".", config_name="config_vanilla_smac.yaml")
def run_carps(cfg):
    problem = make_problem(cfg=cfg.problem)

    # Scenario object
    scenario = Scenario(
        problem.configspace,
        deterministic=False,
        n_trials=126,
        min_budget=1,
        max_budget=52,
        seed=cfg.seed,
    )
    intensifier = MultiFidelityFacade.get_intensifier(scenario, eta=cfg.eta)

    # The facade requires a target function even though we only use ask-tell
    def dummy(config, seed, budget, **kwargs):
        return 0.0

    smac = MultiFidelityFacade(
        scenario,
        dummy,
        intensifier=intensifier,
        overwrite=True,
    )

    incumbent_config = {}
    incumbent_score = 100000
    budget_used = 0

    # Ask SMAC which trials should be evaluated next
    for _ in range(126):
        info = smac.ask()
        trial_info = TrialInfo(info.config, seed=cfg.seed, budget=info.budget)
        cost = problem.evaluate(trial_info)
        value = TrialValue(cost=cost.cost, time=0.5)
        budget_used += info.budget
        smac.tell(info, value)
        if cost.cost < incumbent_score:
            incumbent_score = cost.cost
            incumbent_config = info.config.get_dictionary()
        log_dict = {
            "config": incumbent_config,
            "score": incumbent_score,
            "budget_used": budget_used,
        }
        with Path("incumbent.jsonl").open("a") as f:
            json.dump(log_dict, f)
            f.write("\n")


if __name__ == "__main__":
    run_carps()
```
Should I make a separate issue here?
Using the ask-tell interface, we can in principle execute multiple configurations in parallel. Do you have an intuition for "safe" ways of doing this (e.g. asking for all configurations of a bracket and telling them before switching brackets, or running the initial design fully and then limiting parallelism to 10% of total trials at a time)? Do you expect a strong enough effect that it would make sense to benchmark this, or do you think this is probably not very relevant?
(I can do the benchmarking myself if you point me to a few good problems for it)
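To make the batched variant concrete, here's a sketch with a stub optimizer standing in for SMAC's ask-tell interface (everything here is made up for illustration; the open question is whether telling results only after the whole batch degrades the model):

```python
import random
from concurrent.futures import ThreadPoolExecutor


class StubOptimizer:
    """Stand-in for an ask-tell optimizer; just proposes random floats."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.history = []

    def ask(self):
        return self.rng.uniform(-5.0, 5.0)

    def tell(self, config, cost):
        self.history.append((config, cost))


def objective(x):
    # Toy target function with its minimum at x = 1.
    return (x - 1.0) ** 2


def run_batched(opt, n_trials=16, batch_size=4):
    """Ask a whole batch, evaluate in parallel, then tell all results
    before asking again -- one candidate 'safe' parallelization pattern."""
    for _ in range(n_trials // batch_size):
        batch = [opt.ask() for _ in range(batch_size)]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            costs = list(pool.map(objective, batch))
        for config, cost in zip(batch, costs):
            opt.tell(config, cost)
    return min(opt.history, key=lambda t: t[1])
```

With a real optimizer, the trade-off is that all configs within a batch are proposed from the same model state, so larger batches behave more like random search within the batch.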