Open graidl opened 1 week ago
Hi, thank you for your issue!
Could you please give us a minimal working example to reproduce the problem you are facing? From what I gather, this should be the line from the log
Regarding your note: you are absolutely right, and I will open a documentation issue #1165 to fix our previous oversight.
Looking into the issue a bit more, there are two reasons why the config ended up in `ConfigSelector._processed_configs` and is thus not new to it:
both of which will increase the retry counter for pre-existing configurations.
If you only have so few design decisions in your search space, why not simply enumerate them? Are you investigating seeds or instances?
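For illustration, a brute-force enumeration over such a tiny search space could look roughly like the sketch below (plain Python, no SMAC involved; the objective `f` is a hypothetical stand-in for the actual target algorithm):

```python
import random
import statistics
from itertools import product

def f(x: int, y: int, seed: int) -> float:
    """Stand-in for a stochastic target algorithm (assumption, not your code)."""
    rng = random.Random(seed)
    return x - y + rng.random()

# Enumerate all 3 * 2 = 6 configurations and average over many seeds.
n_seeds = 30
results = {}
for x, y in product(range(1, 4), range(1, 3)):
    scores = [f(x, y, seed) for seed in range(n_seeds)]
    results[(x, y)] = statistics.mean(scores)

# Pick the configuration with the best (lowest) mean cost.
best = min(results, key=results.get)
print("best configuration:", best)
```

Because the same seeds are reused for every configuration, the comparison is effectively paired, which further reduces the number of evaluations needed.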
Thanks for your answers. Here is a small demo code that stops after about 20 evaluations instead of the indicated 200 trials:
```python
from ConfigSpace import Configuration, ConfigurationSpace
from smac import AlgorithmConfigurationFacade, Scenario
import random

config_space = ConfigurationSpace({
    "x": (1, 3),
    "y": (1, 2),
})

instances = ["myinst.txt"]
features = {fn: [i] for (i, fn) in enumerate(instances)}

scenario = Scenario(config_space, deterministic=False,
                    instances=instances, instance_features=features,
                    n_trials=200)

def f(config: Configuration, instance: str, seed: int) -> float:
    x = int(config["x"]); y = int(config["y"])
    print(f'f({instance}, {seed}, {x}, {y})', end=" -> ")
    res = x - y + random.random()
    print(res)
    return res

smac = AlgorithmConfigurationFacade(scenario, f, overwrite=True)
incumbent = smac.optimize()
print("Optimized configuration: ", incumbent)
```
The point is that the function `f` whose parameters should be optimized is not deterministic, which we also indicate in the scenario.
Yes, enumerating all possible configurations is feasible; in this simple example there are just 3*2 = 6. However, each of them needs to be evaluated a substantial number of times before one can decide that one is (statistically) better than another. SMAC seems to consider a configuration "fully processed" after a very low number of evaluations with different seeds, maybe 3? This is clearly far too few for any statistical significance. I would have expected a parameter to set a higher number of seeds to be tried, but did not find one.
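To illustrate the sample-size point with a self-contained sketch (plain Python, not SMAC code, with assumed Gaussian noise): when performance distributions overlap heavily, three seeds per configuration frequently rank the truly worse configuration first, while thirty seeds do so far less often.

```python
import random
import statistics

rng = random.Random(0)

def estimate_flip_rate(n_seeds: int, trials: int = 2000) -> float:
    """Fraction of trials in which the truly worse configuration
    (true mean 0.3 vs. 0.0, noise sd 1.0) gets the better sample mean."""
    flips = 0
    for _ in range(trials):
        better = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        worse = [rng.gauss(0.3, 1.0) for _ in range(n_seeds)]
        if statistics.mean(worse) < statistics.mean(better):
            flips += 1
    return flips / trials

few = estimate_flip_rate(3)
many = estimate_flip_rate(30)
print(f"wrong ranking with  3 seeds: {few:.0%}")
print(f"wrong ranking with 30 seeds: {many:.0%}")
```

The 0.3 mean gap and unit noise are arbitrary illustration values; the qualitative effect holds whenever the distributions overlap.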
Let me finally explain that in our real application we indeed also want to optimize just two discrete parameters, although with larger domains. We wanted to avoid a grid search that enumerates all configurations and performs, e.g., 30 evaluations each, since each run takes quite some time.
Thanks for providing the minimal working example. We will look into it. From what you describe, this likely originates from the way the intensification works.
@mwever raised the issue that for classic algorithm configuration we usually assume weak homogeneity, which makes it easier to estimate the performance and act on it more confidently. What you describe, on the other hand, implies strongly overlapping performance distributions: there is no clear winner unless we have a very good (expensively acquired over many seeds) performance estimate. SMAC is intended to reject aggressively, but to be gracious should we re-encounter a configuration.
I have another conceptual issue, but first I want to verify that the seeding is not implicitly creating a new instance. Then we can move forward and assume that the seeds are independent draws from the performance distribution of that algorithm. If you then race algorithms against each other with respect to the number of seeds and, say, use the mean of the performance distribution to make such a decision, then you are comparing estimators with varying amounts of confidence. I am not sure right now whether that is already implemented/intended that way in SMAC.
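The varying-confidence point can be made concrete with a small sketch (plain Python, assumed unit-variance Gaussian noise): the standard error of a mean over n seeds shrinks like 1/sqrt(n), so a mean over 3 seeds and a mean over 30 seeds are not directly comparable estimates.

```python
import math
import random
import statistics

rng = random.Random(1)

def avg_stderr(n_seeds: int, reps: int = 500) -> float:
    """Average standard error of the mean when a configuration is
    evaluated on `n_seeds` independent seeds (noise sd = 1)."""
    total = 0.0
    for _ in range(reps):
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_seeds)]
        total += statistics.stdev(samples) / math.sqrt(n_seeds)
    return total / reps

se3 = avg_stderr(3)
se30 = avg_stderr(30)
print(f"typical stderr with  3 seeds: {se3:.2f}")
print(f"typical stderr with 30 seeds: {se30:.2f}")
```

Comparing the two raw means without accounting for this difference in confidence biases the race toward whichever configuration happened to be lucky on its few seeds.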
We intend to use SMAC to optimize only a few discrete parameters with small domains of a stochastic metaheuristic, using the `AlgorithmConfigurationFacade`. Our issue is that SMAC stops far too early. To be a bit more specific:
Let the config space be x ∈ {1, 2, 3} and y ∈ {1, 2}. Thus, we have just 3*2 = 6 possible configurations. Moreover, let us use just one instance.
SMAC stops already after about 26 algorithm evaluations with the message `Could not return a new configuration after 16 retries`, although we set n_trials to 2000 and deterministic to False, and the metaheuristic returned quite different values for the same configurations. Even though the search space for SMAC is very small, clearly a larger number of evaluations would be needed to conclude with some statistical significance that one configuration is best.
What am I missing?
The `AlgorithmConfigurationFacade` sets `max_config_calls` in the intensifier to 2000, so this cannot be the issue. Note also that the table in https://automl.github.io/SMAC3/main/3_getting_started.html says that the `AlgorithmConfigurationFacade` uses the Hyperband intensifier, which, however, I don't think is true.