[Q] Using SMAC as a per-instance labelling tool

We are trying to use SMAC in the Algorithm Configuration setting, with the goal of finding, for each instance, a good configuration that reduces its running time.

To do this, we first run SMAC over a dataset of 1000 instances for 12 hours and then use the best incumbent configuration to warm-start each instance-specific run. We would later like to use the instance-specific configurations as labels for other ML tasks. However, the instance-specific runs rarely improve on the incumbent configuration. We checked the run history and found that almost all of the runs (~1 min) are capped based on the running time of the incumbent configuration until the wall-clock limit (20 min) is hit. Based on this, we suspect that SMAC is not able to build a good surrogate model of a configuration's running time.
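
To make the capping behaviour concrete, here is a minimal sketch of how we understand the adaptive-capping cutoff to be derived from the incumbent's runtime. This is our reading, not SMAC's actual code: the function name `adaptive_cutoff`, the default slack factor of 1.2, and all numbers are assumptions for illustration.

```python
import math


def adaptive_cutoff(incumbent_cost, challenger_cost_so_far,
                    slack_factor=1.2, base_cutoff=math.inf):
    """Hypothetical helper: time a challenger may still use before being capped.

    Assumed rule: the challenger's total cost may not exceed
    slack_factor * incumbent cost, and never exceeds the base cutoff.
    """
    remaining = incumbent_cost * slack_factor - challenger_cost_so_far
    return min(base_cutoff, remaining)


# Illustrative numbers only: incumbent solves the instance in 10 s,
# and the per-run cutoff is 60 s.
incumbent_runtime = 10.0
for slack in (1.2, 2.0):
    cap = adaptive_cutoff(incumbent_runtime, 0.0,
                          slack_factor=slack, base_cutoff=60.0)
    print(f"slack factor {slack}: challenger capped after {cap:.1f} s")
```

If this reading is right, then on a single instance every challenger that is not clearly faster than the incumbent is stopped at roughly the slack factor times the incumbent's runtime, so the surrogate model mostly sees capped (censored) observations.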

We are currently working on the following items to resolve this:

- Increase intens_adaptive_capping_slackfactor to avoid runs being capped
- Increase the per-instance wall-clock limit
- Increase rand_prob to promote more exploration
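
As a rough sketch, the three changes above would correspond to scenario options along the following lines. The key names follow the SMAC3 v1.x scenario options as we understand them and may differ in other versions; every value is a placeholder we have not validated, apart from the 20-minute baseline mentioned above.

```python
# Baseline per-instance settings (values are assumptions except the
# 20-minute wall-clock limit, which is what we currently use).
baseline = {
    "run_obj": "runtime",                        # optimise running time
    "wallclock_limit": 20 * 60,                  # seconds per instance-specific run
    "intens_adaptive_capping_slackfactor": 1.2,  # assumed default
    "rand_prob": 0.5,                            # assumed default
}

# Planned changes: looser capping, longer budget, more random configurations.
planned = dict(baseline)
planned.update({
    "wallclock_limit": 40 * 60,                  # hypothetical doubled budget
    "intens_adaptive_capping_slackfactor": 2.0,  # hypothetical looser cap
    "rand_prob": 0.7,                            # hypothetical, more exploration
})

# Show which options differ from the baseline.
for key in sorted(planned):
    if planned[key] != baseline[key]:
        print(f"{key}: {baseline[key]} -> {planned[key]}")
```

The dictionary would then be passed to the scenario constructor of whichever SMAC version is in use; we have left that call out because its exact form depends on the version.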

Do you have any suggestions for improving the per-instance labelling or the way we are using SMAC? I was also wondering whether it is possible to modify the seed or rand_prob through callbacks between iterations, to better guide the search.

Thank you in advance.

automl / SMAC3

[Q] Using SMAC as a per-instance labelling tool #926