StevePny opened this issue 1 month ago
Hello, thank you for opening this issue. Here is another issue about evaluating specific parameters before the exploration phase: https://github.com/facebook/Ax/issues/136
params1, trial_index1 = ax.attach_trial(parameters={"x1": 0.0})
params2, trial_index2 = ax.attach_trial(parameters={"x1": 0.0})
# run your evaluation here...
ax.complete_trial(trial_index1, [data here])
ax.complete_trial(trial_index2, [data here])
In your case, it is likely you need to call "complete_trial" in order for the custom arm to go from RUNNING to COMPLETED. Let me know if this helps with your issue.
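For reference, a minimal end-to-end sketch of this pattern with the AxClient API might look like the following (the parameter name "x1", the objective name "objective", and the evaluation are placeholders, not taken from the original post):

from ax.service.ax_client import AxClient

ax = AxClient()
ax.create_experiment(
    name="attach_trial_example",
    parameters=[{"name": "x1", "type": "range", "bounds": [-5.0, 5.0]}],
    objective_name="objective",  # newer Ax versions declare this via the objectives= argument
)

# Attach a manually chosen parameterization; Ax records it as RUNNING.
params1, trial_index1 = ax.attach_trial(parameters={"x1": 0.0})

# Evaluate it however you like, then report the result so the trial
# moves from RUNNING to COMPLETED and is used as training data.
result = (params1["x1"] - 1.0) ** 2  # stand-in for a real evaluation
ax.complete_trial(trial_index=trial_index1, raw_data={"objective": (result, 0.0)})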
Hi @mgrange1998, thanks for your reply. We're using something like the submitit tutorial, where there is a loop checking on the status of the submitted jobs. I was able to reproduce the problem there, so that would probably be the better place to focus the discussion.
Let's say then that after running ax_client.create_experiment(...)
I add to the submitit.ipynb tutorial notebook:
params_baseline, trial_index_baseline = ax_client.attach_trial(parameters={"x": 0.0, "y": 0.0})
In this case, trial_index_baseline = 0. In the tutorial, it looks like the loop should cycle through the submitted jobs and complete them with ax_client.complete_trial(trial_index=trial_index, raw_data=result) in the code block below (for simplicity, I changed num_parallel_jobs to 1 and total_budget to 4):
total_budget = 4
num_parallel_jobs = 1

jobs = []
submitted_jobs = 0
# Run until all the jobs have finished and our budget is used up.
while submitted_jobs < total_budget or jobs:
    for job, trial_index in jobs[:]:
        # Poll if any jobs completed.
        # Local and debug jobs don't run until .result() is called.
        if job.done() or type(job) in [LocalJob, DebugJob]:
            result = job.result()
            ax_client.complete_trial(trial_index=trial_index, raw_data=result)
            jobs.remove((job, trial_index))

    # Schedule new jobs if there is availability.
    trial_index_to_param, _ = ax_client.get_next_trials(
        max_trials=min(num_parallel_jobs - len(jobs), total_budget - submitted_jobs))
    for trial_index, parameters in trial_index_to_param.items():
        job = executor.submit(evaluate, parameters)
        submitted_jobs += 1
        jobs.append((job, trial_index))
        time.sleep(1)

    # Display the current trials.
    display(exp_to_df(ax_client.experiment))

    # Sleep for a bit before checking the jobs again to avoid overloading the cluster.
    # If you have a large number of jobs, consider adding a sleep statement in the job polling loop as well.
    time.sleep(30)
However, the attached trial is never evaluated and so is never marked 'complete'. Instead, it looks like the attached trial shows up as submitted at the same time as the first Sobol-generated trial but then never actually runs. ax_client.get_next_trials() does not appear to return the attached trial at index 0; it jumps straight to trial 1, produced by the Sobol method.
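A quick way to see this (a hypothetical diagnostic snippet, not from the original post) is to inspect the attached trial directly; it stays RUNNING because nothing in the loop ever completes it:

# The attached trial keeps its RUNNING status and is never returned by get_next_trials().
trial_baseline = ax_client.experiment.trials[trial_index_baseline]
print(trial_baseline.status)            # remains RUNNING
print(exp_to_df(ax_client.experiment))  # trial 0 never reaches COMPLETED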
Hi @mgrange1998, I found a solution that seems to work, though it is a bit of a hack:
First add the trials:
# SGP added: test attached trials
params_baseline, trial_index_baseline = ax_client.attach_trial(parameters={"x": 0.0, "y": 0.0})
Our optimization loop is actually wrapped in a function, so the hack requires reading the already-attached trials from the ax_client's experiment before starting the while loop shown above:
def optimization_loop(ax_client, model_run_func, executor, evaluate, total_budget=10, num_parallel_jobs=1):
    jobs = []
    submitted_jobs = 0

    # Check if the ax_client already has manually added trials; if so, submit them first.
    if ax_client.experiment.num_trials > 0:
        for trial_index in ax_client.experiment.trials:
            trial = ax_client.experiment.trials[trial_index]
            parameters = trial.arm.parameters
            job = executor.submit(evaluate, parameters)
            submitted_jobs += 1
            # Pair each job with its own trial_index (not trial_index_baseline,
            # which is only defined outside this function).
            jobs.append((job, trial_index))

    # Run until all the jobs have finished and our budget is used up.
    while submitted_jobs < total_budget or jobs:
        ...
Then when calling:
ax_client = optimization_loop(ax_client, model_run_func, executor, evaluate_basic, total_budget=total_budget, num_parallel_jobs=num_parallel_jobs)
It appears to run correctly and does not continue beyond the first trial until that first manually submitted trial completes. I did not test this with multiple attached trials though.
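If several trials are attached up front, a possible generalization of the pre-submission block (an untested sketch using the same variable names as above) is to skip trials that already have data and pair each job with its own trial index:

from ax.core.base_trial import TrialStatus

# Submit any pre-attached trials that have not been completed yet.
if ax_client.experiment.num_trials > 0:
    for trial_index, trial in ax_client.experiment.trials.items():
        if trial.status == TrialStatus.COMPLETED:
            continue  # already has results; nothing to run
        job = executor.submit(evaluate, trial.arm.parameters)
        submitted_jobs += 1
        jobs.append((job, trial_index))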
Do you have a suggestion for my second question? "If there are offline runs that are not part of the optimization but, unlike the above question, we already have a "result" metric calculated, what is the recommended way to add this information?"
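One pattern that fits that case (a sketch based on the attach_trial / complete_trial flow discussed above, not something confirmed by the maintainers in this thread; the parameter values and metric numbers are placeholders) is to attach the offline parameterization and complete it immediately with the precomputed metric, so it becomes training data without ever being scheduled:

# Register an offline run whose objective value is already known.
params_offline, trial_index_offline = ax_client.attach_trial(
    parameters={"x": 1.5, "y": -0.5})
ax_client.complete_trial(
    trial_index=trial_index_offline,
    raw_data={"result": (0.42, 0.0)})  # (mean, SEM) for the precomputed "result" metric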
Hi,
I'm relatively new to Ax and looking for an answer to a particular use case. For context, we're using a custom implementation of the Slurm API to optimize a large model running on AWS ParallelCluster. We have some baseline parameterizations that we'd like to use to initialize the optimization.
To do this, I'd like to be able to use a user-defined generation strategy as in: https://ax.dev/tutorials/generation_strategy.html#1A.-Manually-configured-generation-strategy
via a 'Step 0' parameter set that uses a single 'status_quo' type experiment representative of our existing best tuned model.
I've tried, for example in a toy experiment, to manually override the experiment list using

ax_client.attach_trial(parameters={"rho": _rho, "beta": 3.0, "sigma": _sigma})

However, during the optimization the trial just says 'RUNNING' but never completes (I'm assuming because it is not part of the already-defined GenerationStrategy). So my primary question is: what is the preferred way to set up one or more pre-selected parameter sets for this kind of use case, so that the pre-selected trials run first during the optimization?
A secondary question: Similarly, if there are offline runs that are not part of the optimization but, unlike the above question, we already have a "result" metric calculated, what is the recommended way to add this information?
e.g., this capability is offered in the SMT package (https://smt.readthedocs.io/en/latest/) via the xt and yt inputs to its EGO routine.