Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

Unable to set seed of a sweep job #33703

Open jinlow opened 8 months ago

jinlow commented 8 months ago

Describe the bug I want to be able to change the seed of a sweep job that uses random sampling. From looking at the source code and docs, I think I should be able to pass an azure.ai.ml.sweep.RandomSamplingAlgorithm object as the sampling_algorithm parameter of the command.sweep method. However, this doesn't work and instead produces the following error when I try to run the job.

azure.core.exceptions.HttpResponseError: (UserError) Error occurred when loading YAML file rootNode, details: Algorithm  is not supported by Sweep Component.

To Reproduce Create a sweep job, then set a random sampling algorithm with a seed.

from azure.ai.ml.sweep import RandomSamplingAlgorithm

# ... Create command
sweep_job = command.sweep(goal="maximize", sampling_algorithm=RandomSamplingAlgorithm(seed=0, rule="random"))

Expected behavior I would expect to be able to pass a RandomSamplingAlgorithm(seed=0, rule="random") object and have it just work.

github-actions[bot] commented 8 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

amangupta26 commented 7 months ago

@jinlow Thanks for reporting this issue. We are looking into it.

chiragbhatt311 commented 7 months ago

I ran a sweep job with the same settings as described above, and it completed successfully without any issue. @jinlow, can you please share some more details about the issue and try out this code? Thanks.

from azure.ai.ml import MLClient
from azure.ai.ml import command, Input
from azure.ai.ml.sweep import Choice, Uniform, RandomSamplingAlgorithm
from azure.identity import InteractiveBrowserCredential

subscription_id = '<subscription-id>'
resource_group = '<resource-group-name>'
workspace_name = '<workspace-name>'
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace_name)

command_job_for_sweep = command(
    code="./",
    command="python test_script.py --x1 ${{inputs.x1}} --x2 ${{inputs.x2}}",
    environment="AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest",
    inputs={
        # Define the search space for your hyperparameters
        "x1": Uniform(min_value=0.01, max_value=0.9),
        "x2": Uniform(min_value=1, max_value=2)
    }
)

sweep_job = command_job_for_sweep.sweep(
    compute= '<compute-cluster-name>',
    sampling_algorithm= RandomSamplingAlgorithm(seed=0, rule="random"),
    primary_metric='<primary-metric-name>',
    goal="maximize",
)

# Define the limits for this sweep
sweep_job.set_limits(max_total_trials=2, max_concurrent_trials=2, timeout=7200)

# Specify your experiment details
sweep_job.display_name = '<display-name>'
sweep_job.experiment_name = '<experiment-name>'
sweep_job.description = '<description>'

returned_sweep_job = ml_client.create_or_update(sweep_job)
returned_sweep_job.services["Studio"].endpoint

jinlow commented 7 months ago

@amangupta26 @chiragbhatt311 The code you provided works. However, when I try to run a sweep job inside a pipeline, it fails. Here is a runnable example.

from azure.ai.ml import MLClient, dsl, command, Input
from azure.ai.ml.sweep import Uniform, RandomSamplingAlgorithm
from azure.identity import AzureCliCredential

ml_client = MLClient.from_config(AzureCliCredential())

command_job_for_sweep = command(
    command="python -c 'import mlflow;mlflow.log_metric(\"primary_metric\", ${{inputs.x1}})'",
    environment="AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest",
    inputs={
        "x1": Input(type="number"),
    },
)

@dsl.pipeline()
def runable_pipeline():
    cmd = command_job_for_sweep(x1=Uniform(min_value=0.01, max_value=0.9))

    sweep_job = cmd.sweep(
        sampling_algorithm=RandomSamplingAlgorithm(seed=0, rule="random"),
        primary_metric="primary_metric",
        goal="maximize",
        max_total_trials=2,
        max_concurrent_trials=2,
    )

returned_sweep_job = ml_client.create_or_update(
    runable_pipeline(), compute="small-compute-cluster"
)
print(returned_sweep_job.studio_url)

I get the following error:

Message: Algorithm  is not supported by Sweep Component.
Additional Information:Type: ComponentName
# Removed info being printed about our sub
Type: InnerError
Info: {
    "value": {
        "code": "BadArgument",
        "innerError": {
            "code": "ArgumentInvalid",
            "innerError": {
                "code": "InvalidComponent",
                "innerError": null
            }
        }
    }
}

chiragbhatt311 commented 7 months ago

I have reproduced the error. Thanks @jinlow for identifying it. We are investigating the issue.

jinlow commented 7 months ago

@chiragbhatt311 Any update on this? We are getting some pushback on using pipelines in AzureML because this makes sweep jobs non-deterministic.

chiragbhatt311 commented 7 months ago

Hi @jinlow, currently a hyperparameter sweep in a pipeline does not support an Algorithm object. This is a feature gap. See https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline?view=azureml-api-2. For now, sampling_algorithm accepts string values only.

jinlow commented 7 months ago

But how can I set the seed?

jinlow commented 7 months ago

@chiragbhatt311 Setting the seed is the part that I need. Otherwise these runs are non-deterministic and not reproducible.

chiragbhatt311 commented 7 months ago

For a pipeline sweep job, you currently cannot set a seed for the random algorithm. For deterministic runs, you can try grid sampling and pass the exact hyperparameter values you want to reproduce.
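
A minimal sketch of that grid-sampling workaround, offered as an assumption and not as a confirmed SDK pattern: draw the hyperparameter values locally with a seeded generator, then hand the fixed list to the sweep. The helper seeded_uniform_samples is hypothetical, and the commented-out Choice / sampling_algorithm="grid" usage at the end has not been verified against a pipeline run.

```python
import random


def seeded_uniform_samples(seed, min_value, max_value, n_trials, ndigits=6):
    """Draw n_trials values from Uniform(min_value, max_value) with a fixed seed.

    Hypothetical helper: it only reproduces the *values* locally; it does not
    set any seed inside Azure ML.
    """
    rng = random.Random(seed)  # local generator, global random state is untouched
    return [round(rng.uniform(min_value, max_value), ndigits) for _ in range(n_trials)]


# The same seed yields the same list on every run.
x1_values = seeded_uniform_samples(seed=0, min_value=0.01, max_value=0.9, n_trials=2)
print(x1_values)

# Assumed usage inside the pipeline function from the example above
# (requires `from azure.ai.ml.sweep import Choice`):
#   cmd = command_job_for_sweep(x1=Choice(values=x1_values))
#   sweep_job = cmd.sweep(
#       sampling_algorithm="grid",
#       primary_metric="primary_metric",
#       goal="maximize",
#   )
```

Note that grid sampling evaluates the Cartesian product of all Choice lists, so with more than one hyperparameter this does not reproduce paired random draws; it fits the single-parameter case shown in this thread best.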

jinlow commented 7 months ago

Thanks @chiragbhatt311 , is there any timeline for when the ability to set a seed will be implemented?

jinlow commented 7 months ago

@chiragbhatt311 Curious if there is any timeline for this functionality?

gravesee commented 6 months ago

@eniac871 Is there an update or workaround for this issue? It affects more than using a seed with a sampling algorithm: we also cannot use random sampling with the sobol rule. Thanks.