[Paper-Cut: SDK] Estimator class should be able to use a Curated Environment in addition to a Custom Environment

CESARDELATORRE commented 4 years ago

Afaik, the Estimator class needs to always use a Custom Environment, probably because it always creates a Docker Image under the covers?

However, for simplicity's sake, it should also accept a curated Environment directly (as the ScriptRunConfig class already does).

If you try to use a curated environment you get this error:

Error: "Environment name can not start with the prefix AzureML. To alter a curated environment first create a copy of it."

Therefore, you need to copy/clone a curated environment first, which is also not straightforward and needs the following code:

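from azureml.core import Environment
from azureml.train.estimator import Estimator

# Get the curated environment (ws is an existing Workspace object)
curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")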

# Save curated environment definition to folder (Two files, one for conda_dependencies.yml and another file for azureml_environment.json)
curated_environment.save_to_directory(path="./curated_environment_definition", overwrite=True)

# Create custom Environment from Conda specification file
custom_environment = Environment.from_conda_specification(name="custom-workshop-environment", file_path="./curated_environment_definition/conda_dependencies.yml")

# Save custom environment definition to folder (Two files, one for conda_dependencies.yml and another file for azureml_environment.json)
custom_environment.save_to_directory(path="./custom_environment_definition", overwrite=True)

custom_environment.register(ws) 

estimator = Estimator(source_directory=project_folder, 
                      script_params=script_params,
                      compute_target=cluster,
                      # use_docker=True, #AML Cluster only supports Docker runs
                      entry_script='train.py',
                      environment_definition= custom_environment,
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])
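
The copy step above can probably be shortened. What follows is only a minimal sketch, assuming the environment object exposes a `clone(new_name)` method (as `azureml.core.Environment` does in the v1 SDK, to my knowledge) and that the reserved prefix is exactly the `AzureML` named in the error message. The helper names `is_curated_name` and `ensure_custom_environment`, and the `custom-` naming scheme, are hypothetical and only for illustration:

```python
RESERVED_PREFIX = "AzureML"  # prefix named in the error message above


def is_curated_name(name: str) -> bool:
    """Return True if the name uses the prefix reserved for curated environments."""
    return name.startswith(RESERVED_PREFIX)


def ensure_custom_environment(env, new_name=None):
    """Return env unchanged if it is already custom, otherwise a renamed clone.

    Assumption: env has a `name` attribute and a `clone(new_name)` method.
    """
    if not is_curated_name(env.name):
        return env
    # Hypothetical naming scheme: "AzureML-Tutorial" -> "custom-AzureML-Tutorial"
    return env.clone(new_name or f"custom-{env.name}")
```

With this in place, something like `custom_environment = ensure_custom_environment(Environment.get(workspace=ws, name="AzureML-Tutorial"))` could replace the save-to-directory round trip, provided `clone` behaves as assumed.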

If the Estimator could use a curated environment the way the ScriptRunConfig class can, you would only need the following code:

curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")

estimator = Estimator(source_directory=project_folder, 
                      script_params=script_params,
                      compute_target=cluster,
                      # use_docker=True, #AML Cluster only supports Docker runs
                      entry_script='train.py',
                      environment_definition= curated_environment,
                      inputs=[ws.datasets['IBM-Employee-Attrition'].as_named_input('attrition')])

danielsc commented 4 years ago

this is clearly a regression -- filed bug here: https://msdata.visualstudio.com/Vienna/_workitems/edit/583736

diondrapeck commented 4 years ago

@CESARDELATORRE - Thank you for bringing this to our attention. Yes, the ideal behavior would be for the Estimator constructor to accept a curated environment via the environment_definition param.

For now, however, Estimator behaves the same way as ScriptRunConfig: a curated environment can be assigned directly (with no need for registration), but only after the object has been created.

ScriptRunConfig example:

from azureml.core import ScriptRunConfig
from azureml.core.environment import Environment

# Get the curated environment
curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")

runconfig = ScriptRunConfig(source_directory=".", script="train.py")
runconfig.run_config.target = "local"

# Attach environment to run config
runconfig.run_config.environment = curated_environment

Estimator example:

from azureml.train.estimator import Estimator
from azureml.core.environment import Environment

# Get the curated environment
curated_environment = Environment.get(workspace=ws, name="AzureML-Tutorial")

estimator = Estimator(source_directory=project_folder,
                    compute_target=compute_target,
                    entry_script='script.py')

# Attach environment to estimator run config
estimator.run_config.environment = curated_environment
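
The "create first, attach afterwards" pattern from the Estimator example above could be wrapped in a small helper. This is just a sketch under stated assumptions: `build_estimator_with_environment` is a hypothetical name, and it relies only on the constructed object exposing `run_config.environment`, as the example above does:

```python
def build_estimator_with_environment(estimator_cls, environment, **estimator_kwargs):
    """Create the estimator first, then attach the environment afterwards."""
    # Deliberately do NOT pass environment_definition to the constructor:
    # per this thread, that is where curated environments are rejected.
    estimator_kwargs.pop("environment_definition", None)
    estimator = estimator_cls(**estimator_kwargs)
    # Attach the (possibly curated) environment to the estimator run config
    estimator.run_config.environment = environment
    return estimator
```

A call might look like `build_estimator_with_environment(Estimator, curated_environment, source_directory=project_folder, compute_target=compute_target, entry_script='script.py')`.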

danielsc / azureml-workshop-2019
