Azure / azure-cli-extensions

Public Repository for Extensions of Azure CLI.
https://docs.microsoft.com/en-us/cli/azure
MIT License

New error with 'az ml model deploy' in 1.0.81 #1175

Open adiun opened 4 years ago

adiun commented 4 years ago

Extension name (the extension in question)

"extensionType": "whl",
"name": "azure-cli-ml",
"version": "1.0.81"

Description of issue (in as much detail as possible)

Running the following model deployment command with azure-cli-ml 1.0.81 now throws an error. The same command did not throw an error with azure-cli-ml 1.0.79.

az ml model deploy -n quote-clf \
  --inference-config-file src/inferconfig.json \
  --auth-enabled true \
  --deploy-config-file src/deployconfig_stab.json \
  --compute-target aks-aml-stab \
  -m quote_clf:3 -m quote_tfidf:3 \
  --workspace-name aml-stab --resource-group aml-stab \
  --overwrite -v

Error:

Creating image
Running.............................................................................
Succeeded
ERROR: {'Azure-cli-ml Version': '1.0.81', 'Error': AttributeError("'bool' object has no attribute 'name'",)}
Image creation operation finished for image quote-clf:17, operation "Succeeded"
##[error]Script failed with error: Error: The process '/bin/bash' failed with exit code 1

src/inferconfig.json:

{
    "condaFile": "environment.yaml",
    "enableGpu": false,
    "entryScript": "entry_script.py",
    "extraDockerfileSteps": "extra_dockerfile_steps",
    "runtime": "python",
    "sourceDirectory": "src"
}

src/deployconfig_stab.json:

{
    "autoScaler": {
        "autoscaleEnabled": true,
        "minReplicas": 2,
        "maxReplicas": 4,
        "refreshPeriodInSeconds": 1,
        "targetUtilization": 70
    },
    "authEnabled": true,
    "computeType": "aks",
    "containerResourceRequirements": {
        "cpu": 1,
        "memoryInGB": 1
    },
    "enableAppInsights": true,
    "maxConcurrentRequestsPerContainer": 3,
    "maxQueueWaitMs": 5000,
    "scoringTimeoutMs": 120000
}

src/environment.yaml:

name: quote
channels:
  - defaults
  - conda-forge
dependencies:
  - pip=19.2.2
  - scikit-learn=0.20.3
  - spacy=2.1.7
  - pip:
      - azureml-defaults>=1.0.60
      - https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz
rsethur commented 4 years ago

@adiun: I ran into the same error. As an interim fix, use the YAML format for the inference config - it worked for me. Template for inference_config.yml: https://pastebin.com/CzX5pViy
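For reference, a YAML version of the reporter's src/inferconfig.json might look like the sketch below. This assumes the YAML inference config accepts the same field names as the JSON schema (the pastebin template above is the authoritative example):

```yaml
# Assumed YAML equivalent of src/inferconfig.json;
# field names carried over unchanged from the JSON config.
condaFile: environment.yaml
enableGpu: false
entryScript: entry_script.py
extraDockerfileSteps: extra_dockerfile_steps
runtime: python
sourceDirectory: src
```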

gogowings commented 4 years ago

We have located the root cause and are working on releasing a fix.

gogowings commented 4 years ago

A related comment: the bug only affects deployments that do not use an Azure Machine Learning Environment. We encourage Environment-based deployment, which is faster and more reliable.

gogowings commented 4 years ago

@adiun, in your case, using extraDockerfileSteps in the InferenceConfig file sends the deployment down the non-Environment route. Consider switching to Environment-based deployment, documented at: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#enable-docker
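The routing rule described above can be sketched as a small check. The function name and the idea of inspecting the parsed JSON are illustrative only, not the extension's actual code; the one grounded fact (from the maintainer's comment) is that extraDockerfileSteps forces the legacy, non-Environment path:

```python
import json

def uses_non_environment_route(inference_config: dict) -> bool:
    """Hypothetical helper: per the maintainer's comment, the presence of
    extraDockerfileSteps forces the legacy (non-Environment) deployment
    path, which is where the 1.0.81 bug lives."""
    return "extraDockerfileSteps" in inference_config

# The reporter's src/inferconfig.json contents:
config = json.loads("""
{
    "condaFile": "environment.yaml",
    "enableGpu": false,
    "entryScript": "entry_script.py",
    "extraDockerfileSteps": "extra_dockerfile_steps",
    "runtime": "python",
    "sourceDirectory": "src"
}
""")

print(uses_non_environment_route(config))  # True: takes the affected route
```

Dropping the extraDockerfileSteps key (and expressing those steps through an Environment instead) would make this check return False, i.e. the deployment would take the Environment-based route.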

adiun commented 4 years ago

@rsethur Thanks! @gogowings Thanks for the updates and the tip about extraDockerfileSteps. I received this guidance separately as well regarding deployment performance, so I will definitely look into switching to Environment-based deployment.