Azure / mlops-v2

Azure MLOps (v2) solution accelerators. Enterprise ready templates to deploy your machine learning models on the Azure Platform.
https://learn.microsoft.com/en-us/azure/machine-learning/concept-model-management-and-deployment
MIT License

Azure DevOps - Error Deploying Online Endpoint - ResourceNotReady: User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502 #120

Closed bgawale closed 9 months ago

bgawale commented 9 months ago

Describe the bug or the issue that you are facing

Following the documentation here to Deploy Azure Machine Learning Model Deployment Pipeline using Azure DevOps- https://github.com/Azure/mlops-v2/blob/main/documentation/deployguides/deployguide_ado.md#deploy-azure-machine-learning-model-deployment-pipeline

The pipeline was created with the provided template, i.e. #deploy-batch-endpoint-pipeline, and the following error appears after running it:

ERROR: (None) ResourceNotReady: User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502.

The endpoint appears to have been created when viewed in the Azure ML workspace; however, its provisioning state shows 'Failed'. Refer to the image below.

[screenshot: endpoint provisioning state shows 'Failed']

Steps/Code to Reproduce

Follow the steps documented here - https://github.com/Azure/mlops-v2/blob/main/documentation/deployguides/deployguide_ado.md#deploy-azure-machine-learning-model-deployment-pipeline

After creating the pipeline, run it; the error above is generated.

Expected Output

The pipeline should run without an error, and the model should be deployed successfully with no error shown in its provisioning state.

Versions

Azure DevOps, Bicep, Azure ML CLI, Tabular prebuilt example

Which platform are you using for deploying your infrastructure?

Azure DevOps (ADO)

If you mentioned Others, please mention which platform you are using.

NA

What are you using for deploying your infrastructure?

Bicep

Are you using Azure ML CLI v2 or Azure ML Python SDK v2?

Azure ML CLI v2

Describe the example that you are trying to run?

taxi-fare-regression

bgawale commented 9 months ago

Managed to fix it. Based on the logs, the container was terminating because the following dependency was missing: azureml-inference-server-http.

To add it, update the environment dependencies file in the taxi-fare-regression repository: /data-science/environment/train-conda.yml

channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.8.0
  - pip
  - pip:
      - azureml-mlflow==1.38.0
      - azureml-sdk==1.38.0
      - scikit-learn==0.24.1
      - pandas==1.2.1
      - joblib==1.0.0
      - matplotlib==3.3.3
      - azureml-inference-server-http
      - git+https://github.com/microsoft/AzureML-Observability#subdirectory=aml-obs-client
      - git+https://github.com/microsoft/AzureML-Observability#subdirectory=aml-obs-collector

Note - the missing dependency, azureml-inference-server-http, requires the azureml packages to be at a version > 1.37.0 (the pins above use 1.38.0).
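Since the missing package only surfaces at deploy time as a liveness-probe 502, a cheap guard is to check the conda environment file before the pipeline runs. The sketch below is a hypothetical helper (not part of the mlops-v2 templates): a plain-text scan of the env file for the azureml-inference-server-http entry, assuming the pip dependencies are listed one per line as in train-conda.yml above.

```python
# Hypothetical pre-deployment check: fail fast if the inference server
# dependency is missing from the conda environment file, instead of
# discovering it via a liveness-probe 502 at deploy time.

REQUIRED_PIP_DEPENDENCY = "azureml-inference-server-http"

def has_inference_server(conda_yaml_text: str) -> bool:
    """Return True if the env file lists the azureml-inference-server-http
    package (plain text scan of list entries, no PyYAML required)."""
    for line in conda_yaml_text.splitlines():
        # Normalize a "  - package==version" list entry to its bare name.
        entry = line.strip().lstrip("-").strip()
        if entry.split("==")[0] == REQUIRED_PIP_DEPENDENCY:
            return True
    return False

# Example: the original (broken) environment vs. the fixed one.
broken_env = """\
dependencies:
  - python=3.8.0
  - pip:
      - azureml-sdk==1.38.0
      - scikit-learn==0.24.1
"""
fixed_env = broken_env + "      - azureml-inference-server-http\n"

print(has_inference_server(broken_env))  # False
print(has_inference_server(fixed_env))   # True
```

A check like this could run as an early step in the ADO pipeline (reading the real train-conda.yml from disk) so a missing serving dependency fails the build before the endpoint deployment is attempted.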