Azure / AML-Kubernetes

AzureML customer managed k8s compute samples

"Cannot override existing endpoint: InferenceLiveness" error when customizing "inference_config.*.path" #251

Open joaocc opened 1 year ago

joaocc commented 1 year ago

Hi, we are trying to configure BYOC inference, as per https://github.com/Azure/AML-Kubernetes/blob/master/docs/inference-byoc.md#create-the-deployment.

deployment

...
environment:
  inference_config:
    liveness_route:
      port: 5001
      path: /
    readiness_route:
      port: 5001
      path: /
    scoring_route:
      port: 5001
      path: /score
...

Even when creating a new endpoint and a new deployment, we get the following error message:

az ml online-deployment create -f ./my-byoc-deployment.yml --all-traffic -o json --resource-group MY_RG --subscription MY_SUBS_ID --workspace-name MY_AML_WS
All traffic will be set to deployment netcore-embed-1 once it has been provisioned.
If you interrupt this command or it times out while waiting for the provisioning, you can try to set all the traffic to this deployment later once its has been provisioned.
Check: endpoint MY_EP_NAME exists
(ValidationError) Cannot override existing endpoint: InferenceLiveness
Code: ValidationError
Message: Cannot override existing endpoint: InferenceLiveness
Exception Details:      (Invalid) Cannot override existing endpoint: InferenceLiveness
        Code: Invalid
        Message: Cannot override existing endpoint: InferenceLiveness
        Target: EnvironmentDefinition
Additional Information:Type: ComponentName
Info: {
    "value": "managementfrontend"
}Type: Correlation
Info: {
    "value": {
        "operation": "d7b2590abc52964b9c419daae732d145",
        "request": "24c44227a1dc42f3"
    }
}Type: Environment
Info: {
    "value": "westeurope"
}Type: Location
Info: {
    "value": "westeurope"
}Type: Time
Info: {
    "value": "2022-07-12T21:04:43.4605579+00:00"
}

If the liveness_route and readiness_route paths are set to something else, like /healtz or /api/check/healthz, the deployment fails with the same error.

The deployment succeeds when the liveness_route and readiness_route paths are set to /score.
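
For reference, a sketch of the variant that deploys successfully for us, assuming the same port 5001 as in the fragment above and the rest of the deployment file unchanged (elided here as in the original):

```yaml
# Fragment only: all three routes point at /score on port 5001.
# Changing the liveness/readiness paths to anything else triggers the
# "Cannot override existing endpoint: InferenceLiveness" error.
environment:
  inference_config:
    liveness_route:
      port: 5001
      path: /score
    readiness_route:
      port: 5001
      path: /score
    scoring_route:
      port: 5001
      path: /score
```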

azureml extension v1.1.6, kubernetes v1.22.9 (EKS)

% az --version
azure-cli                         2.38.0
ml                                 2.5.0

Any ideas on where to find documentation of the restrictions related to this customization? Thx

Zhong-J commented 1 year ago

Hi Joao, if you are using the AML base image for inference, the default inference config settings can't be changed. If you are using your own Docker image (for example, the TF Serving Docker image), you can refer to this document for more details: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-custom-container

joaocc commented 1 year ago

Hi. We are indeed using a custom docker image, and it all works well when we have all 3 paths set to /score.

However, when we try to change them as per the examples above, the deployment starts failing with the errors above.

When comparing to the list you sent, the only major differences we found are:

However, even when commenting out liveness_probe.period and readiness_probe.period, we still got the same error message.

As a note, although the message says "Cannot override existing endpoint", it also happens if we are deploying an endpoint+deployment for the first time.

Thx