Azure / AML-Kubernetes

AzureML customer managed k8s compute samples

Change scoring_uri for ClusterIP inferenceRouterServiceType #282

Open · jroskens-mgm opened this issue 1 year ago

jroskens-mgm commented 1 year ago

When the ML extension is installed with inferenceRouterServiceType set to ClusterIP, the OnlineEndpoint seems to build its scoring URI from the azureml-fe service's cluster IP, which is unreachable from outside the cluster when using kubenet. This breaks both the "Test" functionality on an Azure ML endpoint and the "az ml online-endpoint invoke" command, since both try to connect to that internal ClusterIP.
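
For context, this is roughly how the extension was installed; a sketch with placeholder names rather than the exact command (inferenceRouterServiceType=ClusterIP is the setting at issue):

```bash
# Sketch: how the extension gets into this state (placeholder names throughout;
# --cluster-type is managedClusters for AKS, connectedClusters for Arc).
# allowInsecureConnections=True reflects a plain-HTTP setup and is an assumption.
az k8s-extension create \
    --name azureml-extension \
    --extension-type Microsoft.AzureML.Kubernetes \
    --cluster-type managedClusters \
    --cluster-name <cluster-name> \
    --resource-group <cluster-rg> \
    --scope cluster \
    --config enableInference=True \
             inferenceRouterServiceType=ClusterIP \
             allowInsecureConnections=True
```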

OnlineEndpoint status

```yaml
status:
  ...
  scoringUri: http://10.0.227.236/api/v1/endpoint/dev-rec-ab/score
```
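
The same value can be read back from the workspace, if that helps anyone reproduce; a sketch, assuming the v2 az ml CLI and that the show output exposes a scoring_uri field:

```bash
# Sketch: read back the scoring URI the workspace recorded for the endpoint.
# Placeholder names match the invoke command below.
az ml online-endpoint show \
    --name <endpoint-name> \
    --resource-group <workspace-rg> \
    --workspace-name <workspace-name> \
    --query scoring_uri --output tsv
# Prints http://10.0.227.236/api/v1/endpoint/dev-rec-ab/score (the ClusterIP).
```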

Azure CLI Invoke and Error

```bash
$ az ml online-endpoint invoke --name <endpoint-name> \
    --resource-group <workspace-rg> \
    --workspace-name <workspace-name> \
    --deployment-name "green" \
    --request-file "sample_request.json"

cli.azure.cli.core.azclierror: (<urllib3.connection.HTTPConnection object at 0x7fc1dfebbee0>, 'Connection to 10.0.227.236 timed out. (connect timeout=300)')
```
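
As a stopgap I can invoke the deployment by hand if I substitute an address that is actually reachable; a rough sketch, where <ingress-ip> stands for the Istio ingress address described below and the endpoint is assumed to use key auth:

```bash
# Sketch of a manual invoke that bypasses the broken scoring URI.
# <ingress-ip> is the externally reachable Istio ingress address (assumption);
# everything else mirrors the failing `az ml online-endpoint invoke` call.
KEY=$(az ml online-endpoint get-credentials \
        --name <endpoint-name> \
        --resource-group <workspace-rg> \
        --workspace-name <workspace-name> \
        --query primaryKey --output tsv)

# The azureml-model-deployment header pins the request to the green deployment.
curl --request POST "http://<ingress-ip>/api/v1/endpoint/<endpoint-name>/score" \
     --header "Authorization: Bearer $KEY" \
     --header "Content-Type: application/json" \
     --header "azureml-model-deployment: green" \
     --data @sample_request.json
```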

Azure ML Endpoint Details (screenshot omitted)

Is there no way to set or override this URI? I dug through the documentation, but the only relevant text I could find regarding the ClusterIP and NodePort configurations is in "Key considerations for AzureML extension deployment", which notes that with clusterIP you are expected to provide your own load balancing solution.

I've configured an Istio ingress controller to pass external requests along to the azureml-fe service as my own load balancing solution, and that is working fine.
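
Roughly what that looks like, as a sketch rather than my exact manifests (the gateway/route names and the catch-all host are placeholders; it assumes the extension's stock azureml-fe service in the azureml namespace listening on port 80):

```bash
# Sketch: route external traffic from the Istio ingress gateway to azureml-fe.
kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: azureml-gateway        # placeholder name
  namespace: azureml
spec:
  selector:
    istio: ingressgateway      # default Istio ingress gateway selector
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: azureml-fe-route       # placeholder name
  namespace: azureml
spec:
  hosts:
    - "*"
  gateways:
    - azureml-gateway
  http:
    - route:
        - destination:
            host: azureml-fe   # the extension's inference router service
            port:
              number: 80
EOF
```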

The rest of the documentation is written specifically for the LoadBalancer type and ignores the other options. I tried setting a "scoring_uri" parameter in my endpoint.yaml, but it was ignored. I also dug through the ConfigMaps in the azureml namespace and saw no reference to it, so I assume the URI is pulled from the azureml/azureml-fe service directly.

Is there really no way to override this? I want it to use the IP address I've configured on the ingress, which is actually reachable, not the internal cluster IP. I have a hard time believing that exposing inference endpoints externally without breaking the "Test" and invoke functionality described above hasn't already been accounted for, but I can't track down a solution.
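
For what it's worth, the assumption above is easy to check; the recorded scoring host matches the service's ClusterIP exactly (a quick sketch, assuming the extension's default service and namespace names):

```bash
# Quick check: the scoring URI host equals the azureml-fe ClusterIP.
kubectl get service azureml-fe --namespace azureml \
    -o jsonpath='{.spec.clusterIP}'
# Prints 10.0.227.236 here -- the same address embedded in scoringUri above.
```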