Azure / AML-Kubernetes

AzureML customer managed k8s compute samples
MIT License

sklearn-mnist inference endpoint test timeout and failed #267

Open Devinwong opened 1 year ago

Devinwong commented 1 year ago

Hi, I've been following https://github.com/Azure/AML-Kubernetes/blob/master/docs/simple-flow.md to set up and deploy a sample sklearn-mnist inference endpoint.

I've tried three different configurations.

  1. ✅AKS -> works as expected
  2. ✅AKS connected to Arc -> works as expected (attached the compute target as Arc cluster, not AKS cluster, in order to test Arc route)
  3. ❌minikube local cluster on WSL Ubuntu-20.04 (Win11, MS work machine, connected to MS VPN) connected to Arc -> deployment succeeded, but testing the endpoint from the AML Studio UI timed out and eventually failed.

Here are more details for case 3 (image attached).

I collected the following kubectl logs (attached as kubectl_logs.txt):

  1. kubectl get pods -A => all the pods are running
  2. kubectl logs blue-sklearn-mnist-minikube-58f5fd4cf7-lgvmf => the pod for inference runtime is running without errors.
  3. az ml online-endpoint invoke -n sklearn-mnist-minikube -r sample-request.json => testing the endpoint with the CLI also times out, just like the AML Studio UI test.

P.S. I also created a local endpoint and tested it with a command (image attached). The sample was correctly classified as 8.

What could be wrong with case 3?

zetiaatgithub commented 1 year ago

Sorry for the inconvenience. From the logs you shared, the scoring endpoint is "http://10.106.30.249/api/v1/endpoint/sklearn-mnist-minikube/score" and the error is "Connection to 10.106.30.249 timed out". So it's a network connectivity issue. Could you please check whether "10.106.30.249" is reachable from your local environment?
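
One way to run that check from the machine issuing the request is a plain TCP probe. This is a minimal sketch, not an official diagnostic: it assumes the endpoint listens on port 80 (implied by the http:// URL having no explicit port), and `can_connect` is a hypothetical helper name, not part of the az CLI or the AzureML SDK.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Port 80 is an assumption taken from the http:// scoring URL in the logs.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the address cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("10.106.30.249")` in the same WSL shell where `az ml online-endpoint invoke` was run shows whether that address is routable from there. A `False` result would be consistent with the timeout in the logs; a `True` result would point the investigation at the scoring path or the request itself instead.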