canonical / istio-operators

Charmed Istio

Test `ISVC` in a service mesh with Istio CNI plugin installed #354

Closed DnPlas closed 10 months ago

DnPlas commented 10 months ago

What needs to get done

Because of limitations in the Istio CNI plugin, KServe InferenceServices (ISVCs) may be affected by the network configuration: each ISVC has a storage-initializer init container, which executes this code and may require network connectivity.

NOTE: this scenario is possible for other workloads, not just ISVCs, so the actual solution should be generic enough to cover them all.

This task requires us to create a KServe InferenceService inside an Istio mesh with the Istio CNI plugin enabled. Since this will most likely produce an error, we need to provide a solution and document it. One option is to add annotations, as described here, to all workload Pods, which would require changes to the controllers or MutatingWebhookConfigurations of several components.

Relevant logs

# Not working init-container on a namespace with sidecar injection
ubuntu@charm-dev-jammy:~$ kubectl logs -nkubeflow-user-example-com sklearn-iris-predictor-00001-deployment-6f554fbd99-v6zzh -c storage-initializer
INFO:root:Initializing, args: src_uri [gs://kfserving-examples/models/sklearn/1.0/model] dest_path[ [/mnt/models]
INFO:root:Copying contents of gs://kfserving-examples/models/sklearn/1.0/model to local
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: [Errno 111] Connection refused
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: [Errno 111] Connection refused
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: [Errno 111] Connection refused
WARNING:google.auth._default:Authentication failed using Compute Engine authentication due to unavailable metadata server.

# Working init-container on a namespace w/o istio sidecar injection
ubuntu@charm-dev-jammy:~$ kubectl logs -nkserve-test sklearn-iris-predictor-00001-deployment-9fc88cc4f-bhv89 -c storage-initializer
INFO:root:Initializing, args: src_uri [gs://kfserving-examples/models/sklearn/1.0/model] dest_path[ [/mnt/models]
INFO:root:Copying contents of gs://kfserving-examples/models/sklearn/1.0/model to local
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: [Errno 113] No route to host
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
WARNING:google.auth._default:Authentication failed using Compute Engine authentication due to unavailable metadata server.
INFO:root:Downloading: /mnt/models/model.joblib
INFO:root:Successfully copied gs://kfserving-examples/models/sklearn/1.0/model to /mnt/models

DOD:

Why it needs to get done

To avoid potential issues when we enable the Istio CNI plugin.

syncronize-issues-to-jira[bot] commented 10 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5050.

This message was autogenerated

DnPlas commented 10 months ago

This issue can be worked around by adding the right annotations to the Pod that eventually gets created from the InferenceService definition. Something like this should help:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-workaround"
  annotations:
    traffic.sidecar.istio.io/excludeOutboundIPRanges: "0.0.0.0/0"
spec:
...

This workaround is provided in the official documentation and has also been tested upstream: https://github.com/kubeflow/manifests/issues/2014#issuecomment-1036168983.

Please note that this is only a workaround; a real solution should apply to any workload. That effort will be tracked in #356.
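For a workload that is not an ISVC, the same annotation would presumably go on the Pod template rather than on the top-level resource. The following is only a sketch under that assumption: the Deployment name, labels, images, and init-container command are hypothetical placeholders, not taken from any charm. Note that excluding `0.0.0.0/0` means no outbound traffic from the Pod is redirected to the sidecar, so this trades mesh features for init-container connectivity.

```yaml
# Hypothetical example workload; all names and images are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
      annotations:
        # Same annotation as in the ISVC workaround above, applied to the Pod template.
        traffic.sidecar.istio.io/excludeOutboundIPRanges: "0.0.0.0/0"
    spec:
      initContainers:
        # Stand-in for any init container that needs network access before the sidecar starts.
        - name: fetch-data
          image: curlimages/curl:latest
          command: ["curl", "-sSf", "https://example.com"]
      containers:
        - name: app
          image: registry.k8s.io/pause:3.9
```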

kimwnasptd commented 10 months ago

Closing this issue, since we have tested the ISVC. As mentioned above, we can proceed with evaluating charming Kyverno in the future.
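As a rough idea of what a Kyverno-based generic solution might look like, here is an untested sketch of a mutation policy that adds the annotation to Pods that do not already set it. The policy and rule names are hypothetical, and it has not been checked whether this mutation interacts correctly with Istio sidecar injection or whether a cluster-wide exclusion is acceptable; a real policy would likely need to be scoped to specific namespaces or workloads.

```yaml
# Untested sketch; policy and rule names are hypothetical.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-istio-exclude-outbound
spec:
  rules:
    - name: add-exclude-outbound-annotation
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          metadata:
            annotations:
              # The +( ) anchor adds the key only if the Pod does not already define it.
              +(traffic.sidecar.istio.io/excludeOutboundIPRanges): "0.0.0.0/0"
```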