Explore how to enable Istio sidecar on charm pods

ca-scribner commented 10 months ago

Context

To support improved network isolation of Charmed Kubeflow components, investigate how we can attach an istio sidecar to charm pods.

What needs to get done

-

Definition of Done

a prototype implementation (spec or working example) that demonstrates how to deploy charms with istio sidecars

syncronize-issues-to-jira[bot] commented 10 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5166.

This message was autogenerated

ca-scribner commented 10 months ago

@kimwnasptd suggested doing something similar to observability does here with resource limits. iiuc, in that case each charm that wanted an istio sidecar would modify its own Juju-created StatefulSet to add the istio sidecar annotation, which would trigger a redeploy of the underlying charm(/workload) pod.

Some outstanding questions on this are:

can you change a StatefulSet's pod annotations? I think yes, base on this
changing annotations triggers a rolling restart update that will kill the currently running charm pod (eg: the pod that asked for the annotation change). Is that a problem? It means the charm will be killed at some future time (maybe mid-execution, maybe after full execution). Has o11y had issues with this?
will Juju overwrite this annotation later? What event hooks do we need to subscribe to to protect us from intervention by Juju?
if a charm pod modifies itself to add istio, there is a period where the charm pod does not yet have an istio sidecar. Will this cause issues?

edit: by annotations, I think we mean labels

ca-scribner commented 10 months ago

What about other possible solutions? Maybe:

(added just so its easy to refer to later) using the above suggestion of a charm modifying its own statefulset to add the injection label
can we tell Juju to set annotations at deploy time, maybe via constraints or config?
- no
can we instead use the namespace-level istio injection? Setting the label istio-injection=enabled on a namespace (eg: namespace/kubeflow) means all new pods in that namespace will have an istio sidecar injected
- this would mean ALL pods in the namespace get sidecars. Would this mess with the model controller? The model controller would exist before the label was set, but if it was killed and needed to restart it would now have a sidecar. Maybe we could block this somehow (could add a do-not-inject label on the model operator, if that was configurable - or at least an "allow all" policy on the model operator so it isn't blocked from doing its job)?
Use sidecar manual injection: istioctl has a command that will modify a pod to include the sidecar, similar to how the webhook would. Can we just do this as pebble containers? The actual istio sidecar is probably doable, but istio also injects an init-container afaik which probably cannot be done with juju+pebble?

ca-scribner commented 10 months ago

Tried out option (1) (the "charm adds the istio label to its own statefulset" solution) using this charm:

charm.py

```python #!/usr/bin/env python3 import logging import time from typing import List, Optional import yaml from lightkube import Client from lightkube.resources.apps_v1 import StatefulSet from ops.charm import CharmBase from ops.main import main from ops.model import ActiveStatus logger = logging.getLogger(__name__) class IstioSidecarInjectorLabelTest(CharmBase): """Charm for testing whether a charm can self-inject an istio sidecar by changing its labels.""" def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) logger.info("IstioSidecarInjectorLabelTest.__init__") self.framework.observe(self.on.config_changed, self._echo_before_event) self.framework.observe(self.on.config_changed, self._add_istio_injection_label_to_statefulset) self.framework.observe(self.on.config_changed, self._echo_after_event) def _echo_before_event(self, event): """Echoes before the event is handled.""" logger.info("IstioSidecarInjectorLabelTest._echo_before_event") logger.info(f"Event: {event}") def _echo_after_event(self, event): """Echoes when an event is handled.""" logger.info("IstioSidecarInjectorLabelTest._echo_after_event") logger.info(f"Event: {event}") def _add_istio_injection_label_to_statefulset(self, event): """Adds the istio-injection=true label to the statefulset for this charm.""" logger.info("IstioSidecarInjectorLabelTest._add_istio_injection_label_to_statefulset") sleeptime = 10 logger.info(f"Sleeping {sleeptime} seconds") time.sleep(sleeptime) logger.info("Sleep complete") lightkube_client = Client() patch = { "spec": { "template": { "metadata": {"labels": {"sidecar.istio.io/inject": "true"}} } } } logger.info("patching statefulset") lightkube_client.patch( StatefulSet, name=self.model.app.name, namespace=self.model.name, obj=patch ) logger.info("patched statefulset") sleeptime = 10 logger.info(f"Sleeping {sleeptime} seconds") time.sleep(sleeptime) logger.info("Sleep complete") if __name__ == "__main__": main(IstioSidecarInjectorLabelTest) ```

On config-changed, the charm has 3 handlers:

_echo_before_event: echos to say its handling config-changed
_add_istio_injection_label_to_statefulset: patches its own statefulset's podspec labels and waits a little after this is done
_echo_after_event: echos to say its handling config-changed

Deploying it (juju deploy ./*.charm) to a cluster that has istio installed sees the following:

charm deploys without a sidecar
_echo_before_event fires and completes
_add_istio_injection_label_to_statefulset fires, patches this charm's own statefulset to add the istio label, and waits
after an unpredictable amount of time the pod running the charm code (in the statefulset we just patched) is terminated by kubernetes to roll out a new version of this pod
the new version of the pod is rolled out with the istio label and istio injects a sidecar pod. this pod appears to start up correctly.

So this does appear to work

ca-scribner commented 10 months ago

Is there a race condition with the above solution? Since doing juju deploy bundle.yaml results in charms deploying all at once in no guaranteed order, doing something like juju deploy kubeflow likely results in some charms deploying before the istio mutatingwebhooks are live. So will the above injection test charm add a label to its statefulset, restart the pod so the pod has the istio label, and still not receive an istio sidecar because nothing is watching for the label yet?

A hacky solution could be that the istio-label code above also checks to see if the new pod has an istio sidecar, but that feels ugly. And even if we spot the sidecar is missing, what do we do? At best we can defer and hope some other event triggers us again soon

ca-scribner commented 10 months ago

Yes, after testing this the race condition does exist - if the label-injection occurs before istio has created its webhooks, the charm pods will be restarted but with the istio label but will not receive the istio sidecar

ca-scribner commented 10 months ago

Testing out option (4) (manual sidecar injection):

Taking this StatefulSet modified a little from the istio docs:

sleep.yaml

```yaml apiVersion: v1 kind: ServiceAccount metadata: name: sleep --- apiVersion: v1 kind: Service metadata: name: sleep labels: app: sleep spec: ports: - port: 80 name: http selector: app: sleep --- apiVersion: apps/v1 kind: StatefulSet metadata: name: sleep spec: replicas: 1 selector: matchLabels: app: sleep serviceName: sleep template: metadata: labels: app: sleep spec: terminationGracePeriodSeconds: 0 serviceAccountName: sleep containers: - command: - /bin/sleep - infinity image: curlimages/curl imagePullPolicy: IfNotPresent name: sleep volumeMounts: - mountPath: /etc/sleep/tls name: secret-volume volumes: - name: secret-volume secret: secretName: sleep-secret optional: true ```

and running that through istioctl kube-inject -f sleep.yaml, we get:

sleep_with_istio_injected.yaml

```yaml apiVersion: v1 kind: ServiceAccount metadata: name: sleep --- apiVersion: v1 kind: Service metadata: name: sleep labels: app: sleep spec: ports: - port: 80 name: http selector: app: sleep --- apiVersion: apps/v1 kind: StatefulSet metadata: creationTimestamp: null name: sleep spec: replicas: 1 selector: matchLabels: app: sleep serviceName: sleep template: metadata: annotations: kubectl.kubernetes.io/default-container: sleep kubectl.kubernetes.io/default-logs-container: sleep prometheus.io/path: /stats/prometheus prometheus.io/port: "15020" prometheus.io/scrape: "true" sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}' creationTimestamp: null labels: app: sleep security.istio.io/tlsMode: istio service.istio.io/canonical-name: sleep service.istio.io/canonical-revision: latest spec: containers: - command: - /bin/sleep - infinity image: curlimages/curl imagePullPolicy: IfNotPresent name: sleep resources: {} volumeMounts: - mountPath: /etc/sleep/tls name: secret-volume - args: - proxy - sidecar - --domain - $(POD_NAMESPACE).svc.cluster.local - --proxyLogLevel=warning - --proxyComponentLogLevel=misc:error - --log_output_level=default:info - --concurrency - "2" env: - name: JWT_POLICY value: third-party-jwt - name: PILOT_CERT_PROVIDER value: istiod - name: CA_ADDR value: istiod.kubeflow.svc:15012 - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name - name: POD_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace - name: INSTANCE_IP valueFrom: fieldRef: fieldPath: status.podIP - name: SERVICE_ACCOUNT valueFrom: fieldRef: fieldPath: spec.serviceAccountName - name: HOST_IP valueFrom: fieldRef: fieldPath: status.hostIP - name: PROXY_CONFIG value: | {"discoveryAddress":"istiod.kubeflow.svc:15012","tracing":{"zipkin":{"address":"zipkin.kubeflow:9411"}}} - name: ISTIO_META_POD_PORTS value: |- [ ] - name: ISTIO_META_APP_CONTAINERS value: sleep - name: ISTIO_META_CLUSTER_ID value: Kubernetes - name: ISTIO_META_NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: ISTIO_META_INTERCEPTION_MODE value: REDIRECT - name: ISTIO_META_MESH_ID value: cluster.local - name: TRUST_DOMAIN value: cluster.local image: docker.io/istio/proxyv2:1.17.3 name: istio-proxy ports: - containerPort: 15090 name: http-envoy-prom protocol: TCP readinessProbe: failureThreshold: 30 httpGet: path: /healthz/ready port: 15021 initialDelaySeconds: 1 periodSeconds: 2 timeoutSeconds: 3 resources: limits: cpu: "2" memory: 1Gi requests: cpu: 100m memory: 128Mi securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: true runAsGroup: 1337 runAsNonRoot: true runAsUser: 1337 volumeMounts: - mountPath: /var/run/secrets/workload-spiffe-uds name: workload-socket - mountPath: /var/run/secrets/credential-uds name: credential-socket - mountPath: /var/run/secrets/workload-spiffe-credentials name: workload-certs - mountPath: /var/run/secrets/istio name: istiod-ca-cert - mountPath: /var/lib/istio/data name: istio-data - mountPath: /etc/istio/proxy name: istio-envoy - mountPath: /var/run/secrets/tokens name: istio-token - mountPath: /etc/istio/pod name: istio-podinfo initContainers: - args: - istio-iptables - -p - "15001" - -z - "15006" - -u - "1337" - -m - REDIRECT - -i - '*' - -x - "" - -b - '*' - -d - 15090,15021,15020 - --log_output_level=default:info image: docker.io/istio/proxyv2:1.17.3 name: istio-init resources: limits: cpu: "2" memory: 1Gi requests: cpu: 100m memory: 128Mi securityContext: allowPrivilegeEscalation: false capabilities: add: - NET_ADMIN - NET_RAW drop: - ALL privileged: false readOnlyRootFilesystem: false runAsGroup: 0 runAsNonRoot: false runAsUser: 0 serviceAccountName: sleep terminationGracePeriodSeconds: 0 volumes: - name: workload-socket - name: credential-socket - name: workload-certs - emptyDir: medium: Memory name: istio-envoy - emptyDir: {} name: istio-data - downwardAPI: items: - fieldRef: fieldPath: metadata.labels path: labels - fieldRef: fieldPath: metadata.annotations path: annotations name: istio-podinfo - name: istio-token projected: sources: - serviceAccountToken: audience: istio-ca expirationSeconds: 43200 path: istio-token - configMap: name: istio-ca-root-cert name: istiod-ca-cert - name: secret-volume secret: optional: true secretName: sleep-secret updateStrategy: {} status: replicas: 0 ```

where the differences are essentially:

additional annotations and labels, some of which I think matter to istio (but not confirmed)
an additional container (the istio sidecar) with:
- a bunch of environment variables that depend on the configuration of istio in your current cluster (they add things like "discoveryAddress":"istiod.kubeflow.svc:15012", which is dependent on where istio is deployed)
- several volumes, some of which mount configmaps or secrets defined by istiod
(if not using the Istio CNI plugin) an initContainer to set up the iptable rules so networking goes through the istio sidecar

So to manually inject the sidecar as proposed, we'd need to replicate all of the above differences inside the charm. Addressing each separately:

adding the annotations/labels: doable (through the same way as solution (1) injected the istio-injection labels).
adding the sidecar container: easy, although configuring it might be more complicated (how do we get the inputs? do they change often? can we get them from the istio charm?), and the volumes may be an issue as I don't think there's a good way to do this with sidecars
adding the initContainer: not implemented in charms now, and feels impossible (how can a Pod deploy its own initContainer?). If we're using the Istio CNI though, this is not an issue

Overall, this might be doable, but it does sound complicated. And this has a similar race condition as solution (1). If we haven't deployed istio yet, how do we get the configurations for the istio sidecar? Maybe they're deterministic and we can specify them statically, but then do we bake all that configuration into the code? If we do know the configurations, we'd still have to test what happens when a sidecar starts but the istiod it depends on is not available - does it gracefully handle this until istio is alive?

ca-scribner commented 10 months ago

Looking at option (3) (use namespace-level injections by labelling the namespace with istio-injection=enabled)...

Implementation of this option is much easier for most charms. By annotating the kubeflow namespace for istio injection, we no longer need to change any of the charms themselves.

We do, however, face a similar race condition to option (1). If we do:

juju add-model kubeflow
kubectl label namespace kubeflow istio-injection=enabled --overwrite
juju deploy kubeflow

the kubeflow namespace will have the correct istio labels, but until the istio-pilot charm has deployed istio's webhooks the pods created will not get istio sidecars. So if you have the charms deploy in this order, you might see:

charmA <-- does not get a sidecar
charmB <-- does not get a sidecar
istio-pilot <-- implements the sidecar mutating webhook
charmC <-- gets a sidecar

So we still have the issue where we need istio deployed before everything else.

Another issue with this solution is that we could accidentally give juju's model operator an istio sidecar. iiuc this by itself shouldn't hurt the model operator alone, but if we also had a deny-all default policy in the Kubeflow namespace then we might accidentally block the traffic. We'd need to address this too

ca-scribner commented 10 months ago

Given the race conditions affecting (1) and (3), I wonder if it would make sense to unbundle istio from the rest of Kubeflow. This is a large change, but would have benefits.

Maybe we have:

istiod deployed somewhere in the cluster
- could be managed by an istio charm in a separate model, or a user's own istio install that is not managed by charms
istio-integrator charm:
- configured to use the existing istio (config options to set where istio is, etc)
- provides things like our ingress relations that we currently get from istio pilot

Then the kubeflow bundle would include istio-integrator rather than istio itself, and deployment would be like:

juju add-model istio-system
juju deploy istio # (the daemon)
# wait

juju add-model kubeflow
juju deploy kubeflow # (includes istio-integrator)

ca-scribner commented 10 months ago

Notes from an internal meeting here

Summary is that we have some design decisions left to make - will report back here when we decide them

ca-scribner commented 10 months ago

This is also affected by whether we are using the istio CNI plugin or not. For example, looking at the output for istioctl kube-inject ..., it includes the istio-init container that sets up the networking. So manual injection must be different depending on whether you're using the istio CNI plugin

canonical / istio-operators