flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.76k stars 657 forks source link

[BUG] `flytescheduler` not starting with istio authorization policy #2562

Closed bstadlbauer closed 2 years ago

bstadlbauer commented 2 years ago

Describe the bug

When using istio's Authorization Policiy in conjunction with the flyte-core helm chart, the flytescheduler pod won't start, as the flytescheduler-check init container tries to access flyteadmin before the istio proxy gets injected, thus getting a RBAC: access denied error.

Looking at options, I've found this conversation, as well as https://github.com/istio/istio/issues/11130, from which I gathered that there might be no workaround ATM.

A possible solution they are suggesting would be to move all init-logic into a real (non-init) container, as the istio proxy should be ready by then. Do you have any thoughts on this or another possible workaround? We are currently not blocked by this as we're not using the scheduler right now.

Expected behavior

The flytescheduler pod coming up.

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

welcome[bot] commented 2 years ago

Thank you for opening your first issue here! 🛠

pmahindrakar-oss commented 2 years ago

Hi @bstadlbauer , doesn't the k8 backoff policy kick in even in this case and eventually bringup the scheduler container.

kumare3 commented 2 years ago

@pmahindrakar-oss / @bstadlbauer what if we just remove the init container? I think scheduler will Keep trying right

bstadlbauer commented 2 years ago

@pmahindrakar-oss Not a K8s expert, but from what I see ATM, the flytescheduler has not come up yet (and the pod is around for ~50h now).

@kumare3 That would be nice! What is the purpose of flytescheduler precheck? Can it be safely removed? If so I am happy to open up a PR here :-)

pmahindrakar-oss commented 2 years ago

@bstadlbauer @kumare3 we are using precheck as kind of readiness check where it doesn't initialize the main pod until the init-container precheck completes. It validates that flyteadmin is in SERVING status.

This indirectly also validates that DB is up which is used by the scheduler to read the schedules . Additionally send the activated schedules to admin. If we remove this precheck

We can probably add a flag to skip the precheck and you can pass this from your values file. Will that work. I want to avoid removing this check as it might start sending alerts on schedule failures using FailedExecutionCounter metric.

For quick verification you can try removing the init container and check if the scheduler pod eventually comes up

agates4 commented 2 years ago

Init containers are ran in the order they appear in the pod spec https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior

Hashicorp Vault gets around this by modifying the yaml so that the Vault init container runs before all other init containers: https://github.com/hashicorp/vault-k8s/blob/35b63e4720e29ed701f70a901ecd421f5c5bc26f/agent-inject/agent/agent.go#L571-L589

There could be an annotation within Flyte to determine ordering in the same way https://www.vaultproject.io/docs/platform/k8s/injector/annotations#vault-hashicorp-com-agent-init-first

agates4 commented 2 years ago

Within Istio's mesh config, there is a holdApplicationUntilProxyStarts value that would force the istio sidecar to be injected and initialized first https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/ (command+f holdApplicationUntilProxyStarts)

@bstadlbauer We can test this by modifying the labs service mesh with this value

bstadlbauer commented 2 years ago

@agates4 I've tested setting the istio config, but seems like this won't do anything (I think this might be because it won't affect init containers, similar to what's discussed here).

I've opened https://github.com/flyteorg/flyte/pull/2576, which adds an option to disable the precheck init container