Closed bstadlbauer closed 2 years ago
Thank you for opening your first issue here! ðŸ›
Hi @bstadlbauer , doesn't the k8 backoff policy kick in even in this case and eventually bringup the scheduler container.
@pmahindrakar-oss / @bstadlbauer what if we just remove the init container? I think scheduler will Keep trying right
@pmahindrakar-oss Not a K8s expert, but from what I see ATM, the flytescheduler
has not come up yet (and the pod is around for ~50h now).
@kumare3 That would be nice! What is the purpose of flytescheduler precheck
? Can it be safely removed? If so I am happy to open up a PR here :-)
@bstadlbauer @kumare3 we are using precheck as kind of readiness check where it doesn't initialize the main pod until the init-container precheck completes. It validates that flyteadmin is in SERVING status.
This indirectly also validates that DB is up which is used by the scheduler to read the schedules . Additionally send the activated schedules to admin. If we remove this precheck
We can probably add a flag to skip the precheck and you can pass this from your values file. Will that work. I want to avoid removing this check as it might start sending alerts on schedule failures using FailedExecutionCounter metric.
For quick verification you can try removing the init container and check if the scheduler pod eventually comes up
Init containers are ran in the order they appear in the pod spec https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior
Hashicorp Vault gets around this by modifying the yaml so that the Vault init container runs before all other init containers: https://github.com/hashicorp/vault-k8s/blob/35b63e4720e29ed701f70a901ecd421f5c5bc26f/agent-inject/agent/agent.go#L571-L589
There could be an annotation within Flyte to determine ordering in the same way https://www.vaultproject.io/docs/platform/k8s/injector/annotations#vault-hashicorp-com-agent-init-first
Within Istio's mesh config, there is a holdApplicationUntilProxyStarts
value that would force the istio sidecar to be injected and initialized first
https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/ (command+f holdApplicationUntilProxyStarts
)
@bstadlbauer We can test this by modifying the labs service mesh with this value
@agates4 I've tested setting the istio config, but seems like this won't do anything (I think this might be because it won't affect init containers, similar to what's discussed here).
I've opened https://github.com/flyteorg/flyte/pull/2576, which adds an option to disable the precheck init container
Describe the bug
When using istio's Authorization Policiy in conjunction with the
flyte-core
helm chart, theflytescheduler
pod won't start, as theflytescheduler-check
init container tries to accessflyteadmin
before the istio proxy gets injected, thus getting aRBAC: access denied
error.Looking at options, I've found this conversation, as well as https://github.com/istio/istio/issues/11130, from which I gathered that there might be no workaround ATM.
A possible solution they are suggesting would be to move all init-logic into a
real
(non-init) container, as theistio
proxy should be ready by then. Do you have any thoughts on this or another possible workaround? We are currently not blocked by this as we're not using the scheduler right now.Expected behavior
The
flytescheduler
pod coming up.Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?