Closed DerekTBrown closed 2 weeks ago
Can't we simply avoid this by not using the Service Account that is enabled for Pod Identity webhook?
Can't we simply avoid this by not using the Service Account that is enabled for Pod Identity webhook?
I am not sure I understand your suggestion. How do Service Accounts fit into the mutating webhook flow?
What exactly prevents CNI from coming up? This webhook sets failurePolicy: Ignore
https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/deploy/mutatingwebhook.yaml#L10
So I'd expect that it will not prevent other pods form starting if it is itself down.
What exactly prevents CNI from coming up? This webhook sets failurePolicy: Ignore
At least in our case, we rely on kops
to install the webhook, which sets this to Fail
:
Isn't this a
kops
issue then?
IMO, eks-pod-identity-webhook
should expect downstream consumers to change the webhook to use a Fail
policy, and support features to enable this usecase.
Having a webhook with the failurePolicy: Ignore
shifts the failure mode from an infra-level failure (where infrastructure/platform teams can detect and remediate the actual issue) to a service-level failure (where now individual services are responsible for detecting that they don't have adequate information, and then escalating this failure to the infrastructure/platform team actually responsible for the webhook). This is an undesirable behavior for many (if not most) users.
What exactly prevents CNI from coming up? This webhook sets
failurePolicy: Ignore
One other thing I realized I should add: I could see a case as well where a customer wants to disable the Pod Identity Webhook as a performance optimization, independent of the failure mode. For example, if leveraging DoEKS, they may not need IAM roles on Spark worker pods.
Looks like this issue was resolved in the proposed PR linked above (https://github.com/aws/amazon-eks-pod-identity-webhook/pull/216)
I appreciate the feedback on failurePolicy: Ignore
- looks like it was changed from the default value of Fail
to Ignore
here: https://github.com/aws/amazon-eks-pod-identity-webhook/pull/3/files
We can look into the tradeoffs of changing it back, but it's a separate issue.
Marking this issue as closed.
What would you like to be added?
It would be useful to have a standard label that excludes pods from being processed by the identity webhook.
Proposed Change: https://github.com/aws/amazon-eks-pod-identity-webhook/pull/216
Why is this needed?
Removing Cyclical Dependencies - When installing a CNI and the Pod Identity Webhook onto our clusters, we identified a cyclical dependency; the identity webhook needs the network/CNI to be successfully operating to be able to communicate with the backend service, and the CNI needs the identity webhook to be functioning in order to schedule pods. The CNI doesn't rely on Amazon identities, so having a mechanism to skip the identity webhook for some pods (including the CNI) would remove this cyclical dependency.
Performance/Reliability - Having the ability to exclude pods that don't need Amazon identities can reduce load on the webhook and improve pod launching performance. This also removes a critical dependency for services that need to be resilient to failure.
Standardization is key to compatibility- Having a canonical tag for skipping the identity webhook will make it easier for downstream projects to exclude critical components (such as CNIs) from the webhook, leading to a better out-of-the-box experience.