cloudfoundry / cf-for-k8s

The open source deployment manifest for Cloud Foundry on Kubernetes
Apache License 2.0

default `enable_automount_service_account_token: false` breaks my installs #542

Closed: paulczar closed this issue 3 years ago

paulczar commented 3 years ago

Describe the bug

With the default setting of enable_automount_service_account_token: false, I see failures in UAA, Eirini, and any workload apps. The issue surfaces because the istio-proxy sidecar can't mount the service account token and therefore cannot communicate with other services. For example, UAA cannot access the database, but other pods in the same namespace can.

Caused by: org.postgresql.util.PSQLException: Connection to cf-db-postgresql.cf-db:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

I can confirm this by running netcat in the uaa pod:

nc: connect to cf-db-postgresql.cf-db port 5432 (tcp) failed: Connection refused

The Istio sidecar logs (which I didn't save) were very clear about not being able to find the service account credentials. Setting the value to true and doing a completely fresh reinstall cleared up the issues I was seeing (initially I still had problems, but I don't think I had started from a fresh cluster).

I saw this on both GKE and TKG on AWS clusters.

Thread from Slack:

Paul Czarkowski Oct 16th at 7:36 PM
this is a weird one: UAA refuses to start because it can't connect to the db:

Caused by: org.postgresql.util.PSQLException: Connection to cf-db-postgresql.cf-db:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.

which I can confirm by running netcat in the uaa pod:

nc: connect to cf-db-postgresql.cf-db port 5432 (tcp) failed: Connection refused

but I can run a plain postgres pod in the cf-system namespace and connect using the uaa creds.

Paul Czarkowski  3 days ago
same from the devel branch

Paul Czarkowski  3 days ago
everything else works, even the ccdb-migrate prompt

Paul Czarkowski  3 days ago
OMG, it was something to do with the uaa service account. I switched the service account in the deployment spec to default and it started working!

Paul Czarkowski  3 days ago
kube version - v1.18.9-gke.801

Paul Czarkowski  3 days ago
it appears to fail when automountServiceAccountToken is set to either true or false, but removing it completely works.

James Pollard  3 days ago
:thinking_spin:

cf-gitbot commented 3 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175340223

The labels on this github issue will be updated when the story is started.

acosta11 commented 3 years ago

Hi @paulczar,

Thanks for the issue and for giving cf-for-k8s a try on TKG on AWS. The short answer for the current state of configuration is that we optimized for security by default and validated those defaults mainly on the GKE clusters on which we've based development. If a cluster can deploy without auto-mounting service account tokens, we would like it to do so. Ideally, these non-default configuration options are workarounds for noncompliant clusters and components rather than normal configuration we would like to support.

From what we can tell, the main motivation for enabling the option is to support Istio in environments lacking support for what Istio calls third-party-jwt tokens (see this Istio doc for that distinction). When Istio falls back from the default third-party-jwt tokens to first-party-jwt tokens, we also need to auto-mount those service account tokens (which appears to be an open issue against Istio). So on AWS/EKS, the requirement might just be to enable both options, as we note in the local development docs: enable_automount_service_account_token: true and use_first_party_jwt_tokens: true. As a disclaimer, we haven't had the bandwidth to invest much time in supporting multiple IaaS providers, so I'm assuming this is the issue on AWS/EKS. Any additional insight into AWS/EKS-specific requirements might be worth capturing in the docs.
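For reference, a minimal sketch of what enabling both options might look like, assuming the values are supplied through a ytt data-values file (the filename cf-values.yml and the omission of every other required value are assumptions for illustration). Only the two keys named above come from this thread.

```yaml
#! Hypothetical excerpt of a cf-for-k8s ytt data-values file (e.g. cf-values.yml).
#! All other required values (domains, credentials, etc.) are intentionally omitted.
#@data/values
---
#! Have Istio use first-party JWTs, for clusters without third-party-jwt token support.
use_first_party_jwt_tokens: true
#! First-party JWTs require the service account token to be auto-mounted into pods,
#! so keep this set to the same value as use_first_party_jwt_tokens.
enable_automount_service_account_token: true
```

Keeping the two values in sync matches what the maintainers later document in this thread; both default to false.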

Ultimately we would like to stick with the current defaults in cf-for-k8s. As an ideal upstream solution, we could either try to get other IaaS providers to support third-party-jwt tokens or circumvent the automount requirement of the Istio proxy by getting Istio to resolve the previously linked issue.

In the case of failures on the GKE side, I would be curious to see how the cluster is being configured because our CI environments on GKE don't appear to have this issue.

Thanks, Andrew

paulczar commented 3 years ago

Ahhh, I see. So it's a case of: if use_first_party_jwt_tokens is set to true, then set enable_automount_service_account_token to true as well? If so, that sounds like something we could cover in the ytt configs?

I have use_first_party_jwt_tokens enabled on Google as well, as a leftover from working on it in AWS; I was trying to keep the environments as similar as possible.

ericpromislow commented 3 years ago

Datapoint:

Setting use_first_party_jwt_tokens: true but leaving enable_automount_service_account_token: false

gives this error message in log-cache's cf-auth-proxy container:

2020/10/23 00:08:28.628166 Not ready to start: 
got an invalid status code talking to UAA 
503 Service Unavailable

ericpromislow commented 3 years ago

We've documented in the docs that these two properties should be set to the same value for now. We aren't going to require them to be the same, because there might be some exotic environments we haven't encountered where they need to differ.

Later we'll investigate collapsing them into one field if that turns out to be universally the case.

Feel free to reopen this issue if you have any other concerns.