sgibson91 closed this issue 2 years ago.
Merging #1059 caused our GCP deployments to fail in CI with the following message:
Activated service account credentials for: [pilot-hubs-cd-sa@two-eye-two-see.iam.gserviceaccount.com]
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission(s) for "projects/two-eye-two-see/zones/us-central1-b/clusters/pilot-hubs-cluster".
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/work/infrastructure/infrastructure/deployer/__main__.py", line 8, in <module>
    cli.main()
  File "/home/runner/work/infrastructure/infrastructure/deployer/cli.py", line 111, in main
    deploy_support(args.cluster_name)
  File "/home/runner/work/infrastructure/infrastructure/deployer/deploy_actions.py", line 66, in deploy_support
    with cluster.auth():
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/runner/work/infrastructure/infrastructure/deployer/cluster.py", line 28, in auth
    yield from self.auth_gcp()
  File "/home/runner/work/infrastructure/infrastructure/deployer/cluster.py", line 316, in auth_gcp
    subprocess.check_call(
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess
It seems the change broke the way we use a different service account key to gain access to the cluster in CI.
I tried to fix this in PR #1060, but it didn't work, so now I'm not really sure what's going on.
I reverted PRs #1059 and #1060 so we are at least back to working CI.
I made a second attempt in #1063, implementing @consideRatio's suggestion here: https://github.com/2i2c-org/infrastructure/pull/1059#discussion_r821644968
@sgibson91 hmmm... The issue, as I understand it, is that of the following two gcloud calls (gcloud auth activate-service-account, then gcloud container clusters get-credentials), only the first succeeds after our changes.
The logs related to these two commands are:
# first command
Activated service account credentials for: [pilot-hubs-cd-sa@two-eye-two-see.iam.gserviceaccount.com]
# second command
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission(s) for "projects/two-eye-two-see/zones/us-central1-b/clusters/pilot-hubs-cluster".
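To make the two-step sequence concrete, here is a minimal sketch of what the deployer's GCP auth step might look like, assuming it shells out to gcloud via subprocess.check_call as the traceback suggests (deployer/cluster.py, auth_gcp). The helper names, the key-file argument and the injectable run parameter are hypothetical, not the deployer's actual code; the gcloud subcommands and flags are the standard ones.

```python
import subprocess
from contextlib import contextmanager


def gcp_auth_commands(key_file, cluster, zone, project):
    """Return the two gcloud invocations, in order, as argv lists."""
    return [
        # 1. Make the service account in key_file the active gcloud account.
        ["gcloud", "auth", "activate-service-account", f"--key-file={key_file}"],
        # 2. Fetch the cluster endpoint using the active credentials (this is
        #    the call that needs container.clusters.get) and write a
        #    kubeconfig entry for the cluster.
        [
            "gcloud", "container", "clusters", "get-credentials", cluster,
            f"--zone={zone}", f"--project={project}",
        ],
    ]


@contextmanager
def auth_gcp(key_file, cluster, zone, project, run=subprocess.check_call):
    """Hypothetical stand-in for the deployer's auth_gcp step."""
    for cmd in gcp_auth_commands(key_file, cluster, zone, project):
        # check_call raises CalledProcessError on a non-zero exit status,
        # so a 403 from get-credentials aborts before the with-body runs.
        run(cmd)
    yield
```

Usage would be something like with auth_gcp("sa-key.json", "pilot-hubs-cluster", "us-central1-b", "two-eye-two-see"): followed by the deploy steps, where "sa-key.json" is a placeholder path. Step 1 succeeding while step 2 returns 403 means step 2 ran as an identity without GKE permissions, which is why it is worth checking whether something other than the activated account is being used.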
What can we conclude?
sops decryption against GCP's KMS works, and gcloud auth activate-service-account works, but gcloud container clusters get-credentials has started failing just recently with the GCP SA meant to deploy to k8s. Why? Hmmm... Not sure.
I'm assuming it is trying to use our KMS GCP SA for some reason, though. Perhaps because we set an environment variable via the /auth action that wasn't set when we used the /setup-gcloud action?
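One quick way to test the environment-variable theory is to print which credential-related variables are set just before the deployer runs in CI. A minimal sketch; the variable names below are an assumption based on what the /auth action is documented to export when it writes a credentials file, so check them against the action version actually in use. credential_overrides is a hypothetical helper.

```python
import os

# Assumption: variables the /auth action may export. gcloud reads
# CLOUDSDK_<SECTION>_<PROPERTY> variables as config properties, so the first
# one maps to auth/credential_file_override and takes precedence over the
# account set with `gcloud auth activate-service-account`.
SUSPECT_VARS = (
    "CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GOOGLE_GHA_CREDS_PATH",
)


def credential_overrides(environ=None):
    """Return the names of suspect variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return [name for name in SUSPECT_VARS if environ.get(name)]
```

Running print(credential_overrides()) in a CI step after the /auth action, and again after the old /setup-gcloud action, would show whether the two setups differ in this respect. Only names are printed, not the file paths.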
That's a nice summary, thank you @consideRatio
Just a note that #1059 did not remove export_default_credentials: true from /setup-gcloud and still failed.
@sgibson91 @consideRatio my suspicion is that this is failing due to https://github.com/2i2c-org/infrastructure/issues/1034#issuecomment-1061259455, and the credentials we have are from the older workspace, not the newly generated ones.
@yuvipanda So then why does CI/CD work when I revert the PRs? Or at all? Surely if we have out-of-date credentials files in the repo, they'd just stop working altogether? I don't quite understand.
@sgibson91 yeah, I just tested that theory when I manually deployed https://github.com/2i2c-org/infrastructure/pull/1058 and it worked. I was just confidently wrong, sorry :(
Haha, no worries!
I tried to do a deploy to staging from a branch that's based off https://github.com/2i2c-org/infrastructure/commit/31a1d8c2dedb3d58bf61b482112f2fb8856669d6 and I'm getting this error:
ERROR: (gcloud.auth.activate-service-account) There was a problem refreshing your current auth tokens: ('invalid_grant: Invalid JWT Signature.', '{"error":"invalid_grant","error_description":"Invalid JWT Signature."}')
gcloud auth list shows that I'm using the CI account's credentials:
pilot-hubs-cd-sa@two-eye-two-see.iam.gserviceaccount.com
So maybe the issue is just that the credentials expired? It might be a stupid assumption, since I didn't follow very closely what had happened and I might be missing important info, but I believe it was worth mentioning.
@GeorgianaElena The credentials for the 2i2c cluster were accidentally regenerated, and new credentials were added to the repo in https://github.com/2i2c-org/infrastructure/pull/1071. You will need to rebase your branch against master.
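When a key has been regenerated, an older branch can still carry the previous key, which is one way to end up with "Invalid JWT Signature". A minimal sketch for checking which key a decrypted service-account JSON file contains, so its private_key_id can be compared with the IDs listed by gcloud iam service-accounts keys list --iam-account=<email>. It assumes the standard service-account key format (type, client_email and private_key_id fields); key_identity and load_key_identity are hypothetical helpers.

```python
import json


def key_identity(key_json):
    """Return (client_email, private_key_id) from a service-account key."""
    key = json.loads(key_json)
    # Standard service-account keys carry "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError("not a service-account key file")
    return key["client_email"], key["private_key_id"]


def load_key_identity(path):
    """Read a decrypted key file from disk and return its identity."""
    with open(path, encoding="utf-8") as f:
        return key_identity(f.read())
```

If the private_key_id from the file does not appear in the keys list for that service account, the file holds a deleted key and needs replacing (here, by rebasing onto the branch that contains the new credentials).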
Thanks @sgibson91 and sorry for the false alarm!
Fixed by #1209 and #1210. The crux was that the new auth action set an environment variable that overrode whichever service account we had configured with gcloud auth activate-service-account in the deployer script, so #1210 resolved that after we bumped to non-deprecated actions.
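One way to neutralise such an override is to strip the variable from the environment of every gcloud subprocess the deployer starts. A minimal sketch, assuming the variable in question is CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE (the gcloud property auth/credential_file_override); env_without_override and run_gcloud are hypothetical helpers, and this is not necessarily what #1210 changed.

```python
import os
import subprocess

# Assumption: the variable through which the auth action overrides the
# account that `gcloud auth activate-service-account` made active.
OVERRIDE_VAR = "CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE"


def env_without_override(environ=None):
    """Return a copy of the environment with the override removed."""
    env = dict(os.environ if environ is None else environ)
    env.pop(OVERRIDE_VAR, None)
    return env


def run_gcloud(args, run=subprocess.check_call):
    """Run gcloud with the override stripped, so the activated service
    account is the one gcloud actually uses."""
    return run(["gcloud", *args], env=env_without_override())
```

Copying the environment, rather than deleting the key from os.environ, keeps the change local to gcloud calls; other tools in the same job that rely on the action's credentials file still see it.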
Description
CI/CD is producing the following warning when we log in to gcloud:
I think this is related to https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions
Value / benefit
Implementation details
No response
Tasks to complete
Updates
No response