2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

Warnings about Service Account Key for gcloud in CI #903

Closed sgibson91 closed 2 years ago

sgibson91 commented 2 years ago

Description

CI/CD is producing the following warning when we log in to gcloud:

"service_account_key" has been deprecated. Please switch to using google-github-actions/auth which supports both Workload Identity Federation and Service Account Key JSON authentication. For more details, see google-github-actions/setup-gcloud#authorization

I think this is related to https://cloud.google.com/blog/products/identity-security/enabling-keyless-authentication-from-github-actions

sgibson91 commented 2 years ago

Merging #1059 caused our GCP deployments to fail in CI with the following message:

Activated service account credentials for: [pilot-hubs-cd-sa@two-eye-two-see.iam.gserviceaccount.com]
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission(s) for "projects/two-eye-two-see/zones/us-central1-b/clusters/pilot-hubs-cluster".
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/work/infrastructure/infrastructure/deployer/__main__.py", line 8, in <module>
    cli.main()
  File "/home/runner/work/infrastructure/infrastructure/deployer/cli.py", line 111, in main
    deploy_support(args.cluster_name)
  File "/home/runner/work/infrastructure/infrastructure/deployer/deploy_actions.py", line 66, in deploy_support
    with cluster.auth():
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/runner/work/infrastructure/infrastructure/deployer/cluster.py", line 28, in auth
    yield from self.auth_gcp()
  File "/home/runner/work/infrastructure/infrastructure/deployer/cluster.py", line 316, in auth_gcp
    subprocess.check_call(
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess

It seems the change broke the step where we switch to a different service account key to gain access to the cluster in CI.

I opened PR #1060 to try to fix this, but it didn't work, so now I'm not really sure what's going on.

I reverted PRs #1059 and #1060 so we are at least back to working CI.

sgibson91 commented 2 years ago

Second attempt made in #1063 implementing @consideRatio's suggestion here: https://github.com/2i2c-org/infrastructure/pull/1059#discussion_r821644968

sgibson91 commented 2 years ago

#1063 also did not fix the issue.

consideRatio commented 2 years ago

@sgibson91 hmmm... The issue, as I understand it, is that of the following two gcloud calls, only the first succeeds after our changes.

https://github.com/2i2c-org/infrastructure/blob/bf6b319de625e6eedfb745ecc3807d32e53d1575/deployer/cluster.py#L306-L327

The logs related to these two commands are:

# first command
Activated service account credentials for: [pilot-hubs-cd-sa@two-eye-two-see.iam.gserviceaccount.com]

# second command
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission(s) for "projects/two-eye-two-see/zones/us-central1-b/clusters/pilot-hubs-cluster".

What can we conclude?

Why? Hmmm... Not sure.

I'm assuming it is trying to use our KMS GCP SA for some reason though. Perhaps because we set an environment variable via the /auth action that wasn't done when we used the /setup-gcloud action?
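
To make the two calls concrete, here is a minimal Python sketch of the sequence the logs suggest, plus a check for environment variables that could point gcloud at other credentials. The function names are hypothetical and this is not the deployer's actual code. The two variable names are assumptions: the thread does not say which variable the /auth action sets.

```python
import os

# Assumption: variables that can point gcloud or Google client libraries at a
# credentials file. The thread does not name the variable involved.
SUSPECT_VARS = (
    "CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def gcloud_commands(key_path, cluster, zone, project):
    """Build the two gcloud invocations in the order the logs show them."""
    # First command: activate the service account from a key file.
    activate = [
        "gcloud", "auth", "activate-service-account",
        f"--key-file={key_path}",
    ]
    # Second command: fetch cluster credentials (the call that returned 403).
    get_credentials = [
        "gcloud", "container", "clusters", "get-credentials", cluster,
        f"--zone={zone}", f"--project={project}",
    ]
    return activate, get_credentials


def suspect_overrides(environ=None):
    """Return the suspect variables that are set in the given environment."""
    source = os.environ if environ is None else environ
    return {name: source[name] for name in SUSPECT_VARS if name in source}
```

Printing `suspect_overrides()` in the CI job just before the second command would show whether such a variable is present when the 403 occurs.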

sgibson91 commented 2 years ago

That's a nice summary, thank you @consideRatio

Just a note that #1059 did not remove export_default_credentials: true from /setup-gcloud and still failed.

yuvipanda commented 2 years ago

@sgibson91 @consideRatio my suspicion is that this is failing due to https://github.com/2i2c-org/infrastructure/issues/1034#issuecomment-1061259455, and the credentials we have are from the older workspace, not the newer generated ones.

sgibson91 commented 2 years ago

@yuvipanda So then why does CI/CD work when I revert the PRs? Or at all? Surely if we have out-of-date credentials files in the repo, they'd just stop working altogether? I don't quite understand.

yuvipanda commented 2 years ago

@sgibson91 yeah, I just tested that theory when I manually deployed https://github.com/2i2c-org/infrastructure/pull/1058 and it worked. I was just confidently wrong, sorry :(

sgibson91 commented 2 years ago

Haha, no worries!

GeorgianaElena commented 2 years ago

I tried to do a deploy to staging from a branch that's based off https://github.com/2i2c-org/infrastructure/commit/31a1d8c2dedb3d58bf61b482112f2fb8856669d6 and I'm getting this error:

ERROR: (gcloud.auth.activate-service-account) There was a problem refreshing your current auth tokens: ('invalid_grant: Invalid JWT Signature.', '{"error":"invalid_grant","error_description":"Invalid JWT Signature."}')

gcloud auth list shows that I'm using the CI account's credentials:

pilot-hubs-cd-sa@two-eye-two-see.iam.gserviceaccount.com

So maybe the issue is simply that the credentials expired? It might be a stupid assumption, since I didn't follow what happened very thoroughly and might be missing important info, but I believe it was worth mentioning.

sgibson91 commented 2 years ago

@GeorgianaElena The credentials for the 2i2c cluster got accidentally regenerated and new credentials were added to the repo in https://github.com/2i2c-org/infrastructure/pull/1071. You will need to rebase your branch against master.

GeorgianaElena commented 2 years ago

Thanks @sgibson91 and sorry for the false alarm!

consideRatio commented 2 years ago

Fixed by #1209 and #1210. The crux was that the new auth action added an environment variable that overrode whichever service account we had configured when running gcloud auth activate-service-account in the deployer script, so #1210 resolved that after we bumped to the non-deprecated actions.
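
One way to guard against this kind of override is to run gcloud with the suspect variables stripped from the environment. The following is a minimal sketch only: the thread does not show what #1210 actually changed, the helper names are hypothetical, and the variable names are assumptions about what an auth step might export.

```python
import os
import subprocess

# Assumption: variables that may redirect gcloud to a different credentials
# file. The thread does not name the variable the auth action added.
OVERRIDE_VARS = (
    "CLOUDSDK_AUTH_CREDENTIAL_FILE_OVERRIDE",
    "GOOGLE_APPLICATION_CREDENTIALS",
)


def scrubbed_env(environ=None, drop=OVERRIDE_VARS):
    """Return a copy of the environment without the override variables."""
    source = os.environ if environ is None else environ
    return {key: value for key, value in source.items() if key not in drop}


def run_gcloud(args, environ=None):
    """Run a gcloud subcommand with the override variables removed."""
    return subprocess.check_call(["gcloud", *args], env=scrubbed_env(environ))
```

For example, `run_gcloud(["auth", "list"])` would then report the account activated by the deployer rather than one injected through the environment, provided the assumed variable names are the relevant ones.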