apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Failing Test]: Many tests failing with invalid JWT signature #31848

Open kennknowles opened 1 month ago

kennknowles commented 1 month ago

What happened?

ERROR: (gcloud.container.images.list) There was a problem refreshing your current auth tokens: ('invalid_grant: Invalid JWT Signature.', ***'error': 'invalid_grant', 'error_description': 'Invalid JWT Signature.'***)
Please run:

> Task :beam-test-tools:removeStaleSDKContainerImages FAILED
  $ gcloud auth login

to obtain new credentials.

If you have already logged in with a different account, run:

  $ gcloud config set account ACCOUNT

to select an already authenticated account to use.

FAILURE: Build failed with an exception.

Issue Failure

Failure: Test is continually failing

Issue Priority

Priority: 1 (unhealthy code / failing or flaky postcommit so we cannot be sure the product is healthy)

Issue Components

kennknowles commented 1 month ago

I think you have a fix up, yes? So I am assigning this to you.

Abacn commented 1 month ago

Yes I was trying to fix it - https://github.com/apache/beam/actions/workflows/beam_CleanUpGCPResources.yml?query=branch%3Atryfixpythontest

but not successful as some other affected workflows. The current plan is to fix build-wheel first (the more urgent one, as it is used to build the RC), then come back to these two.

Abacn commented 1 month ago

The "Invalid JWT" error is also affecting many performance tests that use GitHub secrets to set secret values.

Abacn commented 1 month ago

CleanUpGCPResources is currently perma-red due to two issues.

The second one has been happening since July 9th, and it is probabilistic:

image

This suggests some rollout process is in play. Either GCP or GitHub is rolling out something that breaks the auth when it is conducted by "google-github-actions/auth@v1".

Abacn commented 1 month ago

Current status:

java_tests and python_tests are still failing; the other tests were fixed after switching to the self-hosted runner's default credentials.
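
For context on why switching credentials can help: assuming "default cred" here means Application Default Credentials (ADC) resolved on the runner, the documented lookup order is roughly the one sketched below. This is a stdlib-only illustration. `credential_source` is a hypothetical helper, not part of any Google library or of Beam's workflows, and it assumes the runner is a GCP VM with an attached service account.

```python
from pathlib import Path
from typing import Mapping, Optional


def credential_source(env: Mapping[str, str], adc_file: Optional[Path] = None) -> str:
    """Return which source ADC would likely pick, in documented lookup order.

    Hypothetical helper for illustration only; it never contacts GCP.
    """
    # 1. An explicit key file, e.g. one written from a GitHub secret.
    #    This is the path that breaks once the service account key expires.
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return "key-file"
    # 2. User credentials created by `gcloud auth application-default login`.
    if adc_file is not None and adc_file.is_file():
        return "gcloud-user-file"
    # 3. Assumed fallback: the service account attached to the VM, which has
    #    no user-managed key that can expire.
    return "attached-service-account"
```

Calling `credential_source(os.environ)` on a runner would show whether a key file is still being picked up ahead of the attached service account.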

Abacn commented 1 month ago

Downgrading to P2 as this has been mitigated. Because SA key expiration is now enforced in the testing project, even if we generate a new key and request an update, the key expires in 90 days and will break again.
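
To make the 90-day window concrete, here is a minimal sketch of the date arithmetic. The 90-day lifetime comes from the comment above; the creation date, the 14-day warning margin, and the function names are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone

KEY_LIFETIME = timedelta(days=90)  # lifetime mentioned in the comment above


def key_expiry(created_at: datetime) -> datetime:
    """Return the moment a key created at `created_at` stops working."""
    return created_at + KEY_LIFETIME


def needs_rotation(created_at: datetime, now: datetime,
                   margin: timedelta = timedelta(days=14)) -> bool:
    """True once `now` is within `margin` of expiry (margin is an assumption)."""
    return now >= key_expiry(created_at) - margin


# Hypothetical creation date, not taken from the issue.
created = datetime(2024, 7, 15, tzinfo=timezone.utc)
print(key_expiry(created).date())  # 2024-10-13
```

A scheduled check like `needs_rotation` would at least turn a surprise outage into a reminder, though it does not remove the need to rotate the key.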