dotnet / docker-tools


Token expiration error when getting creds after push #1350

Closed mthalman closed 3 months ago

mthalman commented 3 months ago

Getting the following error when attempting to query for the manifest digest after pushing an image:

ResponseBody: {"error":"invalid_client","error_description":"AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-06-20T21:51:37.0880125Z, assertion valid from 2024-06-20T21:26:46.0000000Z, expiry time of assertion 2024-06-20T21:36:46.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: 3e9b8685-f562-4c31-b2a4-0adf04391700 Correlation ID: 8ab2fea3-1a72-4a13-b818-5ca5d0995af7 Timestamp: 2024-06-20 21:51:37Z","error_codes":[700024],"timestamp":"2024-06-20 21:51:37Z","trace_id":"3e9b8685-f562-4c31-b2a4-0adf04391700","correlation_id":"8ab2fea3-1a72-4a13-b818-5ca5d0995af7","error_uri":"https://login.microsoftonline.com/error?code=700024"}
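For reference, the three timestamps in that error can be compared directly. The snippet below is only a quick arithmetic check on the values quoted above (the helper name `parse_utc` is made up for illustration); it is not part of Image Builder.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp like 2024-06-20T21:51:37.0880125Z, ignoring fractional seconds."""
    whole_seconds = ts.rstrip("Z").split(".")[0]
    parsed = datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the AADSTS700024 error message.
current = parse_utc("2024-06-20T21:51:37.0880125Z")
valid_from = parse_utc("2024-06-20T21:26:46.0000000Z")
expiry = parse_utc("2024-06-20T21:36:46.0000000Z")

lifetime = expiry - valid_from  # how long the assertion was valid
age = current - valid_from      # how old the assertion was when it was used
overdue = current - expiry      # how far past expiry the request was

is_expired = overdue > timedelta(0)

print(f"lifetime={lifetime}, age={age}, overdue={overdue}, expired={is_expired}")
```

So the assertion was valid for 10 minutes and was presented 24 minutes 51 seconds after it was issued, about 15 minutes past its expiry.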

Example build (internal link)

This seems to happen for builds that take longer than roughly 20 minutes. It is basically the same failure that was happening before, except that the threshold then was 60 minutes; that case was fixed by https://github.com/dotnet/docker-tools/pull/1321.

The thing that seems to be unique to this scenario is that it's only occurring for build jobs that do not pull base images from the mirrored repo. These are jobs that build images which are only based on images from mcr.microsoft.com. In that case, we never mirror those images. Since we never authenticated to pull those images, we never primed the cache at the beginning of the job. So at the end when it attempted to retrieve the token, it was expired.

Interestingly, this error does not occur when using an old version of Image Builder, prior to https://github.com/dotnet/docker-tools/pull/1310.

mthalman commented 3 months ago

> The thing that seems to be unique to this scenario is that it's only occurring for build jobs that do not pull base images from the mirrored repo. These are jobs that build images which are only based on images from mcr.microsoft.com. In that case, we never mirror those images. Since we never authenticated to pull those images, we never primed the cache at the beginning of the job. So at the end when it attempted to retrieve the token, it was expired.

The error isn't limited to this scenario. It was just a coincidence that the jobs that were failing (the ones taking a long time to run) also happened to use only base images from MCR. I've confirmed that jobs which do pull a base image from the mirror location also fail with the same error when they run for long enough.
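Whichever jobs are affected, the failure has the same shape: a credential obtained early in the job is reused after its validity window has closed. As a rough illustration only, here is a minimal Python sketch of a cache that checks expiry before reuse. The names `Credential`, `ExpiryAwareCache`, `fetch`, and `margin_seconds` are hypothetical and do not come from Image Builder, and this is not a description of how Image Builder caches credentials or of the eventual fix.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credential:
    value: str
    expires_at: float  # expiry time, in seconds on the same clock the cache uses


class ExpiryAwareCache:
    """Hypothetical cache that re-fetches a credential once it is close to expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credential],
        clock: Callable[[], float] = time.monotonic,
        margin_seconds: float = 60.0,
    ) -> None:
        self._fetch = fetch            # assumed callback that obtains a fresh credential
        self._clock = clock            # injectable so the behavior can be tested
        self._margin = margin_seconds  # refresh this long before the stated expiry
        self._cached: Optional[Credential] = None

    def get(self) -> Credential:
        now = self._clock()
        # Fetch on first use, or when the cached credential is expired or about to expire.
        if self._cached is None or now >= self._cached.expires_at - self._margin:
            self._cached = self._fetch()
        return self._cached
```

With a 10-minute credential like the one in the error above, a `get()` call at the 24-minute mark would trigger a new fetch instead of presenting the stale value. The sketch assumes the expiry is known to the caller, which may not hold for every credential type.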

mthalman commented 3 months ago

Now I'm really confused. This build (internal link) from today took 1 hr 30 mins and ran successfully.