Azure / azure-cli

Azure Command-Line Interface

Azure CLI task fails with `AADSTS700024` after 60 minutes #28708

Open jiasli opened 2 months ago

jiasli commented 2 months ago

Acquiring access token with expired OIDC token fails with:

ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-04-05T23:01:54.2089203Z, assertion valid from 2024-04-05T22:40:41.0000000Z, expiry time of assertion 2024-04-05T22:50:41.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials

As the error indicates, the OIDC token is only valid for 10 minutes. After it is passed to az login via --federated-token, Azure CLI cannot get a new OIDC token after the OIDC token expires.
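To make the numbers in the error concrete, here is a minimal sketch that pulls the three timestamps out of an AADSTS700024 message and computes the assertion lifetime and how long ago it expired. The helper name `assertion_window` and the regular expression are illustrative assumptions based on the message shown above, not part of Azure CLI or MSAL.

```python
import re
from datetime import datetime, timezone

# Matches timestamps such as 2024-04-05T23:01:54.2089203Z. The fractional
# part is matched but not captured: whole seconds are enough here.
_TS = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z"
_PATTERN = re.compile(
    r"Current time: " + _TS
    + r", assertion valid from " + _TS
    + r", expiry time of assertion " + _TS
)


def _parse(ts: str) -> datetime:
    """Parse a second-resolution UTC timestamp."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def assertion_window(message: str) -> dict:
    """Hypothetical helper: summarize the time range in an AADSTS700024 message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("not an AADSTS700024 time-range message")
    now, valid_from, expiry = (_parse(group) for group in match.groups())
    return {
        "lifetime_seconds": int((expiry - valid_from).total_seconds()),
        "seconds_past_expiry": int((now - expiry).total_seconds()),
    }


SAMPLE = (
    "ERROR: AADSTS700024: Client assertion is not within its valid time range. "
    "Current time: 2024-04-05T23:01:54.2089203Z, assertion valid from "
    "2024-04-05T22:40:41.0000000Z, expiry time of assertion "
    "2024-04-05T22:50:41.0000000Z."
)

print(assertion_window(SAMPLE))
```

For the message above, this reports a lifetime of 600 seconds (10 minutes) and a request made 673 seconds after the assertion expired.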

This is the designed v1 behavior of OIDC token support (#19853).

However, now that the Azure DevOps task AzureCLI@2 (https://github.com/microsoft/azure-pipelines-tasks/pull/17633) and the GitHub Action azure/login@v2 (https://github.com/Azure/login/pull/147) support OIDC token authentication, and workload identity federation is the recommended approach, this limitation is becoming more prevalent.

Possible solutions

  1. OIDC token providers such as Azure DevOps or GitHub should provide an option to control the expiry time of the OIDC token, so that it lasts at least as long as the task.
  2. Design and implement a v2 solution that uses a managed-identity-like interface which allows MSAL/Azure CLI to refresh the OIDC token.

yonzhan commented 2 months ago

refresh OIDC token is a feature

jiasli commented 2 months ago

Callback interface proposals

Different external identity providers (IdP) have different ways of retrieving the ID token:

I had a discussion with MSAL team today and proposed 2 possible callback interfaces:

  1. Let each external IdP expose a callback command such as getidtoken that returns an ID token in stdout, then instead of providing --federated-token <ID token> to az login, they should provide --federated-token-callback getidtoken to az login, so that CLI and MSAL can actively retrieve an ID token with getidtoken when ID token expires. This is very similar to how Azure Identity's AzureCliCredential retrieves access tokens from Azure CLI by subprocessing az account get-access-token.
  2. Like the GitHub Action solution, define a managed-identity-like URL that can be used to get an ID token, such as ID_TOKEN_REQUEST_URL.
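
As a rough illustration of the first proposal, the sketch below runs a callback command as a subprocess and reads the ID token from its stdout. Neither `--federated-token-callback` nor `getidtoken` exists today; both are the proposal's hypothetical names, and the function name `get_id_token_from_command` is an assumption made for this example.

```python
import subprocess
import sys


def get_id_token_from_command(argv, timeout=30):
    """Run an IdP-provided callback command and return the token it prints.

    `argv` is the command line of the (hypothetical) callback, for example
    ["getidtoken"]. A non-zero exit code raises CalledProcessError.
    """
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=True
    )
    token = completed.stdout.strip()
    if not token:
        raise RuntimeError("callback command printed no token")
    return token


# Stand-in for a real callback: a child Python process that prints a dummy token.
FAKE_CALLBACK = [sys.executable, "-c", "print('header.payload.signature')"]

print(get_id_token_from_command(FAKE_CALLBACK))
```

A caller would invoke the function again whenever the previous ID token is close to expiry, which is the behavior the proposal asks CLI and MSAL to take on.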

jiasli commented 2 months ago

Mitigation: Extend task duration to 60 minutes

[!WARNING] This mitigation doesn't work with Azure CLI 2.59.0. See https://github.com/Azure/azure-cli/issues/28708#issuecomment-2049400226.

ID token:       |----| 10 min
Access token 1: |------------------------| 60 min
Access token 2:          | 20 min: ERROR: ID token expired

An ID token lasts for 5 minutes on GitHub Actions and 10 minutes on Azure DevOps, but an access token lasts for 60 minutes.

When you run az login, Azure CLI only acquires access tokens for ARM, using https://management.core.windows.net//.default as the scope.

After the ID token expires, suppose a command needs an access token for another scope, for example:

az account get-access-token --scope https://kusto.kusto.windows.net//.default

Because there is no access token for that scope in the token cache yet, Azure CLI/MSAL tries to acquire one with the ID token. As the ID token has already expired, the command fails with AADSTS700024.
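
The interaction between the two lifetimes can be sketched with a toy model. This is a simplification for illustration only, not MSAL's implementation: time is counted in minutes since az login, the class and exception names are made up, and the 10-minute and 60-minute lifetimes are the ones from the timeline above.

```python
class AssertionExpired(Exception):
    """Stands in for the AADSTS700024 error in this toy model."""


class ToyTokenClient:
    ACCESS_TOKEN_LIFETIME = 60  # minutes, as in the timeline above

    def __init__(self, id_token_lifetime=10):
        self.id_token_expiry = id_token_lifetime  # minutes after login
        self.cache = {}  # scope -> minute at which the access token expires

    def get_access_token(self, scope, now):
        """Return where the token came from: "cache" or "sts"."""
        expiry = self.cache.get(scope)
        if expiry is not None and now < expiry:
            return "cache"
        # A cache miss needs the ID token as the client assertion.
        if now >= self.id_token_expiry:
            raise AssertionExpired(f"ID token expired before minute {now}")
        self.cache[scope] = now + self.ACCESS_TOKEN_LIFETIME
        return "sts"


ARM = "https://management.core.windows.net//.default"
KUSTO = "https://kusto.kusto.windows.net//.default"

client = ToyTokenClient()
print(client.get_access_token(ARM, now=0))   # acquired during login
print(client.get_access_token(ARM, now=20))  # served from the cache
try:
    client.get_access_token(KUSTO, now=20)   # cache miss after ID token expiry
except AssertionExpired as error:
    print("failed:", error)
```

In this model the ARM scope keeps working at minute 20 because its access token is cached, while the first request for a new scope at minute 20 fails.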

So, the mitigation is pretty straightforward: Acquire all access tokens before the ID token expires.

You have to know which scopes are used in your pipeline task and call az account get-access-token --scope ... immediately after az login. This makes Azure CLI/MSAL acquire access tokens for the specified scopes while the ID token is still valid and save them in the token cache.

For example:

[!WARNING] Even though GitHub Actions can mask the access token as *** in az account get-access-token's output:

+ az account get-access-token
***
  "accessToken": "***",
  "expiresOn": "2024-04-10 14:11:25.000000",
  "expires_on": 1712758285,
  "subscription": "...",
  "tenant": "...",
  "tokenType": "Bearer"
***

You MUST specify --output none to make sure no access token is printed to any of your logs.

Subsequent commands that use these scopes will then use the access tokens saved in the token cache, so they won't fail after the ID token expires. They will still fail once the access tokens themselves expire (after 60 minutes).
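
As a sketch of the pre-fetch step, assuming the pipeline can run Python right after az login: the helper below builds one az account get-access-token command per scope, always with --output none so that no token reaches the logs. The function names are made up for this example, and the scope list is whatever your own task needs; the two scopes shown are the ones mentioned in this thread.

```python
import subprocess


def prefetch_commands(scopes):
    """Build one `az account get-access-token` command line per scope."""
    return [
        ["az", "account", "get-access-token", "--scope", scope, "--output", "none"]
        for scope in scopes
    ]


def prefetch(scopes, run=subprocess.run):
    """Run the commands in order; `run` is injectable so it can be tested."""
    for argv in prefetch_commands(scopes):
        run(argv, check=True)


SCOPES = [
    "https://kusto.kusto.windows.net//.default",
    "https://storage.azure.com/.default",
]

for argv in prefetch_commands(SCOPES):
    print(" ".join(argv))
```

Calling `prefetch(SCOPES)` immediately after login would execute the commands. On Windows, where az is a .cmd wrapper, the subprocess call may need adjusting.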

Kapsztajn commented 2 months ago

I tried the provided mitigation, but the issue persists. Maybe I'm doing something wrong? My workflow contains actions that run Node.js tests in which I verify connections to Service Bus. Since OIDC is used, I log in to Azure with the azure/login@v2 action:

    - name: Azure login
      uses: azure/login@v2
      with:
        client-id: ${{ env.AZURE_CLIENT_ID }}
        tenant-id: ${{ env.AZURE_TENANT_ID }}
        subscription-id: ${{ env.AZURE_SUBSCRIPTION_ID }}
        enable-AzPSSession: false

After that I added step to mitigate the issue:

    - name: Azure get token
      uses: azure/cli@v2
      with:
        inlineScript: |
          az account get-access-token --scope https://storage.azure.com/.default --output none
          az account get-access-token --scope https://servicebus.azure.net/.default 

But after ~10 minutes I'm still getting:

    AggregateAuthenticationError: ChainedTokenCredential authentication failed.
    CredentialUnavailableError: Please run 'az login' from a command prompt to authenticate before using this credential.
    CredentialUnavailableError: WorkloadIdentityCredential: is unavailable. tenantId, clientId, and federatedTokenFilePath are required parameters. 
          In DefaultAzureCredential and ManagedIdentityCredential, these can be provided as environment variables - 
          "AZURE_TENANT_ID",
          "AZURE_CLIENT_ID",
          "AZURE_FEDERATED_TOKEN_FILE". See the troubleshooting guide for more information: https://aka.ms/azsdk/js/identity/workloadidentitycredential/troubleshoot

Did I miss something? I use https://www.npmjs.com/package/@azure/service-bus

mderriey commented 2 months ago

Thanks for the mitigation @jiasli.

However, I don't think I'm hitting the issue where the Azure CLI tries to acquire an access token for a different audience after the ID token has expired.

I'm fairly confident that the az commands I use only use the access token for ARM:

The general flow is:

The time it takes to swap slots varies greatly, however more than 5 minutes have always elapsed by the time it's done.

Now, what is strange is that stopping the slot sometimes works and sometimes doesn't, depending on how much time has passed since we ran azure/login.

To me, it sounds like the access token expires "quicker" than before. Could that be?

Edit: I checked across many workflow runs, and to me it looks like the access token expires after 10 minutes.

jiasli commented 2 months ago

@Kapsztajn, I can successfully get an access token for https://servicebus.azure.net/.default locally which lasts for 4600s.

> az account get-access-token --scope https://servicebus.azure.net/.default
{
  "accessToken": "...",
  "expiresOn": "2024-04-11 13:57:35.000000",
  "expires_on": 1712815055,
  "subscription": "0b1f6471-1bf0-4dda-aec3-cb9272f09590",
  "tenant": "54826b22-38d6-4fb2-bad9-b7b93a3e9c5a",
  "tokenType": "Bearer"
}

Decoded claims:

  "iat": 1712810455,
  "nbf": 1712810455,
  "exp": 1712815055,
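
For anyone who wants to repeat this check, here is a minimal sketch that decodes the claims of a JWT without verifying its signature and computes the lifetime from iat and exp. The token built below is a dummy that only carries the three claims shown above; the helper names are made up for this example.

```python
import base64
import json


def decode_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT. The signature is NOT verified."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def lifetime_seconds(claims: dict) -> int:
    """Seconds between issuance (iat) and expiry (exp)."""
    return claims["exp"] - claims["iat"]


def _dummy_jwt(claims: dict) -> str:
    """Build an unsigned token for demonstration purposes only."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{encode({'alg': 'none'})}.{encode(claims)}.signature"


TOKEN = _dummy_jwt({"iat": 1712810455, "nbf": 1712810455, "exp": 1712815055})

print(lifetime_seconds(decode_claims(TOKEN)))
```

With the claims above, the lifetime comes out as 4600 seconds. Never paste a real access token into a log or a shared tool to decode it.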

I am not entirely sure why this line is printed:

CredentialUnavailableError: Please run 'az login' from a command prompt to authenticate before using this credential.

Also, the Azure Service Bus client library for JavaScript didn't fail with AADSTS700024. I am not an expert on that SDK. Is it possible to collect more details on which scope the SDK requests, and why it fails with that error?

jiasli commented 2 months ago

@mderriey, this seems odd as all these operations are indeed ARM operations. Could you check the actual expiration time of the access token issued for ARM?

> az account get-access-token --scope https://management.core.windows.net//.default --query expiresOn --output tsv
2024-04-11 13:47:47.000000

iamrk04 commented 2 months ago

Hi @Kapsztajn, the suggested mitigation did not work for me either. It fetched a token with a reasonable expiry, but I saw the same error once the OID token expired after 5 minutes.

As a workaround, I propose fetching the OID token every 4 minutes to avoid the expiry. I got this working by inserting the following step in my workflow just before the step where the token expiry issue was popping up:

      - name: Fetch OID token every 4 mins
        run: |
          while true; do
            token_request=$ACTIONS_ID_TOKEN_REQUEST_TOKEN
            token_uri=$ACTIONS_ID_TOKEN_REQUEST_URL
            token=$(curl -H "Authorization: bearer $token_request" "${token_uri}&audience=api://AzureADTokenExchange" | jq .value -r)
            az login --service-principal -u ${{ secrets.CLIENT_ID }} -t ${{ secrets.TENANT_ID }} --federated-token $token --output none
            # Sleep for 4 minutes
            sleep 240
          done &

Could you try this out and see if this works for you as well?
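
For readers who prefer not to rely on curl and jq, the token request in the loop above can be sketched in Python. It uses the same two environment variables and the same audience. The function names are assumptions for this example, and unlike the curl command it URL-encodes the audience. It only fetches the ID token and does not call az login.

```python
import json
import os
import urllib.parse
import urllib.request

AUDIENCE = "api://AzureADTokenExchange"


def build_id_token_request(request_url, request_token, audience=AUDIENCE):
    """Build the HTTP request for a GitHub Actions ID token."""
    # The request URL already contains a query string, so append with "&".
    url = f"{request_url}&audience={urllib.parse.quote(audience, safe='')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"bearer {request_token}"}
    )


def fetch_id_token(timeout=30):
    """Fetch an ID token. Only works inside a job with id-token permission."""
    request = build_id_token_request(
        os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
        os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"],
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["value"]


# Offline illustration with a placeholder URL and token (both hypothetical).
demo = build_id_token_request("https://example.invalid/token?api-version=2.0", "t")
print(demo.full_url)
```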

mderriey commented 2 months ago

Good suggestion @jiasli , thanks.

Here's what I ran:

steps:
- name: Login to Azure
  uses: azure/login@v2
  with:
    client-id: ${{ env.oidcAppRegistrationClientId }}
    tenant-id: ${{ env.azureTenantId }}
    allow-no-subscriptions: true
    enable-AzPSSession: true

- name: Check token expiry
  shell: bash
  run: |
    echo "Current date: $(date '+%Y-%m-%dT%H:%M:%S')"
    echo "Token expiration: $(az account get-access-token --resource-type arm --query expiresOn --output tsv --debug)"
    echo "Token #2 expiration: $(az account get-access-token --resource-type arm --query expiresOn --output tsv --debug)"

And the output (debug output omitted):

Current date: 2024-04-11T06:57:14
Token expiration: 2024-04-11 07:57:14.000000
Token #2 expiration: 2024-04-11 07:57:14.000000

So the token is valid for 1 hour.

And both calls to az account get-access-token show this in the debug output, which I think confirms that the ARM token is cached and was originally acquired during az login:

DEBUG: msal.token_cache: event={
    "client_id": "***",
    "data": {
        "claims": "{\"access_token\": {\"xms_cc\": {\"values\": [\"CP1\"]}}}",
        "scope": [
            "https://management.core.windows.net//.default"
        ]
    },
    "environment": "login.microsoftonline.com",
    "grant_type": "client_credentials",
    "params": null,
    "response": {
        "access_token": "********",
        "expires_in": 3599,
        "ext_expires_in": 3599,
        "token_type": "Bearer"
    },
    "scope": [
        "https://management.core.windows.net//.default"
    ],
    "token_endpoint": "https://login.microsoftonline.com/<redacted>/oauth2/v2.0/token"
}

I'm not sure what happens, then... I'll try removing the extra azure/login steps when I get some more time to see if the issue disappears.

Thanks again, let me know if I can perform some more testing if anything comes to mind. If you'd be interested in the debug output, I could send that privately.

jiasli commented 2 months ago

Apologies for the confusion.

As I tested today, the mitigation I provided in https://github.com/Azure/azure-cli/issues/28708#issuecomment-2047256166 stopped working for Azure CLI 2.59.0, because of an MSAL regression introduced in 1.27.0 (https://github.com/AzureAD/microsoft-authentication-extensions-for-python/issues/127, https://github.com/AzureAD/microsoft-authentication-library-for-python/pull/644) which is adopted by Azure CLI 2.59.0 (https://github.com/Azure/azure-cli/pull/28556).

This regression makes MSAL's ConfidentialClientApplication bypass msal_extensions.token_cache.PersistedTokenCache, so access tokens are no longer retrieved from the token cache. Instead, every command now retrieves a new access token from the AAD Security Token Service (STS). As a result, not only does the mitigation no longer work, but even ARM commands fail with AADSTS700024 after the ID token expires.

I will work with MSAL on this issue with high priority.

Workaround

For now, please keep using a service principal secret for authentication to get unblocked: https://github.com/marketplace/actions/azure-login#login-with-a-service-principal-secret

smokedlinq commented 2 months ago

My question is why this has popped up as an issue recently. We've had pipelines run for well over 20 minutes before and never seen this. But within the last week, it seems any workflow using Azure CLI with OIDC federated auth is experiencing this issue.

Kapsztajn commented 2 months ago

@iamrk04 It looks like your solution works: I managed to run the tests normally (the pipeline ran for over 16 minutes). I added the code you provided between the Azure login and the component tests:

    - name: Azure login
      uses: azure/login@v2
      with:
        client-id: ${{ env.AZURE_CLIENT_ID }}
        tenant-id: ${{ env.AZURE_TENANT_ID }}
        subscription-id: ${{ env.AZURE_SUBSCRIPTION_ID }}
        enable-AzPSSession: false

    - name: Fetch OID token every 4 mins
      shell: bash
      run: |
        while true; do
          token_request=$ACTIONS_ID_TOKEN_REQUEST_TOKEN
          token_uri=$ACTIONS_ID_TOKEN_REQUEST_URL
          token=$(curl -H "Authorization: bearer $token_request" "${token_uri}&audience=api://AzureADTokenExchange" | jq .value -r)
          az login --service-principal -u ${{ env.AZURE_CLIENT_ID }} -t ${{ env.AZURE_TENANT_ID }} --federated-token $token --output none
          # Sleep for 4 minutes
          sleep 240
        done &

    - name: 'Run tests'
      shell: bash
      ...

I had to add shell: bash because without it I got errors with missing shell.

mderriey commented 2 months ago

My question is why this has popped up as an issue recently. We've had pipelines run for well over 20 minutes before and never seen this. But within the last week, it seems any workflow using Azure CLI with OIDC federated auth is experiencing this issue.

@smokedlinq, In my case, it's due to a new version of the GitHub hosted runner image for ubuntu-latest that was released which has Azure CLI 2.59.0 instead of 2.58.0 for the previous image.

The image went from 20240324.2.0 to 20240407.1.0.

You can see which image your run uses in the "Set up job" step at the very top.
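
A pipeline can also check the installed version itself. The sketch below parses the JSON printed by az version and flags 2.59.0, the only release this thread identifies as affected. The function names and the sample output are assumptions for this example; whether other releases are affected is not something this check knows.

```python
import json

AFFECTED_VERSION = "2.59.0"  # the release named in this thread


def cli_version(az_version_output: str) -> str:
    """Extract the Azure CLI version from the JSON printed by `az version`."""
    return json.loads(az_version_output)["azure-cli"]


def is_affected(az_version_output: str) -> bool:
    """True if the installed CLI is the release named in this thread."""
    return cli_version(az_version_output) == AFFECTED_VERSION


# Abbreviated, illustrative sample of `az version` output.
SAMPLE_OUTPUT = '{"azure-cli": "2.59.0", "extensions": {}}'

print(cli_version(SAMPLE_OUTPUT), is_affected(SAMPLE_OUTPUT))
```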

smokedlinq commented 2 months ago

@mderriey I assumed something like that, I was more referring to how that broke inside of az.

ant0nsc commented 2 months ago

Hey @iamrk04, you're a hero! I inserted this snippet into my workflow, and this made it all work. Great idea to just have that run in the background in a shell loop.

For reference: https://github.com/microsoft/hi-ml/pull/925/

avtakkar commented 2 months ago

Thanks @iamrk04 , this worked for me as well.

nlighten commented 2 months ago

The suggestion from @iamrk04 also worked for me. I wrapped it in a GitHub Action that can potentially replace azure/login. I think this solution will even remove the 1-hour limit we had before, but I have not tested that yet.

name: Azure Federated Login

inputs:
  client-id:
    description: Azure client id
    type: string
  tenant-id:
    description: Azure tenant id
    type: string
  subscription-id:
    description: Azure subscription id
    type: string
    default: none
  refresh-interval-seconds:
    description: Refresh interval in seconds
    type: number
    default: 240

runs:
  using: "composite"
  steps:
    - name: Fetch OID token every ${{ inputs.refresh-interval-seconds }} seconds
      shell: bash
      run: |
        first_time=true
        while true; do
          token=$(curl -s -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange" | jq .value -r)
          az login --service-principal -u ${{ inputs.client-id }} -t ${{ inputs.tenant-id }} --federated-token $token --output none
          if [ "$first_time" = true ] && [ "${{ inputs.subscription-id }}" != "none" ]; then
            az account set -s ${{ inputs.subscription-id }}
            first_time=false
          fi
          sleep ${{ inputs.refresh-interval-seconds }}
        done &

TomWildenhain commented 2 months ago

I'm running into the same issue in Azure DevOps for a pipeline that runs a long Python script (2h40m) in an AzureCLI@2 task. It was working fine on Friday (April 5th) but started failing after that with this error:

AzureCliCredential: ERROR: AADSTS700024: Client assertion is not within its valid time range. ...

Any ideas on whether an equivalent workaround is possible for Azure DevOps to refresh the token every 9 minutes?

dghubble commented 2 months ago

We started having problems with the v2.59.0 az cli and rolled back as a workaround. I'm not sure what about the cli release makes this more/less likely to hit this.

jiasli commented 2 months ago

My question is why this has popped up as an issue recently. We've had pipelines run for well over 20 minutes before and never seen this. But within the last week, it seems any workflow using Azure CLI with OIDC federated auth is experiencing this issue.

@smokedlinq, please refer to my comment https://github.com/Azure/azure-cli/issues/28708#issuecomment-2049400226.

jiasli commented 2 months ago

I propose a workaround by fetching the OID token every 4 mins to avoid the expiry.

This workaround (https://github.com/Azure/azure-cli/issues/28708#issuecomment-2049014471), proposed by @iamrk04, of periodically calling az login is not recommended: Azure CLI doesn't support concurrent execution, so you will very likely run into race conditions (https://github.com/Azure/azure-cli/issues/9427, https://github.com/Azure/azure-cli/issues/20273).

We started having problems with the v2.59.0 az cli and rolled back as a workaround.

This workaround (https://github.com/Azure/azure-cli/issues/28708#issuecomment-2050804548), proposed by @dghubble, of using an older version is a valid one.

As I suggested in https://github.com/Azure/azure-cli/issues/28708#issuecomment-2049400226, using a service principal secret for authentication is another acceptable workaround.

andre-qumulo commented 2 months ago

@jiasli Service principals are unacceptable for some of us as our security certification would require we rotate them on a regular basis. OIDC does not add that additional burden given that they are clearly short lived.

jiasli commented 2 months ago

Service principals are unacceptable for some of us as our security certification would require we rotate them on a regular basis. OIDC does not add that additional burden given that they are clearly short lived.

@andre-qumulo, we plan to fix the 5-minute expiration issue in the next version of Azure CLI which will be 2.60.0 and released on 2024-04-30. Using a service principal is only a temporary workaround. Secret rotation usually happens on a monthly basis which is far beyond the time we need to fix it.

I have created a separate issue to track it:

TomWildenhain commented 2 months ago

Thanks @jiasli! The mitigation steps for Azure DevOps provided here of using a service principal secret were effective.

(I ran into some trouble finding the organization id while following the instructions but was able to find the organization id with these steps: https://medium.com/@shivapatel1102001/get-list-of-organization-from-azure-devops-microsoft-account-861ea29dae93)

jiasli commented 2 months ago

@TomWildenhain, based on my understanding, the steps provided by https://learn.microsoft.com/en-us/azure/devops/pipelines/library/connect-to-azure?view=azure-devops don't require organization ID when creating a service connection using service principal secret. Could you let me know which article you are following?

geekzter commented 2 months ago

@jiasli Org id is a 1P policy.

TomWildenhain commented 2 months ago

@jiasli Thanks for your help. I was following the instructions in a banner at the top of ADO after creating the manual service connection. The banner states:

Manually created service connections use an App Registration that was created by the user. Please add a federated credential to the App Registration with the following details: Issuer: https://vstoken.dev.azure.com/<org id>, Subject identifier: sc://<org>/<project>/<sc name>. Learn more

With a link to: https://learn.microsoft.com/en-us/azure/devops/pipelines/release/configure-workload-identity?view=azure-devops

I used the instructions to call the API here to get the org id: https://medium.com/@shivapatel1102001/get-list-of-organization-from-azure-devops-microsoft-account-861ea29dae93

jiasli commented 2 months ago

@TomWildenhain, thanks for the information. If you used service principal secret to create the service connection, I don't think the federated identity credential added to the app is actually used.

nlighten commented 2 months ago

@jiasli Is it possible to give any realistic timeline for a fix? I am wondering if it makes sense to ask for a rollback of the cli version contained in actions/runner-images that is used by both Github Actions and Azure DevOps.

pkoushik commented 1 month ago

We are seeing the same issue related to moving away from service principal secrets.

We are looking into adding logic around every Az CLI call that uses the ARM token so that the login gets refreshed inline (not as a background process): fetch a new OIDC token (idToken) and use it to log in again via az account clear && az login ...
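
One way such inline logic could look, as a sketch under assumptions rather than a tested solution: a small wrapper logs in again before a command whenever the last login is older than a threshold. The class name, the 240-second default (borrowed from the 4-minute interval used earlier in this thread), and the injected login callable are all made up for this example. A single command that runs longer than the ID token lifetime would not be helped by this.

```python
import time


class InlineRelogin:
    """Re-run a login callable before commands once the last login is stale."""

    def __init__(self, login, max_age_seconds=240, clock=time.monotonic):
        self._login = login            # e.g. a function that runs `az login ...`
        self._max_age = max_age_seconds
        self._clock = clock            # injectable for testing
        self._last_login = None

    def ensure_fresh(self) -> bool:
        """Log in if needed. Returns True when a login was performed."""
        now = self._clock()
        if self._last_login is None or now - self._last_login >= self._max_age:
            self._login()
            self._last_login = now
            return True
        return False

    def run(self, command):
        """Refresh the login if stale, then run the command in the foreground."""
        self.ensure_fresh()
        return command()


# Demonstration with a fake clock and a fake login that records its calls.
fake_now = [0.0]
logins = []
relogin = InlineRelogin(lambda: logins.append(fake_now[0]), clock=lambda: fake_now[0])

relogin.run(lambda: None)   # first call always logs in
fake_now[0] = 100.0
relogin.run(lambda: None)   # still fresh, no login
fake_now[0] = 300.0
relogin.run(lambda: None)   # stale, logs in again
print(logins)
```

Because the login and the command run sequentially in the same process, this avoids the concurrent az invocations that the background-loop workaround introduces.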

MoazzemHossain-bot commented 1 month ago

If you can help to resolve that will be appreciated

panpanwa commented 4 weeks ago

I have exactly the same use case as @TomWildenhain. Is there a way to make the token validity period customizable? We can't use a service principal, as that is discouraged by the credential-free best practices.

Even a workaround would be much appreciated.

jhwj9617 commented 4 weeks ago

Have the same issue for our long-running tasks:

[01:50:31 INF]  ---> (Inner Exception #3) Azure.Identity.CredentialUnavailableException: Azure CLI authentication failed due to an unknown error. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/azclicredential/troubleshoot ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-06-01T01:50:31.1765304Z, assertion valid from 2024-06-01T00:49:55.0000000Z, expiry time of assertion 2024-06-01T00:59:55.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: 48af9e38-7793-458d-94af-c2962d617700 Correlation ID: 0f495332-706e-4dba-a18e-1f844f5d7a7d Timestamp: 2024-06-01 01:50:31Z
[01:50:31 INF] Interactive authentication is needed. Please run:
[01:50:31 INF] az login

panpanwa commented 3 weeks ago

@jhwj9617 refer to the solution provided by Kapsztajn and @iamrk04, it works for me too.

Although I believe this is an unnecessary workaround which has to be done by users!

jhwj9617 commented 3 weeks ago

@panpanwa we are not using github actions. We're using AzureDevOps in yml, e.g.

- task: AzureCLI@2
  displayName: Run load profile
  inputs:
    azureSubscription: $(federatedCredConnection)
    scriptType: ps
    scriptLocation: scriptPath
    scriptPath: $(Pipeline.Workspace)/test.ps1

jhwj9617 commented 3 weeks ago

@panpanwa this is the stopgap solution that was shared by a colleague we can implement in our AzureCLI task

Start-Job -Name 'RefreshOidcToken' -ScriptBlock {
  do {
    Get-ChildItem -Path Env: -Recurse -Include ENDPOINT_DATA_* `
      | Select-Object -First 1 -ExpandProperty Name `
      | ForEach-Object { $_.Split("_")[2] } `
      | Set-Variable serviceConnectionId

    $oidcRequestUrl = "${env:SYSTEM_TEAMFOUNDATIONCOLLECTIONURI}${env:SYSTEM_TEAMPROJECTID}/_apis/distributedtask/hubs/build/plans/${env:SYSTEM_PLANID}/jobs/${env:SYSTEM_JOBID}/oidctoken?api-version=7.1-preview.1&serviceConnectionId=${serviceConnectionId}"
    Invoke-RestMethod -Headers @{
      Authorization  = "Bearer $env:SYSTEM_ACCESSTOKEN"
      'Content-Type' = 'application/json'
    } -Uri "${oidcRequestUrl}" -Method Post | Set-Variable oidcTokenResponse

    $oidcToken = $oidcTokenResponse.oidcToken
    if (!$oidcToken) {
      Write-Warning "OIDC token could not be acquired. Retrying..."
      Start-Sleep -Seconds 30
      continue
    }

    az account show -o json | ConvertFrom-Json | Set-Variable account
    az login --service-principal -u $account.user.name --tenant $account.tenantId --allow-no-subscriptions --federated-token $oidcToken | Out-Null

    Start-Sleep -Seconds 480 # 8 minutes
  } while ($true)
} | Tee-Object -Variable refreshOidcTokenJob `
  | Select-Object -ExcludeProperty Command `
  | Write-Host -ForegroundColor DarkMagenta

# do long running work

Receive-Job $refreshOidcTokenJob
Stop-Job -Job $refreshOidcTokenJob
Remove-Job -Job $refreshOidcTokenJob

jhwj9617 commented 3 weeks ago

Also this seems to be in preview for v1.12.0-beta.2 https://github.com/Azure/azure-sdk-for-js/pull/29392

kboom commented 3 weeks ago

@jhwj9617, regarding the stopgap Start-Job script you shared above: this might work in 99% of cases but is not completely reliable; beware of race conditions.

jiasli commented 3 weeks ago

Azure DevOps's document now also explains AADSTS700024:

https://learn.microsoft.com/en-us/azure/devops/pipelines/release/troubleshoot-workload-identity

AADSTS700024: Client assertion is not within its valid time range

If the error happens after approximately 1 hour, use a service connection with Workload identity federation and a Managed Identity instead. Managed Identity tokens have a lifetime of around 24 hours. If the error happens before 1 hour but after 10 minutes, move commands that (implicitly) request an access token to e.g. access Azure storage to the beginning of your script. The access token will be cached for subsequent commands.

jluongh commented 2 weeks ago

Do we have any updates on the issue? A lot of our ADO pipelines are intermittently failing and we have been asked to move away from service principals to be cred free.

The PR linked is still in draft state https://github.com/Azure/azure-cli/pull/28778

TomWildenhain commented 1 week ago

Thanks @jiasli! The guidance quoted above from the Azure DevOps troubleshooting document works for my use case!

nrv-96 commented 5 days ago

I get the same error in the window between 10 minutes and 1 hour described in the Microsoft docs. As the docs suggest, we access the storage account at the beginning of the script, but with terraform apply we cannot control when the access token is requested.

I'm running terraform apply; the pipeline runs for around 10 minutes and then fails with the error below:

error loading state: Error retrieving keys for Storage Account "teestmgmt": autorest/Client#Do: Preparing request failed: StatusCode=0 -- Original Error: clientCredentialsToken: received HTTP status 401 with response: {"error":"invalid_client","error_description":"AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-06-21T10:41:11.0510669Z, assertion valid from 2024-06-19T02:32:13.0000000Z, expiry time of assertion 2024-06-19T02:42:13.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: Correlation ID: Timestamp: 2024-06-21 10:41:11Z","error_codes":[700024],"timestamp":"2024-06-21 10:41:11Z","trace_id":"","correlation_id":"","error_uri":"https://login.microsoftonline.com/error?code=700024"}