Azure / login

Connect to Azure

ERROR: AADSTS700024: Client assertion is not within its valid time range #372

Open krukowskid opened 7 months ago

krukowskid commented 7 months ago

Hi! I'm facing an issue similar to #180, which appears to have been resolved, but I still encounter the problem when running dotnet tests on a GitHub runner.

Azure.Identity.CredentialUnavailableException : DefaultAzureCredential failed to retrieve a token from the included credentials. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/defaultazurecredential/troubleshoot
...
- Azure CLI authentication failed due to an unknown error. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/azclicredential/troubleshoot ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2023-10-31T11:53:04.4424859Z, assertion valid from 2023-10-31T11:39:49.0000000Z, expiry time of assertion 2023-10-31T11:44:49.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: d64c537e-1d94-4274-9012-c0d7590f1c00 Correlation ID: 5c769bb7-e85a-4557-ba28-92f8eca1c4ff Timestamp: 2023-10-31 11:53:04Z
    Interactive authentication is needed. Please run:
    az login

I'm using action version 1.4.6 and the Azure.Identity package version 1.10.4 with DefaultAzureCredential(). The issue doesn't occur in integration tests, where nearly all tests use tokens. However, if I run API/UI tests where only one or two tests use the identity, it fails with the above error. Do you have any suggestions or workarounds?
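Editor's note: the timestamps in the error already show how short the assertion window is. Below is a minimal Python sketch that extracts them from the message text quoted above; the helper names `parse_ts` and `assertion_window` are made up for illustration and are not part of any Azure SDK.

```python
import re
from datetime import datetime, timedelta

# Error text copied from the report above (trimmed to the relevant part).
ERROR = (
    "ERROR: AADSTS700024: Client assertion is not within its valid time range. "
    "Current time: 2023-10-31T11:53:04.4424859Z, "
    "assertion valid from 2023-10-31T11:39:49.0000000Z, "
    "expiry time of assertion 2023-10-31T11:44:49.0000000Z."
)


def parse_ts(text: str) -> datetime:
    # The message uses 7 fractional digits; whole seconds are enough here.
    return datetime.strptime(text[:19], "%Y-%m-%dT%H:%M:%S")


def assertion_window(message: str) -> tuple[timedelta, timedelta]:
    """Return (assertion lifetime, time elapsed since the assertion expired)."""
    stamps = re.findall(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z", message)
    # Order in the message: current time, valid from, expiry.
    now, valid_from, expiry = (parse_ts(s) for s in stamps)
    return expiry - valid_from, now - expiry


lifetime, overdue = assertion_window(ERROR)
print(f"assertion lifetime: {lifetime}, expired for: {overdue}")
```

For this report the assertion was valid for 5 minutes and had been expired for a little over 8 minutes when the CLI tried to use it.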

YanaXu commented 7 months ago

Hi @krukowskid , could you provide the workflow file, run it again with debug mode, and provide the debug log?

benjamin-rousseau-shift commented 7 months ago

Same issue here, and it's a real pain. The tokens are only valid for 5 minutes, so if you don't use one until late in your workflow, it just throws the error shown by OP.

I tried azure/login@1.5.0: same issue. I'm not using any other way to log in to Azure.

YanaXu commented 7 months ago

Hi @benjamin-rousseau-shift could you provide your workflow file and debug log? Do you also use OIDC login? OIDC login with SP should have an expiration of 1 hour and OIDC with User-assigned managed identity should have 24 hours.

benjamin-rousseau-shift commented 7 months ago

I will try to get you that. I am using OIDC with a service principal using federated credentials.

krukowskid commented 7 months ago

@YanaXu

Here is my workflow definition (it's a reusable workflow). I have also enabled debug, but it doesn't make sense to paste the log here because it's so noisy. The workflow fails in the 🧪 Run tests for specified filter and rerun failed step. I will provide debug logs, just let me know which part/step you are interested in.

reusable workflow definition

```yaml
name: 'reusable/run-tests'

on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      system-under-test:
        required: false
        type: string
        default: xwow
      test-configuration:
        required: true
        type: string
      tests-filter:
        description: 'Filter for selecting tests to run'
        required: true
        type: string
      tests-web-url:
        required: false
        type: string
      tests-apigateway-url:
        required: false
        type: string
      report-name:
        description: 'Name for execution report and attachments'
        required: false
        default: Default
        type: string
      allure-reports:
        required: false
        default: false
        type: boolean
      allure-project-id:
        required: false
        type: string
    secrets:
      KrukowskidBotAppId:
        required: false
      KrukowskidBotPrivateKey:
        required: false
      ad-username:
        required: false
      ad-password:
        required: false
      azure-client-id:
        required: false
      azure-tenant-id:
        required: false
      azure-subscription-id:
        required: false
      identity-url:
        required: false
      identity-client-id:
        required: false
      backoffice-identity-url:
        required: false
      backoffice-client-id:
        required: false
      backoffice-client-secret:
        required: false
      backoffice-identity-scope:
        required: false
      allure-server-password:
        required: false

permissions:
  id-token: write
  contents: write
  actions: read
  checks: write

jobs:
  run-tests:
    name: run-tests
    environment: ${{ inputs.environment }}
    runs-on:
      labels: ubuntu-latest-8core32ram
    timeout-minutes: 20
    env:
      E2E-ENVIRONMENT: ${{ inputs.test-configuration }}
      E2E-SUT: ${{ inputs.system-under-test }}
      ALLURE_SERVER_URL: ${{ vars.ALLURE_SERVER_URL }}
      ALLURE_SERVER_USER: ${{ vars.ALLURE_SERVER_USER }}
      ALLURE_SERVER_PASSWORD: ${{ secrets.allure-server-password }}
    defaults:
      run:
        shell: pwsh
    steps:
      - name: Generate token
        if: ${{ github.repository != 'Krukowskid/Krukowskid.Tests' }}
        id: generate_token
        uses: tibdex/github-app-token@v1
        with:
          app_id: ${{ secrets.KrukowskidBotAppId }}
          private_key: ${{ secrets.KrukowskidBotPrivateKey }}
      - name: Checkout Tests
        if: ${{ github.repository != 'Krukowskid/Krukowskid.Tests' }}
        uses: actions/checkout@v3
        with:
          repository: Krukowskid/Krukowskid.Tests
          token: "${{ steps.generate_token.outputs.token }}"
          ref: main
      - name: Checkout Tests
        if: ${{ github.repository == 'Krukowskid/Krukowskid.Tests' }}
        uses: actions/checkout@v3
      - name: Azure login
        uses: Azure/login@v1.4.6
        with:
          client-id: ${{ secrets.azure-client-id }}
          tenant-id: ${{ secrets.azure-tenant-id }}
          subscription-id: ${{ secrets.azure-subscription-id }}
      - name: Setup .NET
        uses: actions/setup-dotnet@v3
        with:
          dotnet-version: 7.0.x
      - name: Check Other Chrome Version
        run: /usr/bin/google-chrome --version
      - name: Restore dependencies
        run: dotnet restore src
      - name: List Config Files
        run: ls src/Krukowskid.Tests.Common/Krukowskid.Tests.Common.Configuration
      - name: Add TestResults dir
        run: |
          mkdir src/TestAutomation
          mkdir src/TestAutomation/TestResults
          mkdir src/TestAutomation/TestResults/AllureReports
      - name: 🦿 Override WebUrl
        if: ${{ inputs.tests-web-url != '' }}
        shell: bash --noprofile --norc {0}
        run: |
          echo "Setting E2E_TESTS__WEB__URL env var to ${{ inputs.tests-web-url }}"
          echo "E2E_TESTS__WEB__URL=${{ inputs.tests-web-url }}" >> $GITHUB_ENV
      - name: 🦿 Override ApiGatewayUrl
        if: ${{ inputs.tests-apigateway-url != '' }}
        shell: bash --noprofile --norc {0}
        run: |
          echo "Setting E2E_TESTS__APIGATEWAY__URL env var to ${{ inputs.tests-apigateway-url }}"
          echo "E2E_TESTS__APIGATEWAY__URL=${{ inputs.tests-apigateway-url }}" >> $GITHUB_ENV
      - name: 🏗 Build
        run: dotnet build src --no-restore
      - name: List Files
        run: |
          ls src -lR > src/TestAutomation/TestResults/post-build-files.txt
          ls ${{ github.workspace }}
      - name: 🦾 Install browser for Playwright tests
        shell: pwsh
        run: src/Krukowskid.Tests.UI/Krukowskid.Tests.UI.x/bin/Debug/net7.0/playwright.ps1 install --with-deps chromium
      - name: 🧪 Run tests for specified filter and rerun failed
        shell: bash --noprofile --norc {0}
        env:
          LC_ALL: en_US.utf8
        run: |
          counter=1
          exitcode=0
          reset="\e[0m"
          warn="\e[0;33m"
          green="\e[0;92m"
          blue="\e[0;94m"
          while [ $counter -lt 4 ]
          do
            if [ $filter ]
            then
              echo -e "${warn}Run number: $counter. Re-running failed tests filter: $filter ${reset}"
              # run test and forward output also to a file in addition to stdout (tee command)
              cp src/TestAutomation/TestResults/runtestsoutput.log src/TestAutomation/TestResults/runtestsoutput_first.log
              dotnet test src --no-build --filter=$filter --verbosity minimal --logger trx --results-directory src/TestAutomation/TestResults --settings:src/Krukowskid.Tests.Common/Krukowskid.Tests.Common.Configuration/cicd.runsettings | tee src/TestAutomation/TestResults/runtestsoutput.log
            else
              echo -e "${blue}First run. Running tests with filter "${{ inputs.tests-filter }}" ${reset}"
              dotnet test src --no-build --filter "${{ inputs.tests-filter }}" --verbosity minimal --logger trx --results-directory src/TestAutomation/TestResults --settings:src/Krukowskid.Tests.Common/Krukowskid.Tests.Common.Configuration/cicd.runsettings | tee src/TestAutomation/TestResults/runtestsoutput.log
            fi
            # capture dotnet test exit status, different from tee
            exitcode=${PIPESTATUS[0]}
            if [ $exitcode == 0 ]
            then
              echo -e "${green}Running tests succeeded after $counter attempts.${reset}"
              exit 0
            fi
            filter=$(cat src/TestAutomation/TestResults/runtestsoutput.log | grep -o -P '(?<=\sFailed\s)\w*'| grep -v -x 'Krukowskid' | awk 'BEGIN { ORS="|" } { print("Name=" $0) }' | grep -o -P '.*(?=\|$)')
            ((counter++))
          done
          exit $exitcode
      - name: List Files
        if: always()
        run: ls src -lR > src/TestAutomation/TestResults/post-tests-files.txt
      - name: 📈 Generate Github Report
        uses: dorny/test-reporter@v1
        if: always()
        with:
          name: ${{ inputs.report-name }} Test Execution Report
          path: 'src/TestAutomation/TestResults/*.trx'
          reporter: 'dotnet-trx'
          list-suites: 'all'
          fail-on-error: 'false'
      - name: Find Allure Reports
        if: ${{ always() && inputs.allure-reports == true }}
        shell: bash
        run: |
          find src -type d -name "allure-results"
      - name: Copy Allure Reports
        if: ${{ always() && inputs.allure-reports == true }}
        shell: bash
        run: |
          find src -type d -name "allure-results" -exec cp -r -v {}/. src/TestAutomation/TestResults/AllureReports \;
      - name: 📈 Upload Allure Reports
        uses: unickq/send-to-allure-docker-service-action@v1
        if: ${{ always() && github.ref_name == 'main' && inputs.allure-reports == true }}
        continue-on-error: true
        with:
          allure_results: src/TestAutomation/TestResults/AllureReports
          project_id: ${{ inputs.allure-project-id }}
          auth: true
          generate: true
      - name: Upload additional reports
        uses: actions/upload-artifact@v3
        if: always()
        with:
          name: ${{ inputs.report-name }}TestReports
          path: |
            src/TestAutomation
            src/**/TestResults
            src/**/bin/**/allureConfig.json
            src/**/bin/**/appSettings.*.json
```

YanaXu commented 7 months ago

Hi @krukowskid , From the description of this issue, I see the error is thrown from Azure CLI. But in the steps of "reusable workflow definition", I can't tell which step throws the exception. Could you answer these questions for the further analysis?

krukowskid commented 7 months ago

It's thrown in the dotnet tests (the 🧪 Run tests for specified filter and rerun failed step), which use DefaultAzureCredential().

It's a GitHub-hosted (large) runner; same problem on ubuntu-latest.

Same as on ubuntu-latest.

On the day I created the issue, 1.4.6 was the latest. I will try with 1.5.0.

YanaXu commented 7 months ago

@krukowskid, Azure Login Action works for Azure CLI and Azure PowerShell. But in your workflow file, Run tests for specified filter and rerun failed only calls dotnet commands. Do you mean the error is thrown by your C# source code? Have you checked whether the code runs the auth independently, without Azure CLI?

krukowskid commented 7 months ago

I am using DefaultAzureCredential. Locally (with visualstudioidentity) it works, it also works with azure login action with secret

YanaXu commented 7 months ago

@krukowskid, what I can see from Run tests for specified filter and rerun failed is that the workflow file tries to run "dotnet test". I don't know what's inside. Azure Login Action supports Azure CLI and Azure PowerShell. If it's pure C# test code, I don't think it'll work. If the tests call Azure CLI or Azure PowerShell, it's another story. Can you share more details with us?

krukowskid commented 7 months ago

In the dotnet code I am using DefaultAzureCredential from the Azure.Identity package. During authentication it loops through all possible authentication methods. When the tests run on the runner, it uses AzureCliCredential with the CLI context set on the runner by the azure/login action.
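Editor's note: the chained behaviour described above is why an Azure CLI failure surfaces inside a CredentialUnavailableException. The following is a toy Python model of that idea only, assuming a simple "try each credential in order" chain; every class and function name here is hypothetical, and the real Azure.Identity implementation is more involved.

```python
class CredentialUnavailable(Exception):
    """Raised when a credential cannot produce a token (toy stand-in)."""


class ToyChainedCredential:
    """Try each credential in order and return the first token obtained."""

    def __init__(self, *credentials):
        self.credentials = credentials

    def get_token(self, scope: str) -> str:
        errors = []
        for credential in self.credentials:
            try:
                return credential(scope)
            except CredentialUnavailable as exc:
                # Remember why this credential was skipped, then try the next one.
                errors.append(f"- {exc}")
        raise CredentialUnavailable(
            "failed to retrieve a token from the included credentials.\n"
            + "\n".join(errors)
        )


# Hypothetical credentials used only to exercise the chain.
def environment_credential(scope: str) -> str:
    raise CredentialUnavailable("Environment variables are not configured.")


def cli_credential_expired(scope: str) -> str:
    raise CredentialUnavailable("Azure CLI authentication failed: AADSTS700024")


def cli_credential_ok(scope: str) -> str:
    return f"token-for-{scope}"


chain = ToyChainedCredential(environment_credential, cli_credential_ok)
print(chain.get_token("https://vault.azure.net/.default"))
```

When the last credential in the chain also fails, the aggregated message lists every skipped credential, which matches the shape of the error in the original report.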

shaneholder commented 7 months ago

Adding my "me too" to this problem: exactly the same error message and reports of a 5-minute token. Out of curiosity, is there a point at which the v1 tag should be rolled back to a previously working commit in order to avoid lots of issues? I know that best practice is for workflows to use commit hashes instead of tags when referencing actions, but I'm sure there are lots of workflows that don't.

YanaXu commented 6 months ago

Hi @shaneholder, could you please provide more details about your issue? As far as we know, v1.5.1 does not introduce issues like this. We're trying to reproduce this issue and figure out how it happens. FYI, we would roll v1 back to an older version if the latest version truly introduced issues, e.g. #371. However, opinions differ on whether v1 should move to the latest version, e.g. #380. Let's focus on this issue itself. Please help us by providing more details to reproduce it. If it's indeed an issue, we'll take the right action on it.

benjamin-rousseau-shift commented 6 months ago

I don't know why, but I can't replicate it anymore. However, if you are still curious what my workflow looks like:

name: Test Workflow for Debugging Azure Cli Credentials Timeout

on:
  workflow_dispatch:

permissions:
  id-token: write
  contents: read

jobs:
  azure:
    name: "Testing Azure Cli Timeout"
    runs-on: [self-hosted, linux, x64] # ubuntu-latest
    environment: Production
    steps:
      - name: Install Azure cli
        run: |
          sudo apt-get install ca-certificates curl apt-transport-https lsb-release gnupg -y
          curl -sL https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/microsoft.gpg > /dev/null
          AZ_REPO=$(lsb_release -cs)
          echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $AZ_REPO main" | sudo tee /etc/apt/sources.list.d/azure-cli.list
          sudo apt-get update
          sudo apt-get install azure-cli

      - name: Az CLI login
        uses: azure/login@v1
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          allow-no-subscriptions: true

      - name: Sleep for 10 minutes
        run: sleep 600

      - name: Az CLI Account Show
        run: az account show

What I suspect is that on the Ubuntu runner we are using, Azure CLI might have been updated? (I'm not sure which version of Ubuntu we are running, but it might be that the latest Azure CLI was not yet available for our distribution?)

benjamin-rousseau-shift commented 6 months ago

Scratch that, I actually still face it, but my real pipeline is a bit different, as it also installs azure-cli-core using pip3 for some requirements of the Azure Ansible collection.

I wonder if it's azure-cli-core (2.34.0) that messes up the token expiration, even though I log in with the action before even installing azure-cli-core. I am lost.

EDIT: it's not. I tested by forcing the installation of 2.55.0 with pip3 and got the same thing. I'm trying some more workflows to see if I can replicate it in an isolated environment.

4c74356b41 commented 6 months ago

@benjamin-rousseau-shift I think the issue is with the underlying OIDC token issued by GitHub (5-minute expiry). It doesn't seem to be the fault of Azure CLI. I started having issues similar to yours after migrating to federated identity, and I solved them:

https://stackoverflow.com/questions/77686072/issues-with-azure-identity-when-using-federated-credentials

I'm using python, but you can implement this fix in any other language:

import os
import subprocess
from azure.identity import AzureCliCredential

def get_azure_credentials():
    # CLIENT_ID/TENANT_ID are defined elsewhere; subprocess.run stands in for the author's subprocess_helper
    token_request = os.environ.get("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
    token_uri = os.environ.get("ACTIONS_ID_TOKEN_REQUEST_URL")
    subprocess.run(f'token=$(curl -H "Authorization: bearer {token_request}" "{token_uri}&audience=api://AzureADTokenExchange" | jq .value -r) && az login --service-principal -u {CLIENT_ID} -t {TENANT_ID} --federated-token $token', shell=True, check=True)
    return AzureCliCredential()
benjamin-rousseau-shift commented 6 months ago

@4c74356b41 By doing this I think you're basically doing exactly the same thing as the GitHub action. My workaround for now is to log in to Azure again (just like you do in your Python script) right before I need to fetch something from Azure. Not the fanciest solution, but yeah, the OIDC tokens are only valid for 5 minutes; that's a fact no matter what the documentation says :/
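Editor's note: the "log in again right before you need Azure" workaround can be sketched in Python without a shell pipeline. This is a minimal sketch, assuming it runs inside a GitHub Actions job with `id-token: write` permission and that `az` is on the PATH; the function names are hypothetical, while the environment variables and `az login` flags are the ones already used in this thread.

```python
import json
import os
import subprocess
import urllib.request

AUDIENCE = "api://AzureADTokenExchange"


def build_token_request(env) -> urllib.request.Request:
    """Build the HTTP request that asks GitHub for a fresh OIDC token."""
    url = f"{env['ACTIONS_ID_TOKEN_REQUEST_URL']}&audience={AUDIENCE}"
    headers = {"Authorization": f"bearer {env['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"}
    return urllib.request.Request(url, headers=headers)


def build_login_command(client_id: str, tenant_id: str, federated_token: str) -> list:
    """Build the az login command line used in this thread's workarounds."""
    return [
        "az", "login", "--service-principal",
        "-u", client_id, "-t", tenant_id,
        "--federated-token", federated_token,
        "--output", "none",
    ]


def relogin(client_id: str, tenant_id: str, env=os.environ) -> None:
    """Fetch a fresh OIDC token and log in to Azure CLI again."""
    with urllib.request.urlopen(build_token_request(env)) as response:
        token = json.load(response)["value"]
    subprocess.run(build_login_command(client_id, tenant_id, token), check=True)
```

Calling `relogin(client_id, tenant_id)` immediately before the first Azure call in a late step keeps the assertion inside its 5-minute window.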

MoChilia commented 6 months ago

@4c74356b41 @benjamin-rousseau-shift, you are right. The GitHub OIDC provider issues a JWT ID token with a 5-minute expiration time. Its lifespan is not officially documented, but by decoding the OIDC token we can see that it actually expires after 5 minutes. You can also verify this in the sample token.
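Editor's note: decoding the token needs only the standard library. Below is a minimal Python sketch that reads the `iat` and `exp` claims from a JWT payload without verifying the signature. The sample token is built locally with made-up claim values purely for illustration; in a workflow you would pass the token returned by the GitHub OIDC endpoint instead.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def lifetime_seconds(token: str) -> int:
    """Seconds between the issued-at (iat) and expiry (exp) claims."""
    claims = jwt_claims(token)
    return claims["exp"] - claims["iat"]


def _b64(obj: dict) -> str:
    # Helper that encodes one JWT segment for the fake sample below.
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


# Made-up token: the header, claims and signature are illustrative only.
sample = ".".join([
    _b64({"alg": "none"}),
    _b64({"iat": 1698752389, "exp": 1698752689}),
    "signature",
])
print(lifetime_seconds(sample))  # 300 seconds, i.e. 5 minutes
```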

During login, Azure CLI will use the GitHub OIDC token to fetch an access token from MSAL. This access token will be stored in msal_token_cache. It is assigned a random lifetime between 60 and 90 minutes (75 minutes on average). See https://learn.microsoft.com/en-us/entra/identity-platform/access-tokens#access-token-lifetime.

AzureCliCredential() authenticates by requesting a token from the Azure CLI. The instantiation of AzureCliCredential() alone will not raise the error. The error should occur when calling its method get_token(). It executes az account get-access-token --output json --resource {} to request a token from Azure CLI. See https://github.com/Azure/azure-sdk-for-python/blob/6aa171f81c0111996a2785b14864e961a7942e87/sdk/identity/azure-identity/azure/identity/_credentials/azure_cli.py#L24.

For az account get-access-token, Azure CLI first calls acquire_token_silent to attempt to get an access token from token cache. If no access token is returned, it calls acquire_token_for_client to get a new access token with client assertion in OIDC scenario, see https://github.com/Azure/azure-cli/issues/13276#issuecomment-1301828386.

Regarding @krukowskid's issue, the error ERROR: AADSTS700024: Client assertion is not within its valid time range. is most likely because DefaultAzureCredential fails to find or accept the access token in token cache and attempts to fetch a new access token again. At this point, the GitHub OIDC token is expired and cannot be used to fetch an access token.

In my local testing, it works seamlessly under normal conditions, returning the access token from the cache without needing to fetch a new one from MSAL. I am wondering whether you call GetToken() with a scope different from that of the access token stored in the token cache. You may want to double-check the TokenRequestContext argument passed to DefaultAzureCredential().GetToken().
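Editor's note: the cache-first behaviour described above can be illustrated with a toy model. This is a sketch of the reasoning only, not Azure CLI's or MSAL's code: the class name, the fixed 60-minute access token lifetime, and the per-scope dictionary are all simplifying assumptions.

```python
from dataclasses import dataclass, field

ASSERTION_TTL = 5 * 60       # GitHub OIDC token lifetime reported in this thread
ACCESS_TOKEN_TTL = 60 * 60   # simplified; the thread mentions 60 to 90 minutes


class AssertionExpired(Exception):
    """Stands in for AADSTS700024 in this toy model."""


@dataclass
class ToyCli:
    login_time: float
    cache: dict = field(default_factory=dict)  # scope -> access token expiry

    def get_access_token(self, scope: str, now: float) -> str:
        expiry = self.cache.get(scope)
        if expiry is not None and now < expiry:
            return "cached"  # silent path: no OIDC token needed
        if now - self.login_time >= ASSERTION_TTL:
            # A new access token requires the client assertion, which has expired.
            raise AssertionExpired("AADSTS700024 (simulated)")
        self.cache[scope] = now + ACCESS_TOKEN_TTL
        return "fetched"


cli = ToyCli(login_time=0)
print(cli.get_access_token("management", now=0))    # fetched during login
print(cli.get_access_token("management", now=600))  # served from the cache
```

In this model, asking for a scope that was never cached (for example "storage" at `now=600`) raises the simulated error, and so does asking for "management" again after its cached token has expired.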

4c74356b41 commented 6 months ago

Not sure if I'm interpreting what you say correctly. Basically, you are saying that the access token in the token cache should still be valid for 75 minutes on average, and if we somehow retrieve that, it should work (even though the OIDC token has expired)?

MoChilia commented 6 months ago

@4c74356b41, you're correct. Azure CLI stores the access token fetched from MSAL, which is valid for 75 minutes on average. If you retrieve this token from the cache, it works without needing the OIDC token. But if you request a new access token from remote MSAL, the OIDC token is needed.

4c74356b41 commented 6 months ago

mkay, can you, please, help me understand how to reliably request token from the cache and not a new token? thanks!

MoChilia commented 6 months ago

@4c74356b41, I tried the following Python code; it returns the token from the cache if it is still valid.

from azure.identity import AzureCliCredential
azure_cli_credential = AzureCliCredential()
print("AzureCliCredential: ", azure_cli_credential.get_token("https://management.core.windows.net/"))
4c74356b41 commented 6 months ago

That's what I was using, and it definitely isn't working with OIDC.

mac2000 commented 3 months ago

Just faced a similar issue.

In my case, the workflow is a quite long-running scheduled job that cleans up unwanted images from Azure Container Registry.

Here is the workflow file. Nothing fancy inside; technically it has only two moving parts:

  1. azure login
  2. run powershell script

cleanup.yml

```yml
name: cleanup

on:
  workflow_dispatch:

env:
  ARM_CLIENT_ID: 000000000-0000-0000-0000-000000000000
  ARM_USE_OIDC: true

permissions:
  contents: read
  id-token: write

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v1
        with:
          client-id: 000000000-0000-0000-0000-000000000000
          tenant-id: 000000000-0000-0000-0000-000000000000
          subscription-id: 000000000-0000-0000-0000-000000000000
      - run: pwsh cleanup.ps1
```

The script itself is something like this (with all irrelevant details stripped out): it iterates over images and deletes them from the container registry.

$ErrorActionPreference = "Stop"

$registry = 'demo'
az acr login -n $registry

# Step 1: retrieve images
# pretend we received images here
$items = @('demo.azurecr.io/foo:latest', 'demo.azurecr.io/bar:1.2.0')

# Step 2: delete images
$counter = 0
foreach ($image in $items) {
  try {
    az acr repository delete -n $registry --image $image --yes --only-show-errors
    Write-Host "$image - deleted" -ForegroundColor Green
    $counter += 1
  }
  catch {
    Write-Host "$image - failed" -ForegroundColor Red
  }
  # ♻️ workaround - manually refresh token
  if ($env:ARM_CLIENT_ID -and $counter % 100 -eq 0) {
    az login --service-principal -u $env:ARM_CLIENT_ID -t (az account show --query tenantId -o tsv) --federated-token (Invoke-RestMethod -Uri "$($env:ACTIONS_ID_TOKEN_REQUEST_URL)&audience=api://AzureADTokenExchange" -Headers @{Authorization = "Bearer $($env:ACTIONS_ID_TOKEN_REQUEST_TOKEN)" } | Select-Object -ExpandProperty value)
  }
}

As you can guess, because it deletes images one by one it takes some time, definitely more than 5 minutes; in my case the job took 2 hours.

So after a while, all attempts to delete images fail with the following error:

ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-03-14T16:07:58.2005292Z, assertion valid from 2024-03-14T15:12:23.0000000Z, expiry time of assertion 2024-03-14T15:17:23.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: 849defde-0aa5-4a2f-a30d-ec73d2266000 Correlation ID: 9706d64c-2538-4e10-8808-cb3f37cb0a93 Timestamp: 2024-03-14 16:07:58Z
  Interactive authentication is needed. Please run:
  az login

So I was wondering if there is some kind of workaround, like az refresh or something like that 🤔

Many thanks to @4c74356b41 for pointing it out: there is. I added an example above of how it may be done in PowerShell.

4c74356b41 commented 3 months ago

Use this workaround, detailed previously:

# CLIENT_ID and TENANT_ID are placeholders for your own values
token=$(curl -s -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=api://AzureADTokenExchange" | jq -r .value)
az login --service-principal -u "$CLIENT_ID" -t "$TENANT_ID" --federated-token "$token"

You can create a timer to call this every 5 minutes, or simply do it on every iteration (or every other iteration, etc.).

You can also use runspaces to finish everything 10x faster or so.
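Editor's note: the timer idea mentioned above could look like the following in Python. This is a minimal sketch, assuming the caller supplies a `refresh` callable that performs the actual re-login; `PeriodicRefresher` is a hypothetical name, and an interval a bit shorter than the 5-minute token lifetime is probably safer than exactly 5 minutes.

```python
import threading


class PeriodicRefresher:
    """Call refresh() immediately, then every interval_seconds until stopped."""

    def __init__(self, refresh, interval_seconds: float):
        self._refresh = refresh
        self._interval = interval_seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to refresh again)
        # and True once stop() has been called.
        while not self._stop.wait(self._interval):
            self._refresh()

    def start(self) -> None:
        self._refresh()  # log in once up front
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Usage would be along the lines of `refresher = PeriodicRefresher(my_relogin, 240)`, then `refresher.start()` before the long-running work and `refresher.stop()` afterwards, where `my_relogin` is whatever function runs the `az login` command shown above.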

mderriey commented 2 months ago

We've recently been experiencing this issue, it was working fine before, and no changes have been made to the workflow.

Setup:

We noticed that the issue arose when the GitHub hosted runner image went from 20240324.2.0 to 20240407.1.0. The PR shows that the Azure CLI was updated from 2.58.0 to 2.59.0, see https://github.com/actions/runner-images/pull/9656/files#diff-66aec6097318276b09842a3ba2caf3037afbd8dadca2dfcdf76631100613ea69R111.

I'm not aware of nice workarounds for now, so I'll add more azure/login steps...

Kaloszer commented 2 months ago

Same here, now experiencing it way more often... gotta put in more login steps. Azure is slow with deploying some resources and it's just a pain in the ... to have to relog for every action.

Workaround in pwsh


Write-Verbose -Verbose "Force refresh token" # https://github.com/Azure/login/issues/372
$uri = "$($ENV:ACTIONS_ID_TOKEN_REQUEST_URL)&audience=api://AzureADTokenExchange"
$reqToken = "bearer $($ENV:ACTIONS_ID_TOKEN_REQUEST_TOKEN)"

Write-Verbose -Verbose "Get token"
# $uri already carries the audience parameter, so it is not appended a second time
$token = Invoke-RestMethod -Method GET -Uri $uri -Headers @{ "Authorization" = $reqToken } | Select-Object -ExpandProperty value
Write-Verbose -Verbose "Login"
az login --service-principal -u REPLACE_W_CLIENTID -t REPLACE_W_TENANTID --federated-token $token
jiasli commented 2 months ago

I am the developer of Azure CLI for federated identity credential support. Please see https://github.com/Azure/azure-cli/issues/28708#issuecomment-2047256166 for a temporary mitigation to extend the task duration to 60 minutes.

ant0nsc commented 2 months ago

@jiasli, thanks for suggesting this workaround. I tried your suggestion in my pipeline, but I still run into the same issue as before. Example run: https://github.com/microsoft/hi-ml/actions/runs/8642139946/job/23692828663, using the workflow updated like this: https://github.com/microsoft/hi-ml/pull/925/

Roughly speaking, in our test suite, we repeatedly run tests that

Despite having added various different scoped access tokens, I always eventually hit a token expiry problem

nlighten commented 2 months ago

A nice solution with automatic periodic refresh has been suggested in https://github.com/Azure/azure-cli/issues/28708#issuecomment-2049014471, which you can wrap in a custom GitHub action as shown below. It can potentially be used as a temporary replacement for this action in long-running workflows.

name: Azure Federated Login

inputs:
  client-id:
    description: Azure client id
    type: string
  tenant-id:
    description: Azure tenant id
    type: string
  subscription-id:
    description: Azure subscription id
    type: string
    default: none
  refresh-interval-seconds:
    description: Refresh interval in seconds
    type: number
    default: 240

runs:
  using: "composite"
  steps:
    - name: Fetch OID token every ${{ inputs.refresh-interval-seconds }} seconds
      shell: bash
      run: |
        first_time=true
        while true; do
          token=$(curl -s -H "Authorization: bearer ${ACTIONS_ID_TOKEN_REQUEST_TOKEN}" "${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange" | jq .value -r)
          az login --service-principal -u ${{ inputs.client-id }} -t ${{ inputs.tenant-id }} --federated-token $token --output none
          if [ "$first_time" = true ] && [ "${{ inputs.subscription-id }}" != "none" ]; then
            az account set -s ${{ inputs.subscription-id }}
            first_time=false
          fi
          sleep ${{ inputs.refresh-interval-seconds }}
        done &
m-soltani commented 2 months ago

The temporary solution does not work when using packer azure provider in hcl templates. In our case we use packer templates to create custom Azure VM Images with integrated use_azure_cli_auth: true as the mode of authentication.


source "azure-arm" "image" {
  location                               = "${var.location}"

  // Auth
  use_azure_cli_auth                     = true
  subscription_id                        = "${var.subscription_id}"

  // Rest omitted.

}

The process takes 6 hours to create fresh VM images, and at the end of the script, when Packer creates the final image in the Azure gallery, we receive the same error:


==> azure-arm.image: authorizing request: running Azure CLI: exit status 1: ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-04-12T06:48:06.1011631Z, assertion valid from 2024-04-12T01:04:10.0000000Z, expiry time of assertion 2024-04-12T01:14:10.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials . Trace ID: bcaf1c3c-98f2-4cb3-b0db-61aa68f15701 Correlation ID: b3cc18c0-bc01-403e-9d49-a119ac9bbc46 Timestamp: 2024-04-12 06:48:06Z
==> azure-arm.image: Interactive authentication is needed. Please run:
==> azure-arm.image: az login

Sorry to mention you, @jiasli: does your fix take such scenarios into account as well? Basically, long-running pipelines (up to 6 hours), by refreshing the access token in the background, i.e. using refresh_tokens to obtain new access_tokens?

MoChilia commented 2 months ago

❗ ❗ ❗ If you are encountering ERROR: AADSTS700024: Client assertion is not within its valid time range, here are the workarounds for four scenarios:

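  1. If your workflow recently started failing after 5 minutes, with azure-cli on your runner upgraded to 2.59.0:

    Workaround: Downgrade azure-cli to 2.58.0. Following are the scripts to downgrade the azure-cli version on your agent.

    • If you are using azure/cli action, specify azcliversion with an older version of Azure CLI below 2.59.0, such as 2.58.0.
    • Downgrade azure-cli on Linux runners: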
        jobs:
          linux-regression:
            runs-on: ubuntu-latest
            steps:
               - name: uninstall azure-cli 
                 run: |
                    sudo apt-get remove -y azure-cli
               - name: install azure-cli 2.58.0
                 run: |
                    sudo apt-get update
                    sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
                    sudo mkdir -p /etc/apt/keyrings
                    curl -sLS https://packages.microsoft.com/keys/microsoft.asc |
                        sudo gpg --dearmor -o /etc/apt/keyrings/microsoft.gpg
                    sudo chmod go+r /etc/apt/keyrings/microsoft.gpg
                    AZ_DIST=$(lsb_release -cs)
                    echo "Types: deb
                    URIs: https://packages.microsoft.com/repos/azure-cli/
                    Suites: ${AZ_DIST}
                    Components: main
                    Architectures: $(dpkg --print-architecture)
                    Signed-by: /etc/apt/keyrings/microsoft.gpg" | sudo tee /etc/apt/sources.list.d/azure-cli.sources
                    AZ_VER=2.58.0
                    sudo apt-get update && sudo apt-get install azure-cli=${AZ_VER}-1~${AZ_DIST}
               - name: check azure-cli version
                 run: |
                    az --version
    • Downgrade azure-cli on Windows runners:
      jobs:
        windows-regression:
          runs-on: windows-latest
          steps:
             - name: uninstall azure-cli 
               run: |
                  Start-Process msiexec.exe -Wait -ArgumentList '/x {DEFB65A7-FD02-4710-B01E-6C9387982CA9} /quiet'
             - name: install azure-cli 2.58.0
               run: |
                  $ProgressPreference = 'SilentlyContinue'; Invoke-WebRequest -Uri https://azcliprod.blob.core.windows.net/msi/azure-cli-2.58.0-x64.msi -OutFile .\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'; Remove-Item .\AzureCLI.msi
             - name: check azure-cli version
               run: |
                  az --version

    Note that downgrading Azure CLI may take some time to finish. But this workaround is only necessary until Azure CLI 2.60.0 is released.

  2. If your workflow fails after 5 minutes even with azure-cli <= 2.58.0:

    • This is because there is no access token for your requested scope in the token cache, so Azure CLI tries to get the access token with the GitHub ID token. However, as the ID token expires after 5 minutes, you encounter ERROR: AADSTS700024. See https://github.com/Azure/azure-cli/issues/28708#issuecomment-2047256166.
    • It is expected to be solved once azure-cli supports ID token refresh.

      Workaround: Request access tokens for all your required scopes within the first 5 minutes. Here are the most commonly requested scopes. Modify the script according to your needs.

      - uses: azure/cli@v2
        with:
          azcliversion: 2.58.0
          inlineScript: |
            # Storage:
            az account get-access-token --scope https://storage.azure.com/.default --output none
            # Key Vault:
            az account get-access-token --scope https://vault.azure.net/.default --output none
            # Microsoft Graph:
            az account get-access-token --scope https://graph.microsoft.com/.default --output none
            # Kusto:
            az account get-access-token --scope https://kusto.kusto.windows.net/.default --output none
  3. If your workflow fails after 60 minutes: azure-cli can only request access tokens with a lifetime of 60 minutes. Because the ID token has already expired after 5 minutes, azure-cli cannot get a new access token once the first one expires. This is expected to be solved once azure-cli supports ID token refresh.

    Workaround: Use user-assigned managed identities with OIDC instead of service principals. The token lifetime for managed identities is 24 hours (see Managed identities tokens cache), which covers most CI/CD workflows.

  4. If your workflow fails after 5 minutes with azure-powershell < 9.2: This is the scenario discussed in #180. It was fixed in Azure PowerShell v9.2 (released on 12/6/2022). See https://github.com/Azure/login/issues/180#issuecomment-1524995605.

Check your scenario and use the provided workaround. We're actively working to resolve this issue. Thank you for your understanding.
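
For scenario 2, the pre-fetch commands can also be wrapped in a small shell function that reports a failed scope right away instead of letting it surface later in the job. This is only a sketch: the function name `prewarm_tokens` is hypothetical, it assumes `az` is already logged in via azure/login, and the scopes are whichever ones your workflow actually needs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of azure/login or azure-cli).
# Requests an access token for every scope given as an argument so that
# each token lands in the Azure CLI token cache while the GitHub ID token
# is still valid (the first 5 minutes after login).
prewarm_tokens() {
  local scope failed=0
  for scope in "$@"; do
    # Same command as in the workaround above; --output none hides the token.
    if az account get-access-token --scope "$scope" --output none; then
      echo "cached token for $scope"
    else
      echo "failed to get token for $scope" >&2
      failed=1
    fi
  done
  # Non-zero exit status if any scope could not be fetched.
  return "$failed"
}
```

Calling it right after the login step, for example `prewarm_tokens https://storage.azure.com/.default https://vault.azure.net/.default`, makes the job fail early if one of the scopes cannot be fetched.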

4c74356b41 commented 1 month ago

hey, we are using 2.60.0 and still seeing:

ERROR: AADSTS700024: Client assertion is not within its valid time range.
Current time: 2024-05-14T03:56:38.3260093Z, assertion valid from
2024-05-14T03:32:44.0000000Z, expiry time of assertion
2024-05-14T03:37:44.0000000Z.

az version output:

{
  "azure-cli": "2.60.0",
  "azure-cli-core": "2.60.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {
    "resource-graph": "2.1.0"
  }
}

any pointers?
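
The three timestamps in an AADSTS700024 message like the one above show how long the assertion was valid and how late the token request came. A minimal sketch for extracting them follows; the function names `to_epoch` and `assertion_timing` are hypothetical, and it assumes bash with GNU `date` (as on ubuntu runners) and a message in the format quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading AADSTS700024 messages.
# Assumes GNU date; BSD/macOS date does not support -d.

# to_epoch: convert an ISO 8601 UTC timestamp such as
# 2024-05-14T03:56:38.3260093Z to whole seconds since the epoch.
to_epoch() {
  local ts="${1%Z}"   # drop the trailing Z
  ts="${ts%%.*}"      # drop fractional seconds
  date -u -d "${ts/T/ }" +%s
}

# assertion_timing: read one AADSTS700024 message on stdin and print
# "<validity_window_seconds> <seconds_past_expiry>".
assertion_timing() {
  local msg now from exp
  msg="$(tr '\n' ' ')"   # join wrapped lines into one
  now="$(printf '%s\n' "$msg" | sed -n 's/.*Current time: \([0-9T:.-]*Z\).*/\1/p')"
  from="$(printf '%s\n' "$msg" | sed -n 's/.*assertion valid from \([0-9T:.-]*Z\).*/\1/p')"
  exp="$(printf '%s\n' "$msg" | sed -n 's/.*expiry time of assertion \([0-9T:.-]*Z\).*/\1/p')"
  if [ -z "$now" ] || [ -z "$from" ] || [ -z "$exp" ]; then
    echo "could not find all three timestamps" >&2
    return 1
  fi
  local now_s from_s exp_s
  now_s="$(to_epoch "$now")"
  from_s="$(to_epoch "$from")"
  exp_s="$(to_epoch "$exp")"
  echo "$(( exp_s - from_s )) $(( now_s - exp_s ))"
}
```

Piping the message above into `assertion_timing` prints `300 1134`: the assertion was valid for 5 minutes, and the token request arrived about 19 minutes after it expired.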

MoChilia commented 1 month ago

Hi @4c74356b41, please review scenario 2. The Azure CLI currently does not support ID token refresh.

4c74356b41 commented 1 month ago

sorry, why scenario 2?

If your workflow fails after 5 minutes also in azure-cli <= 2.58.0:

this is not my scenario, I'm running 2.60.0

MoChilia commented 1 month ago

@4c74356b41, so it worked for you with azure-cli <= 2.58.0? Scenario 1 applies to users who encountered the issue only in azure-cli == 2.59.0.

4c74356b41 commented 1 month ago

oh, okay, no, i dont think it did. i migrated to federated identity and this started happening. i was under the assumption that 2.60.0 is supposed to fix this underlying issue?

MoChilia commented 1 month ago

@4c74356b41, not yet. Version 2.60.0 only addressed scenario 1. We're still waiting for Azure CLI to support ID token refresh for the other scenarios. Please check if the workarounds work for you.

4c74356b41 commented 1 month ago

okay, my bad then. any chance I can track this work?

MoChilia commented 1 month ago

@4c74356b41, I'll keep this issue open and update you once the improvement is ready.

mderriey commented 1 month ago

For my case, where we're only using ARM tokens, the new Azure CLI v2.60.0 baked into the new GitHub Actions runner image (ubuntu-22.04 or ubuntu-latest 20240516.1) fixed the issue, and I was able to remove the extra azure/login steps I had added in between the other steps.

Thanks! 🙏

neilmca-inc commented 1 month ago

Converted one of my Azure DevOps service connections today to test the Workflow Federated Identity scenarios and the first pipeline I ran that used Azure CLI for longer than 10 minutes failed with this error

ubuntu-latest azure-cli 2.60.0

ERROR: AADSTS700024: Client assertion is not within its valid time range. Current time: 2024-05-24T13:34:27.2438476Z, assertion valid from 2024-05-24T13:16:07.0000000Z, expiry time of assertion 2024-05-24T13:26:07.0000000Z. Review the documentation at https://docs.microsoft.com/azure/active-directory/develop/active-directory-certificate-credentials

Add this to the day long outage for Workflow Federated Identity this month https://status.dev.azure.com/_event/499193080 and I remain unconvinced by the "new" way. The draconian measure of reducing the old secret way of working down to a 3 month expiry date is all well and good PROVIDED the new way works.

I certainly cannot proceed with confidence right now - certainly not in production.

apalich commented 1 month ago

Any update on this issue? My workflow is also failing with this error: ClientAssertionCredential authentication failed: A configuration issue is preventing authentication - check the error message from the server for details. You can modify the configuration in the application registration portal. See https://aka.ms/msal-net-invalid-client for details. Original exception: AADSTS700024: Client assertion is not within its valid time range.

4c74356b41 commented 4 weeks ago

@4c74356b41, I'll keep this issue open and update you once the improvement is ready.

Any issues/milestones we can track on our own? Thanks. 2.61.0 is out, but I don't think it contains a fix for this. P.S. Also, looking at the 2.61.0 release docs, there are many breaking changes. Why breaking changes without a major version bump?

MoChilia commented 4 weeks ago

Let's track the feature here: https://github.com/Azure/azure-cli/issues/28708. Currently, Azure CLI does not support ID token refresh.