Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License
4.62k stars 2.83k forks source link

SharedTokenCacheBase not identifying cache due to irrelevant error #34624

Closed kcphila closed 7 months ago

kcphila commented 8 months ago

Describe the bug Attempting to use azure.identity.SharedTokenCacheCredential fails because no accounts are found even when valid refresh tokens exist. This is because the refresh tokens do not have family_id

To Reproduce

See script excerpt function at the bottom.

Expected behavior

azure.identity.SharedTokenCacheCredential should be able to identify a single matching refresh token in the cache without the user having to manually construct an authentication records.

Solution

This is due to the check for family_id in shared_token_cache, around line 185.

There is a comment that says "there should always be a family id." In our runtime, the tokens do not have it. This can be solved by removing the family ID check. I'd be happy to submit a PR.

Additional context

Script to demonstrate the issue. We're using azure.identity for msgraph, and so I do include it but it's just to verify authentication and fetch a record.

import sys
import traceback
import asyncio
import azure, azure.identity
import msal
import msgraph

async def run_program(client_id, tenant_id, scopes):

    # Set up Cache Persistence

    azure_cache = azure.identity.TokenCachePersistenceOptions(
            name = 'azure_bug'
        )

    # Authenticate via Interactive Browser,
    # which should create the cache file
    interactive_cred = azure.identity.InteractiveBrowserCredential(
        client_id = client_id,
        tenant_id = tenant_id,
        redirect_uri='http://localhost:8000',
        cache_persistence_options = azure_cache,
    )
    interactive_client = msgraph.GraphServiceClient(
            interactive_cred,
            scopes,
        )
    verify = await interactive_client.me.drives.get()
    assert verify

    # Set up Shared Token Cache Cred with same cache details
    shared_credential = azure.identity.SharedTokenCacheCredential(
            #client_id = client_id,
            tenant_id = tenant_id,
            # Not sure why, but it seems like authority is required
            authority = 'https://login.microsoftonline.com/' + tenant_id,
            cache_persistence_options = azure_cache,
        )

    # --- this is going to fail, but we'll use it
    # to trigger the default cache instantiation in
    # the SharedToken cred
    shared_client = msgraph.GraphServiceClient(
        shared_credential,
        CONFIG['user_scopes'],
    )
    try:
        fails = await shared_client.me.drives.get()
        raise Exception("This fails in our example")
    except:
        print("Regular Authentication fails")

    # Look under the hood - verify it should work
    refresh_tokens = shared_credential._credential._get_cache_items_for_authority(msal.TokenCache.CredentialType.REFRESH_TOKEN, False)

    all_accounts =shared_credential._credential._get_cache_items_for_authority(msal.TokenCache.CredentialType.ACCOUNT, False)

    assert refresh_tokens
    assert all_accounts
    assert len(refresh_tokens) == 1 or len(all_accounts) == 1

    #--------------------------------------------------------
    # This code is copied from and slightly modified
    # azure-sdk-for-python/sdk/identity/azure-identity/azure/identity/_internal/shared_token_cache.py
    # around line 174
    # https://github.com/Azure/azure-sdk-for-python/blob/azure-identity_1.15.0/sdk/identity/azure-identity/azure/identity/_internal/shared_token_cache.py

    found_account = None
    for refresh_token in refresh_tokens:
        home_account_id = refresh_token.get("home_account_id")
        if not home_account_id:
            continue
        for account in all_accounts:
            # When the token has no family, msal.net falls back to matching client_id,
            # which won't work for the shared cache because we don't know the IDs of
            # all contributing apps. It should be unnecessary anyway because the
            # apps should all belong to the family.
            if home_account_id == account.get("home_account_id"):# BAD CODE:  and "family_id" in refresh_token:
                if found_account:
                    raise Exception("too many matching accounts")
                found_account = account
    #--------------------------------------------------------

    assert found_account

    # now fake it working and verify the refresh token
    # would work
    manufactured_auth_record = azure.identity.AuthenticationRecord(
            tenant_id=found_account['realm'],
            client_id=refresh_token['client_id'],
            authority=refresh_token['environment'],
            home_account_id=found_account['home_account_id'],
            username=found_account['username'],
       )

    shared_credential = azure.identity.SharedTokenCacheCredential(
            tenant_id = tenant_id,
            authority = 'https://login.microsoftonline.com/' + tenant_id,
            cache_persistence_options = azure_cache,
            authentication_record = manufactured_auth_record
        )
    shared_client = msgraph.GraphServiceClient(
        shared_credential,
        CONFIG['user_scopes'],
    )
    succeeds = await shared_client.me.drives.get()
    assert succeeds
    print("Account and Refresh Token appear to match and be valid")
    pass

if __name__ == '__main__':
    client_id = # SOMETHING
    tenant_id = # SOMETHING
    scopes = # SOMETHING
    asyncio.run(run_program(client_id, tenant_id, scopes))
kashifkhan commented 8 months ago

Thank you for the feedback @kcphila . We will investigate and get back to you asap.

xiangyan99 commented 8 months ago

Thanks for the feedback. Unfortunately, today msgraph does not work with identity library.

Please open a feature request for msgraph to support identity library.

github-actions[bot] commented 8 months ago

Hi @kcphila. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

kcphila commented 8 months ago

/unresolve

Hi @xiangyan99

Thanks for the feedback. Unfortunately, today msgraph does not work with identity library.

I put a comment in the original report that the error is not in msgraph and I only used that for shorthand.

This code demonstrates the same issue without using msgraph


async def run_program(client_id, tenant_id, scopes):

    # Set up Cache Persistence

    azure_cache = azure.identity.TokenCachePersistenceOptions(
            name = 'azure_bug'
        )

    # Authenticate via Interactive Browser,
    # which should create the cache file
    interactive_cred = azure.identity.InteractiveBrowserCredential(
        client_id = client_id,
        tenant_id = tenant_id,
        redirect_uri='http://localhost:8000',
        cache_persistence_options = azure_cache,
    )
    verify = interactive_cred.get_token(*scopes)
    assert verify

    # Set up Shared Token Cache Cred with same cache details
    failing_credential = azure.identity.SharedTokenCacheCredential(
            #client_id = client_id,
            tenant_id = tenant_id,
            # Not sure why, but it seems like authority is required
            authority = 'https://login.microsoftonline.com/' + tenant_id,
            cache_persistence_options = azure_cache,
        )

    # --- this is going to fail, but we'll use it
    # to trigger the default cache instantiation in
    # the SharedToken cred
    try:
        fails = failing_credential.get_token(*scopes)
        print("If you read this, MS is authenticating your application differently than other accounts")
    except:
        print("Regular Authentication fails")

    # Look under the hood - verify it should work
    refresh_tokens = failing_credential._credential._get_cache_items_for_authority(msal.TokenCache.CredentialType.REFRESH_TOKEN, False)

    all_accounts =failing_credential._credential._get_cache_items_for_authority(msal.TokenCache.CredentialType.ACCOUNT, False)

    assert refresh_tokens
    assert all_accounts
    assert len(refresh_tokens) == 1 or len(all_accounts) == 1

    #--------------------------------------------------------
    # This code is copied from and slightly modified
    # azure-sdk-for-python/sdk/identity/azure-identity/azure/identity/_internal/shared_token_cache.py
    # around line 174
    # https://github.com/Azure/azure-sdk-for-python/blob/azure-identity_1.15.0/sdk/identity/azure-identity/azure/identity/_internal/shared_token_cache.py

    found_account = None
    for refresh_token in refresh_tokens:
        home_account_id = refresh_token.get("home_account_id")
        if not home_account_id:
            continue
        for account in all_accounts:
            # When the token has no family, msal.net falls back to matching client_id,
            # which won't work for the shared cache because we don't know the IDs of
            # all contributing apps. It should be unnecessary anyway because the
            # apps should all belong to the family.
            if home_account_id == account.get("home_account_id"):# BAD CODE:  and "family_id" in refresh_token:
                if found_account:
                    raise Exception("too many matching accounts")
                found_account = account
    #--------------------------------------------------------

    assert found_account
    # now fake it working and verify the refresh token
    # would work
    manufactured_auth_record = azure.identity.AuthenticationRecord(
            tenant_id=found_account['realm'],
            client_id=refresh_token['client_id'],
            authority=refresh_token['environment'],
            home_account_id=found_account['home_account_id'],
            username=found_account['username'],
       )

    shared_credential = azure.identity.SharedTokenCacheCredential(
            tenant_id = tenant_id,
            authority = 'https://login.microsoftonline.com/' + tenant_id,
            cache_persistence_options = azure_cache,
            authentication_record = manufactured_auth_record
        )
    succeeds = shared_credential.get_token(*scopes)
    assert succeeds
    print("Account and Refresh Token appear to match and be valid")
    pass
xiangyan99 commented 8 months ago

Thanks for the updates.

Could you share what you try to achieve? Can it be done without SharedTokenCacheCredential?

If you want to cache/share the token, you can use this:

https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/identity/azure-identity/samples/user_authentication.py

github-actions[bot] commented 8 months ago

Hi @kcphila. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

kcphila commented 8 months ago

Hi @xiangyan99 ,

What we're really trying to do is set up the cache_persistence_option to work between subsequent calls to different scripts (or even the same script) - which seems to be what the shared token credential is designed for.

We have a set of local scripts (in the same repo) that we run from the same computer and want them to share the access/refresh tokens for authentication. The procedure in your link with SecretClient wouldn't work for us because we don't use an off site location like the Azure vault to save tokens as it introduces a new security threat. We could accomplish the needs by writing the tokens to a local file as well, but since this is already built into the cache_persistence_option, we'd prefer to use that.

I know I wrote a lot of code, but just to be clear that the issue is in line 185 of shared_token_cache.py

It has a check if family_id is a key in the refresh_token - but it doesn't actually even check the value of family_id - and the tokens our apps are receiving from Azure don't have a family_id field. The solution would be to change this line to just do the account check, removing the and "family_id" in refresh_token . I have searched the repo for other references to the family_id field, and it doesn't appear to be actually used anywhere in the repo at all aside from a few autgenerated comments - so I don't know if this is perhaps a change in how Microsoft's tokens are encoded that just didn't get updated or if this is a difference based on the type of application or organization, but it sure doesn't seem to be important.

for account in all_accounts:
    # When the token has no family, msal.net falls back to matching client_id,
    # which won't work for the shared cache because we don't know the IDs of
    # all contributing apps. It should be unnecessary anyway because the
    # apps should all belong to the family.
    if home_account_id == account.get("home_account_id") and "family_id" in refresh_token:
        accounts[account["home_account_id"]] = account

So, the corrected code would be :

    if home_account_id == account.get("home_account_id"):
        accounts[account["home_account_id"]] = account
xiangyan99 commented 8 months ago

Thanks for the information.

SharedTokenCacheCredential is not design for this purpose. It is designed for sharing token from 1st party app like Visual studio.

As you can find in my shared sample, you can use

To isolate the token cache from other applications, you can provide a cache name to TokenCachePersistenceOptions. separate_cache_credential = InteractiveBrowserCredential( cache_persistence_options=TokenCachePersistenceOptions(name="my_app"), authentication_record=deserialized_record )

This request should prompt for authentication since the credential is using a separate cache. client = SecretClient(VAULT_URL, separate_cache_credential)

github-actions[bot] commented 8 months ago

Hi @kcphila. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue.

github-actions[bot] commented 7 months ago

Hi @kcphila, we're sending this friendly reminder because we haven't heard back from you in 7 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!