AzureAD / microsoft-authentication-library-for-python

Microsoft Authentication Library (MSAL) for Python makes it easy to authenticate to Microsoft Entra ID. General docs are available here https://learn.microsoft.com/entra/msal/python/ Stable APIs are documented here https://msal-python.readthedocs.io. Questions can be asked on www.stackoverflow.com with tag "msal" + "python".
https://stackoverflow.com/questions/tagged/azure-ad-msal+python
Other
770 stars 192 forks source link

Access token not being used from the cache with both `Files.ReadWrite` and `User.ReadBasic.All` scopes. #450

Closed cp2004 closed 2 years ago

cp2004 commented 2 years ago

Describe the bug I have an implementation accessing MS Graph using MSAL for Python here: OctoPrint OneDrive Backup - provided for reference in case it is needed.

For MS Graph requests I use acquire_token_silent_with_error to get the token & use it in the headers of each request. If I made lots of requests in a row, I noticed that it began erroring and not working anymore. Some investigation followed...

When I called acquire_token_silent_with_error and stepped through it with the debugger, it was showing that it never used the access token from the cache. It was always using the refresh token to get a new token. My scopes I was providing as a list are "User.ReadBasic.All","Files.ReadWrite". In the access token cache, I can see that it is writing:

"AccessToken": {
        "00000000-0000-0000-[REDACTED]-login.microsoftonline.com-accesstoken-1fbab959-f7f1-43c4-a800-5f7f58eb068f-common-files.readwrite openid profile": {
            "credential_type": "AccessToken",
            "secret": "[REDACTED]",
            "home_account_id": "[REDACTED]",
            "environment": "login.microsoftonline.com",
            "client_id": "1fbab959-f7f1-43c4-a800-5f7f58eb068f",
            "target": "Files.ReadWrite openid profile",
            "realm": "common",
            "token_type": "Bearer",
            "cached_at": "1640808717",
            "expires_on": "1640812317",
            "extended_expires_on": "1640812317"
        }
    },

Where the target line is Files.ReadWrite openid profile. In the cache lookup, it is converting this to a set and comparing it to my scopes. They don't match, so it never returns this AT from the cache.

https://github.com/AzureAD/microsoft-authentication-library-for-python/blob/a18c2231896d8a050ad181461928f4dbd818049f/msal/token_cache.py#L96

target_set is {"Files.ReadWrite", "User.ReadBasic.all"}, while the comparison is with {"Files.ReadWrite", "openid", "profile"} - and they don't match.

So I removed the User.ReadBasic.all from my scopes list, and I get the same permissions, everything works OK - the stored AT in the cache is now used when requested and I still have access to user data (I think). Not entirely sure what to make of it so I thought I'd document it in case there was something to be done. So my issue is solved, I don't mind if there's nothing to be done here because I misunderstood something.

To Reproduce Steps to reproduce the behavior:

  1. Using MSAL device flow login, with the code.
  2. Using scopes ["Files.ReadWrite", "User.ReadBasic.all"], and using a persistent cache.
  3. Using acquire_token_silent_with_error, and stepping through with a debugger to figure out what was wrong with the access token not being used from the cache.

Expected behavior The access token would be correctly selected from the cache.

What you see instead See above, I think enough is explained.

The MSAL Python version you are using 1.16.0

Additional context None

rayluo commented 2 years ago

... Not entirely sure what to make of it so I thought I'd document it in case there was something to be done. So my issue is solved, I don't mind if there's nothing to be done here because I misunderstood something.

Thank you @cp2004 for investing your time on investigating and describing the behavior. We love to have this kind of in-depth engagement with the community members!

When I called acquire_token_silent_with_error and stepped through it with the debugger, it was showing that it never used the access token from the cache. It was always using the refresh token to get a new token. My scopes I was providing as a list are "User.ReadBasic.All","Files.ReadWrite". In the access token cache, I can see that it is writing:

"AccessToken": {
        "00000000-0000-0000-[REDACTED]-login.microsoftonline.com-accesstoken-1fbab959-f7f1-43c4-a800-5f7f58eb068f-common-files.readwrite openid profile": {
            "credential_type": "AccessToken",
            ...,
            "target": "Files.ReadWrite openid profile",
            ...,
        }
    },

Where the target line is Files.ReadWrite openid profile. In the cache lookup, it is converting this to a set and comparing it to my scopes. They don't match, so it never returns this AT from the cache.

In this particular case, what you observed was actually an expected behavior.

  1. In your repro step 1, you were using device flow. But the following behavior can be observed regardless of what acquire_token_by_xyz() method you use.

  2. In your repro step 2, you were essentially applying an access token (AT) for two scopes, say, "foo" and "bar". The underlying OAuth2 specs allows the identity server to selectively issue an AT for one scope "foo" but not the other "bar", and this would be conveyed by a successful response which contains an extra field scope which contains what scope(s) this AT is good for. That scope field will be absent when and only when the returned AT covers all scopes that the app was applying for.

    MSAL Python was designed to follow that OAuth2 convention (More on this later), and the app developer is expected to examine the return value of the acquire_token_by_xyz() and decide whether to proceed when/if the returned AT contains only partial scope(s).

  3. In step 3, your app was calling acquire_token_silent(scopes=["foo", "bar"]) again. MSAL Python unsurprisingly found no AT in its cache to satisfy both scopes, so, MSAL Python falls back to (use a refresh token to) ask AAD "hey I can't find an AT from my cache, would you please decide whether you would like to fulfill this request?". AAD would probably repeat its behavior in step 2, i.e. to issue a new AT but still without scope "bar". So, your app still seemingly gets an AT, but it is not really utilizing MSAL's token cache mechanism. That is also why you would subsequently observe this:

If I made lots of requests in a row, I noticed that it began erroring and not working anymore.

Your app was throttled by AAD when it detects that you send out lots of identical request without properly utilizing a token cache.

So I removed the User.ReadBasic.all from my scopes list, and I get the same permissions, everything works OK - the stored AT in the cache is now used when requested and I still have access to user data (I think). Not entirely sure what to make of it ...

That is indeed the right fix for your app. Since that "bar" scope was never really contained in the ATs that you obtained, it is safe to say that your app does not really need that scope to function.

P.S.: While MSAL Python's current behavior is not wrong, it is probably also fair to say not many app developers know that they are supposed to examine the optional scope field in each acquire_token_by_xyz()'s return value, and they might run into the same issue that @cp2004 described here. Perhaps MSAL Python will consider emit a warning log when the returned scopes do not fully match the requested scopes.

cp2004 commented 2 years ago

Thanks for getting back to me with an explanation - it's helped a lot to clear up some of my misunderstanding of the way it works. I did start using MSAL without much knowledge of the way OAuth2 works, it made it way way easier to figure out tokens etc, I love having this library implemented.

MSAL Python was designed to follow that OAuth2 convention (More on this later), and the app developer is expected to examine the return value of the acquire_token_by_xyz() and decide whether to proceed when/if the returned AT contains only partial scope(s).

Understood, this makes sense if that's the protocol, but I didn't know that was a possibility. I like the idea of emitting a warning or some other way (maybe in the docs) to say check the scopes you received are right.

That is indeed the right fix for your app. Since that "bar" scope was never really contained in the ATs that you obtained, it is safe to say that your app does not really need that scope to function.

I'm quite happy I was able to get to the bottom of it in the end 🙂