Open weikanglim opened 1 year ago
I'm starting to suspect that applications should not react to partitionKey
unless used specifically for on-behalf-of scenarios?
The partitioned manager in MSAL seems to be only component that caches by key in OBO, but the default storage manager just assumes 1:1.
Our CLI application only supports a 1:1 relationship between AD user and OS user, storing the token cache under a user-specific directory, so perhaps ignoring partitionKey
is the right thing to do?
RE: How does multitenancy play in to this, if at all?
This is an interesting issue, thanks for opening it. I believe the old behavior--suggesting a partition key for OBO data only--was incorrect and that azd can ignore the suggestion in any case.
Applications decide whether to partition an external cache. MSAL suggests a key but has no opinion on how applications use it. Partitioning is important for OBO data because the application doesn't get user account information in that flow, however this isn't the only scenario in which it makes sense to partition the cache. For example, a web app may distribute cache data across some number of databases.
My naive belief is that azd, being a CLI application, has nothing to gain by partitioning its cache and can therefore just ignore the suggested key. It isn't necessarily harmful to partition the cache, however I imagine a change in partitioning between azd versions would probably force users to reauthenticate after upgrading. I don't see a connection between partitioning and the AAD error you shared though, so perhaps I'm missing something important about azd's cache.
RE: How does multitenancy play in to this, if at all?
Can you elaborate some more? I don't see a connection.
- Is it expected that AcquireTokenSilent with the same publicClient and account results in Replace() being called with a different partition key?
No; is that happening? I expect the key to be the user's home account ID.
Partitioning is important for OBO data because the application doesn't get user account information in that flow, however this isn't the only scenario in which it makes sense to partition the cache. For example, a web app may distribute cache data across some number of databases.
That makes sense.
I don't see a connection between partitioning and the AAD error you shared though, so perhaps I'm missing something important about azd's cache.
The connection here, based on my observations are:
1 Replace(): called from github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/base.Client.AuthResultFromToken
2 Export(): called from github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/base.Client.AuthResultFromToken
3 Replace(): called from github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/base.Client.AllAccounts
4 Replace(43bb7435-f8c4-4342-9788-fd15e454ea12.72f988bf-86f1-41af-91ab-2d7cd011db47): called from github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/base.Client.AcquireTokenSilent
Which results in the following behavior:
AcquireTokenInteractive
, which populates a cache with lookup key = ""
.AcquireTokenSilent
, which is what azd relies on to not prompt the user in-between CLI commands. It reads from the cache with lookup key = "{homeAccountId}"
. This was never updated in the login flow, and is potentially stale.This used to work by coincidence, simply because Step 4 was previously a call to Replace("")
and not Replace("{homeAccountId}")
.
Is it expected that AcquireTokenSilent with the same publicClient and account results in Replace() being called with a different partition key? No; is that happening? I expect the key to be the user's home account ID.
Incorrect statement on my part. In the linked code example, it is AcquireTokenInteractive
, providing no partition key hints, but AcquireTokenSilent
does. I'm definitely not confident to suggest anything wrong here, just curious what's the expected behavior.
AcquireTokenInteractive
, providing no partition key hints, butAcquireTokenSilent
does.
Thanks for pointing this out. Both methods should suggest the same key (#424). I suppose azd might work as before if they did, however ignoring partition keys still looks like the best solution and I imagine would be faster to implement than waiting for an MSAL patch. What do you think?
@chlowell Absolutely. Sorry, was late on this update, and forgot to highlight it earlier --we had went ahead with the change to remove the logic that treated partitioning as the lookup key.
On a side note: Is cache.ExportReplace
the stable interface? I can submit a issue / PR to try to maybe update the interface documentation to contain some of the considerations an implementor might think about based on our learnings.
Great, I'm glad azd is unblocked.
The v1.0.0 API including cache.ExportReplace
is stable, please feel free to contribute doc updates.
@weikanglim could you update the docs with your findings? I am facing the same issue and solved it with hardcoding the key and ignoring the suggested key, what was your approach?
@phanirithvij Sorry, haven't come around to this. As @chlowell suggests above:
Applications decide whether to partition an external cache. MSAL suggests a key but has no opinion on how applications use it. Partitioning is important for OBO data because the application doesn't get user account information in that flow, however this isn't the only scenario in which it makes sense to partition the cache. For example, a web app may distribute cache data across some number of databases.
The key is meant as a suggested partitioning key for an external cache. In our CLI, using it as a lookup key for the user was incorrect, and we modified the logic to avoid using the key in this manner (change here).
Which version of MSAL Go are you using? Microsoft Authentication Library for Go 1.0.0-preview
Where is the issue?
Is this a new or an existing app? a. The app is in production and I have upgraded to a new version of Microsoft Authentication Library for Go.
What version of Go are you using (
go version
)?What operating system and processor architecture are you using (
go env
)? Observed reproducing on all OSes and amd64 + arm64.Repro
In production, it affects
cache.ExportReplace
implementation in Azure/azure-dev: https://github.com/Azure/azure-dev/blob/main/cli/azd/pkg/auth/cache.goMinimal repro:
main
. Rungit checkout old
to observe MSAL 0.81 behavior.go run main.go
Expected behavior
In old version, 0.81 of msal-go, we observe that calling
AcquireTokenInteractive
(line 1-2), followed byAcquireTokenSilent
results in the firstReplace()
call (line 5) frombase.Client.AcquireTokenSilent
withkey = ''
:Actual behavior In new version, 1.0 of msal-go, we observe that calling
AcquireTokenInteractive
(line 1-2), followed byAcquireTokenSilent
results in the firstReplace()
call (line 4) frombase.Client.AcquireTokenSilent
withkey = '<homeAccountId>'
:Possible solution
The production app, azd, currently stores the MSAL cache data on disk based on the key. Our current logic looks like:
publicCient.AcquireTokenInteractive
- Acquire token.Export()
will be called and the data will be stored on diskpublicClient.AcquireTokenSilent
-Replace()
should be called and read from the stored data on diskHowever, because
Replace(<homeId>)
is being called instead ofReplace()
, we are seeing customer reports of our login flow failing immediately becauseAcquireTokenSilent
is reading from a stale cache, with the following HTTP error observed:The behavioral change is explained by this change in base.go, where a subtle re-sequencing of
authParams.AuthorizationType = authority.ATRefreshToken
is now before theReplace()
call, andauthParams.CacheKey
reacts differently due to this line herelmnow thatAuthorizationType
is set toauthority.ATRefreshToken
Questions:
ReplaceHints.PartitionKey
(previously calledkey
)?AcquireTokenSilent
with the samepublicClient
and account results inReplace()
being called with a different partition key?Additional context / logs / screenshots Add any other context about the problem here, such as logs and screenshots.