Documentation and Cache is Confusing

bpossolo commented 1 year ago

Core Library

MSAL.js v2 (@azure/msal-browser), MSAL Node (@azure/msal-node)

Wrapper Library

MSAL Angular (@azure/msal-angular), MSAL React (@azure/msal-react), MSAL Node Extensions (@azure/msal-node-extensions)

Public or Confidential Client?

Public, Confidential

Documentation Location

https://github.com/AzureAD/microsoft-authentication-library-for-js/blob/dev/lib/msal-node/docs/caching.md

Description

I've spun this issue off of an issue another person raised regarding confusion around library usage particularly the caching concepts. I figured I should create a new issue since the other one is already closed and unlikely to be resurfaced.

I've been using msal for java/javascript/node for some time now and have experienced confusion around how to properly use the family of MSAL libraries. I do like that the js/java/node sdks all have similar api/concepts/usage-pattern (makes it easy to work on apps using the various libs) but below are some issues/suggestions for them all as a whole.

Some of the ways that it's confusing

it abstracts away refresh tokens behind a convoluted token cache concept
the api/method names are strange and make it hard to understand how to use the library. for example, acquireTokenSilently would be better named something like:
```
function getAccessTokenFromCache(options: Options): string { ... }
```

interface Options { refreshTokenIfExpired: boolean; }

// this makes it clear where the token is coming from (the cache) and that // setting the option to true may result in a network request


- the concept of accounts has nothing to do with oauth spec and requires quite a bit of research to understand what it represents and how to use them to access tokens via the msal sdk
- the sdk hides refresh tokens behind account ids for "security reasons" even though the app developer can very easily call the oauth endpoints directly and get the id/access/refresh tokens directly
- in order to perform most sdk operations, the sdk-user must provide the account id which means the account id must be stored in a user-bound session. for server rendered web apps, the user-bound session must be stored in a cookie. for single-page web apps, the session identifier is stored in local storage right next to the other tokens. this isn't immediately obvious when you start using the sdk and, in the end, it feels like security through obfuscation. I realize that keeping the refresh token on the server is good practice when possible since a leaked refresh token can be abused whereas user web sessions can be easily invalidated.... however, this concept goes out the window for single page apps. there are a lot of mental hurdles to understand this model when picking up the sdk.
- the token cache is confusing. it's unclear if each entry in the cache is a distinct thing like an account, an id token, an access token, a refresh token, etc.... or if an entry represents a mapping from an account id to the respective bundle of tokens (id/access/refresh tokens) for that account.
- it's unclear if a serialized token cache contains everything for all accounts... or if a serialized token cache is a blob for one account. in other words, I don't understand when persisting this blob if it should be stored in a mysql table record for one user... or if the blob is a giant thing that must be serialized/deserialized any time a user anywhere logs in/out.
- sdk user must implement the `DistributedCachePlugin.ts` in all but the most trivial apps. this is kind of a big implementation detail that surprises developers after they've started to use the sdk
- the `IPartitionManager.ts` api doesn't make sense to me
  - it has two methods which are named almost the same thing.
  - the documentation doesn't say anything about key collisions, what will happen, how to avoid them, etc when coming up with an implementation
  - the `DistributedCachePlugin` (and associated `IPartitionManager`) must be specified when creating the confidential client. I assume the confidential client is intended to be created once and used as a singleton, yet the `IPartitionManager.getKey()` method takes no arguments... this suggests it's meant to be used on a per-request fashion. How is the partition manager supposed to generate a partition key for an account given it takes no inputs/arguments? I'd expect this interface to have a single method like `getPartitionKey(accountId: string): Promise<string> | string` which would be called when serializing _and_ deserializing entries.
- the methods on `IPartitionManger` all return a promise of a string. it would be nice if they could return a simple string too.
- the home account id versus local account id concept makes little sense and is a plumbing detail that probably shouldn't be exposed to sdk users.

### Some suggestions for improvements
- it could offer a "core" sdk which exposes the rudimentary oauth operations (i.e. authorization code grant url, code-to-token exchange which returns all tokens from the identity provider, refresh access token using refresh token, etc). this would leave management of the id/access/refresh tokens to the sdk-user
- the caching layer/logic could be a separate package on top of the core lib
- it would be good if scopes could be specified when creating the public/confidential client. the `getAuthorizationUrl()` and `acquireTokenSilently()` functions are usually called in completely different parts of an app codebase. the sdk could include  scopes specified when creating the client with the calls to the various methods rather than requiring the sdk-user to specify them every time one of the methods is called. after a user is authenticated, the `acquireTokenSilently` method will be called a lot from various places. it should be as easy as possible to fetch a token from the sdk after authentication.
- sometimes an sdk-user wants to do something simple like check "is user authenticated" to render some ui element. there is no obvious way to do this via the sdk api. we need to call `acquireTokenSilently()` and wrap it in a `try/catch` block simply to determine if a user's access token is still valid and thereby "authenticated". it would be great if the sdk offered an obvious/clear way to do a simple `isAuthenticated(accountId)` check.
- most single-page apps only support one authenticated user at a time. the sdk forces the multi-account concept onto apps that don't need it. this functionality makes most sense for server-based web apps where the confidential client is used to service requests coming from many users simultaneously.
- the documentation for `acquireTokenSilently` should state what guarantees one can expect from a token returned by the method (for example, the token is guaranteed to be valid and not expired). this is important information to convey to the sdk user given that it's built on top of a cache

jo-arroyo commented 1 year ago

@bpossolo Thanks for your detailed feedback. Have you taken a look at any of the PRs linked on the previous issue to see whether those documentation upgrades addressed your concerns?

@sameerag @bgavrilMS @derisen for caching and documentation concerns, @EmLauber for visibility

bpossolo commented 1 year ago

@jo-arroyo yes I've looked through the repo samples but had to do a lot of digging to understand how the sdk is meant to be used. the web-app with distributed cache one turned out to be the real eye-opener which showed me that a new msal instance must be created per request (rather than as a singleton). this was one of the things which was most counter-intuitive.

EmLauber commented 1 year ago

Thanks @bpossolo for the feedback. There are a lot of different elements called out, some at a documentation level and others at how we abstract things for the sdk-user. We are reviewing this feedback and will consider it on how we can improve in future versions. You are welcome to also make PRs directly on the documentation of elements that were confusing.

@bgavrilMS to comment if there was anything specific you wanted to add about the confidential client or cache related feedback.

derisen commented 1 year ago

@bpossolo this is excellent feedback, thank you very much for taking the time! I'll try to respond to some of them below:

Some of the ways that it's confusing

it abstracts away refresh tokens behind a convoluted token cache concept

the api/method names are strange and make it hard to understand how to use the library. for example, acquireTokenSilently would be better named something like:
function getAccessTokenFromCache(options: Options): string { ... }

interface Options {
  refreshTokenIfExpired: boolean;
}

// this makes it clear where the token is coming from (the cache) and that
// setting the option to true may result in a network request

As you know, there are only 2 hard things in CS, and naming is one of them : ) In defence, acquireTokenSilent was named as such, since it retrieves a token either from the cache or from network via the refresh token. Both of these happen without a prompt, so the assumption is that it won't matter to the consumer which way the token is obtained and returned.

the concept of accounts has nothing to do with oauth spec and requires quite a bit of research to understand what it represents and how to use them to access tokens via the msal sdk

This is fair criticism i.e. there could certainly be more guidance on the significance of this concept and usage. Before MSAL, ADAL had the concept of user instead, but this was discarded, partly because a user can have more than one account in a tenant.

the sdk hides refresh tokens behind account ids for "security reasons" even though the app developer can very easily call the oauth endpoints directly and get the id/access/refresh tokens directly

True. MSAL is only a convenience library -security is not the whole story with respect to the decision to not expose refresh tokens. I lack the full context for it, but @sameerag or @bgavrilMS could elaborate more on it.

in order to perform most sdk operations, the sdk-user must provide the account id which means the account id must be stored in a user-bound session. for server rendered web apps, the user-bound session must be stored in a cookie. for single-page web apps, the session identifier is stored in local storage right next to the other tokens. this isn't immediately obvious when you start using the sdk and, in the end, it feels like security through obfuscation. I realize that keeping the refresh token on the server is good practice when possible since a leaked refresh token can be abused whereas user web sessions can be easily invalidated.... however, this concept goes out the window for single page apps. there are a lot of mental hurdles to understand this model when picking up the sdk.

the token cache is confusing. it's unclear if each entry in the cache is a distinct thing like an account, an id token, an access token, a refresh token, etc.... or if an entry represents a mapping from an account id to the respective bundle of tokens (id/access/refresh tokens) for that account.

The token cache schema is something unified across all MSALs, which means it had to accommodate a great deal of use cases, platforms and etc. Ideally the token cache schema should not matter to the consumer -if MSAL APIs do not cover all needs here so that the consumer feels the need to interact with the cache directly, that is likely an issue for the SDK.

it's unclear if a serialized token cache contains everything for all accounts... or if a serialized token cache is a blob for one account. in other words, I don't understand when persisting this blob if it should be stored in a mysql table record for one user... or if the blob is a giant thing that must be serialized/deserialized any time a user anywhere logs in/out.

I understand the confusion here. I was hoping the updated caching doc would make this more clear, but I'll take another look to see if this could be highlighted further. Please also feel free to make a PR.

sdk user must implement the DistributedCachePlugin.ts in all but the most trivial apps. this is kind of a big implementation detail that surprises developers after they've started to use the sdk

the IPartitionManager.ts api doesn't make sense to me

it has two methods which are named almost the same thing.

the documentation doesn't say anything about key collisions, what will happen, how to avoid them, etc when coming up with an implementation

This is a good point. I'm noting it down and will look into clarifying it further.

the DistributedCachePlugin (and associated IPartitionManager) must be specified when creating the confidential client. I assume the confidential client is intended to be created once and used as a singleton, yet the IPartitionManager.getKey() method takes no arguments... this suggests it's meant to be used on a per-request fashion. How is the partition manager supposed to generate a partition key for an account given it takes no inputs/arguments? I'd expect this interface to have a single method like getPartitionKey(accountId: string): Promise<string> | string which would be called when serializing and deserializing entries.

the methods on IPartitionManger all return a promise of a string. it would be nice if they could return a simple string too.

To be clear, partition manager was not meant to genearate a partition key, but instead retrieve a partition key from the web app's session as the likely location for the partition key. Still, this criticism is not without merit -we are in the process of revamping distributed cache support and will surely take this into account cc @bgavrilMS @sameerag

the home account id versus local account id concept makes little sense and is a plumbing detail that probably shouldn't be exposed to sdk users.

Some suggestions for improvements

it could offer a "core" sdk which exposes the rudimentary oauth operations (i.e. authorization code grant url, code-to-token exchange which returns all tokens from the identity provider, refresh access token using refresh token, etc). this would leave management of the id/access/refresh tokens to the sdk-user

the caching layer/logic could be a separate package on top of the core lib

it would be good if scopes could be specified when creating the public/confidential client. the getAuthorizationUrl() and acquireTokenSilently() functions are usually called in completely different parts of an app codebase. the sdk could include scopes specified when creating the client with the calls to the various methods rather than requiring the sdk-user to specify them every time one of the methods is called. after a user is authenticated, the acquireTokenSilently method will be called a lot from various places. it should be as easy as possible to fetch a token from the sdk after authentication.

This only applies to msal-node at the moment, but agreed, 2 legged auth process is less than ideal. This was simplified for desktop/console apps via acquireTokenInteractive API, but web scenarios still lack a convenient API.

sometimes an sdk-user wants to do something simple like check "is user authenticated" to render some ui element. there is no obvious way to do this via the sdk api. we need to call acquireTokenSilently() and wrap it in a try/catch block simply to determine if a user's access token is still valid and thereby "authenticated". it would be great if the sdk offered an obvious/clear way to do a simple isAuthenticated(accountId) check.

This is tricky. Our guidance is around the existence of an account object (which has no concept of expiry and which should contain everything related to maintaining an authenticated UI) in the cache as the indicator of being "authenticated". You can use the various getAccountBy* APIs to retrieve this object. This is authentication state at the application level, and it does not guarantee that the user still has an active session with Azure AD (authentication state at the IdP level). But neither calling the acquireTokenSilent does -can you elaborate more on why you would want this?

most single-page apps only support one authenticated user at a time. the sdk forces the multi-account concept onto apps that don't need it. this functionality makes most sense for server-based web apps where the confidential client is used to service requests coming from many users simultaneously.

That is mostly true, but consider also the case where a single user has multiple accounts.

the documentation for acquireTokenSilently should state what guarantees one can expect from a token returned by the method (for example, the token is guaranteed to be valid and not expired). this is important information to convey to the sdk user given that it's built on top of a cache

Fair point. acquireTokenSilent is guaranteed to return a valid access token or throw, but I'll revisit our documentation to make this crystal clear.

Again, really appreciate the detailed feedback.

jihu commented 1 year ago

Hi,

I have a follow up question to this. I hope this is the right place.

I'm reading this documentation part about the Cache Lookup Policy: https://github.com/AzureAD/microsoft-authentication-library-for-js/blob/dev/lib/msal-browser/docs/token-lifetimes.md#cache-lookup-policy

There it says that the default policy (CacheLookupPolicy.Default) will "attempt to retrieve an access token from the cache. If the access token is expired or cannot be found the refresh token will be used to acquire a new one".

But we have no use of the accessToken. We are only interested in the idToken. We will not make any api calls needing an accessToken, and we don't define any scopes. This means that the Azure AD B2C won't issue any accessToken for us. But it will still issue an idToken.

As far as I understand it, this means that the cache lookup policy will fail since it won't find an accessToken, and will do the network request (to the "token" endpoint in B2C) unnecessarily. And it does this quite frequently, like once every minut.

Why does it focus on the accessToken? Is there an alternative cache available in MSAL that works when we only want the idToken, and has no accessToken (because of no defined scopes)? Otherwise, how can we make MSAL refresh the idToken in the background (ie silently) when needed?

Edit: Just a quick follow up for our specific problem mentioned above, maybe others will find it useful: An acceptable workaround for us was to set the scope to the client id (https://learn.microsoft.com/en-us/azure/active-directory-b2c/access-tokens#openid-connect-scopes). This will cause B2C to issue an accessToken, which in turn will make the MSAL token cache work. So even though we don't actually use the accessToken ourselves, this workaround removed the problem, and I don't see a downside to this. It would still be nice if the MSAL library would handle this use case (ie no custom scope and no accessToken when using acquireTokenSilently), but it's not a blocker for us anymore.

bgavrilMS commented 1 year ago

@localden and @jmprieur for the ID Token only scenario ^^

@jihu - if all you need is to login users via the Id Token, I recommend you use the middleware associated with your web site (e.g. Express) to handle this. MSAL is focused on managing access tokens. What web site technology do you use? We probably miss a sample.

We do indeed recommend folks to create a fake web api and obtain access tokens for scenarios like yours. The downside is that the expiration of the access token isn't always the same as the expiration of the id token and MSAL reports only on the access token. So you might be trusting an Id Token that is expired, although this isn't really a security problem, because the access token is still valid.

bgavrilMS commented 1 year ago

@derisen - are there any docs that we need to update for caching based on your long reply?

localden commented 1 year ago

This is really good feedback - thank you @bpossolo. We'll work with the engineering leads and PMs to see how we can slice and dice the many problems you called out.

AzureAD / microsoft-authentication-library-for-js