AzureAD / microsoft-authentication-library-for-dotnet

Microsoft Authentication Library (MSAL) for .NET
https://aka.ms/msal-net
MIT License

Improve CA Error Handling #1148

Closed henrik-me closed 5 years ago

henrik-me commented 5 years ago

Is your feature request related to a problem? Please describe.

Problem statement

One of the common status codes returned by authentication libraries in silent mode today is InvalidGrant. This status code means that the application should call the authentication library again, but in interactive mode, because additional user interaction is required before an authentication token can be issued. Note: some library versions also return an InteractionRequired status, with exactly the same semantics. The rest of this document refers to InvalidGrant only; the same decisions apply to InteractionRequired.

Over time, many conditions have accumulated in the broad category covered by the InvalidGrant status code. As a result, InvalidGrant has become very general, and it is hard for applications to build a user experience that correctly handles all the conditions that result in it. Some of those conditions are easy for users to resolve (e.g. accepting Terms of Use with a single click), and some cannot be resolved with the current configuration (e.g. the machine in question needs to connect to a specific corporate network).

Depending on how involved the required user interaction is, apps may want to show a different user experience for each level of difficulty. For example, if the app is showing multiple resources to the user at the same time (e.g. items in a collection returned by a search), it may choose not to display the results for which the authentication flow needed to resolve the InvalidGrant condition is too intrusive, while still offering resolution where it is quick and simple.

Applications today have no way to distinguish between the different classes of conditions that cause InvalidGrant, and can therefore handle this state only in a very generic way, which leads to end-user confusion and user experience dead ends in some application flows.

For more details and description of scenarios, please see Improving CA Error Experience in Office.pptx

Requirements

  1. Apps must be able to distinguish between several classes of InvalidGrant condition. See "API Changes" section for the detailed list.
  2. Decouple complexity of various error conditions on the server from client apps. Server must retain capability to quickly iterate and add new authentication flows and conditions, without requiring client apps to be changed in any way.
  3. Extensibility: provide a way for future classes of InvalidGrant condition to be communicated to apps without the need to service authentication libraries.

Solution

API Changes

MSALs will expose an additional classification of the InvalidGrant condition. This classification will be returned as a string, with the following meaning and recommended handling:

| Classification | Meaning | Recommended handling |
|---|---|---|
| basic_action | Condition can be resolved by user interaction during the interactive authentication flow. | Call AcquireTokenInteractively(). |
| additional_action | Condition can be resolved by additional remedial interaction with the system, outside of the interactive authentication flow. | Call AcquireTokenInteractively() to show a message that explains the remedial action. Calling application may choose to hide flows that require additional_action if the user is unlikely to complete the remedial action. |
| message_only | Condition cannot be resolved at this time. Launching interactive authentication flow will show a message explaining the condition. | Call AcquireTokenInteractively() to show a message that explains the condition. AcquireTokenInteractively() will return UserCanceled error after the user reads the message and closes the window. Calling application may choose to hide flows that result in message_only if the user is unlikely to benefit from the message. |
| consent_required | User consent is missing, or has been revoked. | Call AcquireTokenInteractively() for user to give consent. |
| user_password_expired | User's password has expired. | Call AcquireTokenInteractively() so that user can reset their password. |
| [empty string] | Condition may be resolved by user interaction during the interactive authentication flow. | Call AcquireTokenInteractively(). |

The way this string is returned is language specific. For example, if a language already throws an InvalidGrant exception, this string could be an additional field on the exception class. Other languages may have different ways to achieve the same goal, depending on how errors and status are currently communicated to apps that consume MSALs.

New classifications may be added in the future. Applications are expected to treat every unknown classification as though no classification were present (default handling).
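
The default-handling rule for unknown classifications can be sketched in a few lines. This is only an illustration in Python, not part of any MSAL: the names `KNOWN_CLASSIFICATIONS` and `normalize_classification` are hypothetical, and the set of known values is copied from the classification list above.

```python
# Hypothetical app-side helper; not an MSAL API.
# Classification strings listed in the "API Changes" section.
KNOWN_CLASSIFICATIONS = frozenset({
    "basic_action",
    "additional_action",
    "message_only",
    "consent_required",
    "user_password_expired",
})


def normalize_classification(classification):
    """Return the classification if the app knows it, else the empty string.

    The empty string stands for "no classification present", so callers
    fall through to their default handling for any value added later.
    """
    if classification in KNOWN_CLASSIFICATIONS:
        return classification
    return ""
```

An app that switches on the normalized value only needs explicit branches for the classes it cares about; anything else, including a value introduced by a future server release, lands in the default branch.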

Usage pattern example

This is a sequence of calls that an app might make to take advantage of the classification. This hypothetical app downloads a set of documents from various cloud endpoints and displays document thumbnails in a list. For documents that cannot be downloaded, the app decides how to paint the UI depending on the classification of the InteractionRequired condition. This is a very simplified example, written in pseudocode. It assumes that each document is downloaded from a different cloud endpoint that requires a different access token. A real app will, of course, be much more complicated.

foreach (documentUrl in documentUrls) {
    try {
        authParams = getAuthParams()
        authParams.scope = getScopeFromUrl(documentUrl)
        authenticationToken = AcquireTokenSilently(authParams)
        document = downloadDocument(authenticationToken, documentUrl)
        showDocumentThumbnail(document)
    }
    catch (UserInteractionRequiredException exception) {
        switch (exception.classification) {
            case "basic_action":
                // Show the button that invokes AcquireTokenInteractively() 
                showFixItButton();
                break;
            case "additional_action":
                // Show a message that explains to the user that fixing the problem is more involved.
                showAdditionalActionMessage();
                // Show the button that invokes AcquireTokenInteractively() 
                showFixItButton();
                break;
            case "message_only":
                // Do nothing here. Skip documents that cannot be downloaded at this time.
                break;
            default:
                // Invoke default error handling routine that assumes no tokens can be issued, and no documents can be shown. 
                // Hide all thumbnails and show a button to fix the issue.
                hideAllDocuments();
                showSignInMessage();
                showFixItButton();
                break;
        }
    }
}

Protocol

The list of all server suberror codes, as of 2019-06-04, can be found here.

Implementation of parsing and mapping

Not all values currently returned in the suberror field of the protocol map to an InvalidGrant classification that is expected to be returned to apps. Some values are needed for other features and are internal implementation details that should not be exposed to calling apps directly. MSALs will parse the suberror field and, where applicable, map its value to one of the classes expected to be returned to the calling app. The mapping should be as follows:

| suberror code | classification | note |
|---|---|---|
| basic_action | basic_action | |
| additional_action | additional_action | |
| message_only | message_only | |
| consent_required | consent_required | |
| user_password_expired | user_password_expired | |
| bad_token | [empty string] | Internal to MSALs. Indicates that no further silent calls should be made with this refresh token. |
| token_expired | [empty string] | Internal to MSALs. Indicates that no further silent calls should be made with this refresh token. |
| protection_policy_required | [empty string] | Internal to MSALs. Needed in ios/android to complete the end-to-end true MAM flow. This suberror code is re-mapped to a different top level error code (IntuneAppProtectionPoliciesRequired), and not InteractionRequired. |
| client_mismatch | [empty string] | Internal to MSALs. Used in scenarios where an application is using family refresh token even though it is not part of FOCI (or vice versa). Needed to handle cases where app changes FOCI membership after being shipped. This is handled internally and doesn't need to be exposed to the calling app. Please see FOCI design document for more details. |
| device_authentication_failed | [empty string] | Internal to MSALs. Indicates that device should be re-registered. |
| [unknown value] | return as is | For extensibility purposes, unknown values are just passed to the app. |
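
The mapping above could be implemented along the following lines. This is a minimal Python sketch under stated assumptions, not the code of any MSAL: the function name `classification_from_response` and the constant `INTERNAL_SUBERRORS` are hypothetical, and the token error response is assumed to be a JSON object carrying the value in a `suberror` member.

```python
import json

# Suberror codes that the mapping marks as internal to MSALs (hypothetical constant).
INTERNAL_SUBERRORS = frozenset({
    "bad_token",
    "token_expired",
    "protection_policy_required",
    "client_mismatch",
    "device_authentication_failed",
})


def classification_from_response(body):
    """Map the suberror of a token error response to the app-facing classification.

    Assumes `body` is a JSON object with an optional "suberror" member.
    The re-mapping of protection_policy_required to a different top-level
    error code is out of scope for this sketch.
    """
    payload = json.loads(body)
    suberror = payload.get("suberror") or ""
    if suberror in INTERNAL_SUBERRORS:
        # Internal codes are hidden: the app sees "no classification".
        return ""
    # Known classifications and unknown values are both returned as is.
    return suberror
```

Passing unknown values through unchanged is what lets the server introduce a new class without the library being serviced; hiding only an explicit list of internal codes keeps that door open.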

Compatibility and versioning considerations

Depending on how a particular MSAL chooses to implement this feature, it could be either a breaking change or an additive change. For example, adding a new field to an exception object would be an additive change, while adding a new exception type altogether would be a breaking change. An additive change is preferred, but is not required. The compatibility implications for apps that take the new version of MSAL with this change must be clearly documented.
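
To illustrate why a new field counts as additive, here is a small Python sketch. All names in it (`InteractionRequiredError`, `legacy_handler`, `raise_classified`) are hypothetical and do not correspond to any MSAL type. The classification is an optional field that defaults to the empty string, so a handler written before the field existed keeps working unchanged.

```python
class InteractionRequiredError(Exception):
    """Hypothetical exception type; classification is the newly added field."""

    def __init__(self, message, classification=""):
        super().__init__(message)
        # Empty string means "no classification", which matches old behavior.
        self.classification = classification


def legacy_handler(action):
    """A handler written before the field existed: it never reads classification."""
    try:
        action()
    except InteractionRequiredError:
        return "interactive"
    return "ok"


def raise_classified():
    """Simulates a newer library version that populates the field."""
    raise InteractionRequiredError("interaction required", classification="basic_action")
```

Calling `legacy_handler(raise_classified)` still takes the existing `except` branch. Introducing a separate exception type instead would bypass that branch, which is the breaking case described above.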

Additional changes to the protocol must only be applied to clients that know how to handle them.

MSAL.NET implementation

Describe alternatives you've considered

N/A. The above is the result of lengthy discussions.

Additional context

Details available here.

henrik-me commented 5 years ago

Not ready to be picked up yet, as we need to finalize the API design for .NET. The original suggestion was to add a sub-error to the exception thrown, but the design still needs to be finalized. The aim is to make this an additive change that does not break existing behavior.

Assigning to @jmprieur until a decision has been made

henrik-me commented 5 years ago

Design complete. The details have been updated. Marking this for the next update we have planned for MSAL.

jmprieur commented 5 years ago

@henrik-me : I've added a paragraph proposing a .NET implementation. Feel free to assign for implementation if you are happy.

bgavrilMS commented 5 years ago

I started implementing this but I can't figure out how to E2E test this:

  1. Acquire a token interactively
  2. Manually delete the AT or force it to look expired
  3. AcquireTokenSilent, which will trigger a RefreshToken flow

What could I do to get the errors described above? I tried resetting the user's password and got a "bad_token" response back.