AzureAD / microsoft-authentication-library-for-python

Microsoft Authentication Library (MSAL) for Python makes it easy to authenticate to Microsoft Entra ID. General docs are available here https://learn.microsoft.com/entra/msal/python/ Stable APIs are documented here https://msal-python.readthedocs.io. Questions can be asked on www.stackoverflow.com with tag "msal" + "python".
https://stackoverflow.com/questions/tagged/azure-ad-msal+python
Other
770 stars 192 forks source link

[Feature Request] Raise MSAL specific error, instead of base error `ValueError`, `RuntimeError` #482

Open jiasli opened 2 years ago

jiasli commented 2 years ago

Originally from https://github.com/AzureAD/microsoft-authentication-library-for-python/pull/443#discussion_r895288236

Currently, MSAL raises Python's base error ValueError for scenarios such as tenant validation failure. Azure CLI can't really tell whether the error comes from MSAL or other places if we use a centralized error handling logic.

Now we handle all PersistenceError (https://github.com/AzureAD/microsoft-authentication-extensions-for-python/pull/108) in a centralized place:

https://github.com/Azure/azure-cli/blob/3dbcafbd273a5fefce793ba1e520dc61d8c468a8/src/azure-cli-core/azure/cli/core/util.py#L140-L151

    elif isinstance(ex, PersistenceError):
        # errno is already in strerror. str(ex) gives duplicated errno.
        az_error = azclierror.CLIInternalError(ex.strerror)
        if ex.errno == 0:
            az_error.set_recommendation(
                "Please report to us via Github: https://github.com/Azure/azure-cli/issues/20278")
        elif ex.errno == -2146893813:
            az_error.set_recommendation(
                "Please report to us via Github: https://github.com/Azure/azure-cli/issues/20231")
        elif ex.errno == -2146892987:
            az_error.set_recommendation(
                "Please report to us via Github: https://github.com/Azure/azure-cli/issues/21010")

This saves us from the necessity of capturing the error and handing it in all "joints" with MSAL.

But for this ValueError, we will have to capture it everywhere when MSAL interaction is made:

https://github.com/Azure/azure-cli/blob/1d973cceb38980181eeaa45934497d2148a3e7b2/src/azure-cli-core/azure/cli/core/auth/identity.py#L141-L143

        result = self._msal_app.acquire_token_interactive(
            scopes, prompt='select_account', port=8400 if self._is_adfs else None,
            success_template=success_template, error_template=error_template, **kwargs)

https://github.com/Azure/azure-cli/blob/1d973cceb38980181eeaa45934497d2148a3e7b2/src/azure-cli-core/azure/cli/core/auth/identity.py#L152

        result = self._msal_app.acquire_token_by_device_flow(flow, **kwargs)  # By default it will block

If we don't handle the error, the raw exception with traceback is shown, like in https://github.com/Azure/azure-cli/issues/20507, which is of course bad user experience.

I am thinking maybe MSAL can do the same for this ValueError (and other base error types raised) and create new error types which inherent from ValueError, such as AuthorityError, MsalValueError.

rayluo commented 2 years ago

Thanks for bringing up this thought-provoking topic.

Indeed, MSAL tends to use generic exceptions. The PersistenceError was actually an exception (pun not intended) to that convention, because there was a strong reason for PersistenceError to exist.

While MSAL could potentially revisit that design choice, your centralized error handling helper seems an unusual choice in itself. I can understand that you design that helper to reuse some code, which is reasonable. Does it also intrinsically lose the context of each exception? For example, when you catch an requests.HttpError in that monolithic error handler, you won't know which operation caused that HttpError, and then you won't be able to provide a really accurate error message, will you? That is probably why, catching exception on-the-spot seems to be more common.

By the way, if we examine this topic in the context of Python's built-in standard libraries, presumably many of them will share same set of generic exceptions. So, catching expected generic exceptions on-the-spot is a must.

Perhaps, you can get the best of both worlds, by implementing several smaller exception handlers. One of them can be def msal_exceptions_handler(ex) which specialized in catching ValueError. :-)

the necessity of capturing the error and handing it in all "joints" with MSAL.

In particular, how many joints are we talking about? I thought the entire MSAL usage has been wrapped inside a central helper. Would it be easy enough to implement on-the-spot exception handling in there?

jiasli commented 2 years ago

This issue also happens for https://github.com/Azure/azure-cli/issues/20507 where ValueError is raised for an invalid tenant.

Perhaps, you can get the best of both worlds, by implementing several smaller exception handlers. One of them can be def msal_exceptions_handler(ex) which specialized in catching ValueError. :-)

This is exactly what I am thinking. Even though we have centralized helpers PublicClientApplication and ConfidentialClientApplication, it is still necessary to apply msal_exceptions_handler on all methods, like acquire_token_interactive, acquire_token_by_device_flow, right? Unless we cast some black magic on __getattribute__ to make sure all methods are guarded by msal_exceptions_handler.

jiasli commented 2 years ago

I made a draft to demonstrate how it will look like if all MSAL interactions are wrapped with try-except clauses: https://github.com/Azure/azure-cli/pull/23311

jiasli commented 1 year ago

As discussed in our meeting, if MsalValueError can include an error_code field, Azure CLI can safely log the error code in its client telemetry without storing the error message which potentially contains personal information. This can greatly help us understand users' failure reasons and frequencies.

bgavrilMS commented 11 months ago

@rayluo - I think this should be a bug, as it's related to supportability. App developers should be able to catch MSAL specific exceptions, observe the error code and claims etc. Can they do this now? Do you have a sample / wiki that shows how to?