AzureAD / microsoft-authentication-library-for-python

Microsoft Authentication Library (MSAL) for Python makes it easy to authenticate to Microsoft Entra ID. General docs are available here https://learn.microsoft.com/entra/msal/python/ Stable APIs are documented here https://msal-python.readthedocs.io. Questions can be asked on www.stackoverflow.com with tag "msal" + "python".
https://stackoverflow.com/questions/tagged/azure-ad-msal+python

acquire_token_silent hangs when switching between scopes - how do I fix this? #464

Closed nickpetzold closed 2 years ago

nickpetzold commented 2 years ago

We use AD auth to connect to various databases used by our application, one Postgres, one MSSQL.

We're experiencing issues when running our application in an EC2 container: when we try to acquire a token via acquire_token_silent, the function appears to hang ad infinitum, no response at all, not even an error/stacktrace (N.B. it seems to work absolutely fine when running locally).

An important point to note is that the code below works fine when we get a token for the first DB connection; the issues arise on the second connection, where we use a different scope. Could this be causing issues?

The code:

import os

import msal

# scope, credential_manager and LOGGER come from the surrounding application code.
scopes = [scope]
if not credential_manager:
    # Create the client application only if we don't already have one.
    credential_manager = msal.ConfidentialClientApplication(
        os.environ["AZURE_CLIENT_ID"],
        authority=f'https://login.microsoftonline.com/{os.environ["AZURE_TENANT_ID"]}',
        client_credential=os.environ["AZURE_SECRET"],
    )

# Look in the token cache first, then fall back to requesting a new token.
result = credential_manager.acquire_token_silent(scopes, account=None)
if not result:
    LOGGER.debug("No suitable token exists in cache. Let's get a new one from AAD.")
    result = credential_manager.acquire_token_for_client(scopes=scopes)
else:
    LOGGER.debug("Token found in cache.")
We've tried the approach of creating a ConfidentialClientApplication object both during initialisation and at the point at which we are trying to acquire the token - is there a best practice or can either approach work?

Finally, we are running v1.14.0 of the MSAL library.

Any help would be much appreciated!

rayluo commented 2 years ago

Hi Nick, thanks for sending this report. We haven't seen this kind of issue before. We would love to help with the investigation.

the function appears to hang ad infinitum, no response at all, not even an error/stacktrace

When running your app inside an interactive console, you can use CTRL+BREAK to abort. At that point, Python will print a stacktrace indicating the point at which your CTRL+BREAK was received. You can then rerun and break again, several times, to see whether the break points are all the same. Then let us know the stacktrace.
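When the app runs non-interactively (for example inside a container), there may be no console to send a break to. A minimal sketch of an alternative, assuming only the standard library's faulthandler module: arm a watchdog that writes every thread's stack trace to a file if a call takes longer than a timeout. The helper name run_with_watchdog and the 30-second timeout in the usage comment are illustrative and not part of MSAL.

```python
import faulthandler
import sys


def run_with_watchdog(func, timeout_seconds, log_file=sys.stderr):
    """Call func(); if it runs longer than timeout_seconds, dump all stack traces.

    The dump does not interrupt func; it only records where each thread is.
    log_file must be a real file object (it needs a file descriptor) and must
    stay open until this function returns.
    """
    faulthandler.dump_traceback_later(timeout_seconds, file=log_file, exit=False)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call has finished (or raised).
        faulthandler.cancel_dump_traceback_later()


# Hypothetical usage around the suspect call:
# result = run_with_watchdog(
#     lambda: credential_manager.acquire_token_silent(scopes, account=None),
#     timeout_seconds=30,
# )
```

If the dumped traces point at the same frame on every run, that frame is a good candidate for where the program is stuck.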

(N.B. it seems to work absolutely fine when running locally)

If the stacktrace obtained above indicates that it hung during a network call, then we would need to double-check how network connectivity differs between a request originating from your local machine and one originating from your EC2 container.

We've tried the approach of creating a ConfidentialClientApplication object both during initialisation and at the point at which we are trying to acquire the token - is there a best practice or can either approach work?

Generally speaking, when an API is designed around an object-oriented model, it likely implies that some initialization work is done when the object is created. Otherwise, the designer could probably have gone with a flat, function-style API in the first place. So it is more efficient to create your object once and reuse it whenever possible.
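As a rough illustration of the reuse advice, here is a minimal sketch of one helper that serves several scopes from a single application object. By default the application object keeps its token cache in memory, so a freshly created object starts with an empty cache. The helper name get_token and the scope constants in the usage comment are hypothetical; the only MSAL calls assumed are acquire_token_silent and acquire_token_for_client, as used in the original post, plus the documented result shape (a dict with "access_token" on success, or "error" and "error_description" on failure).

```python
from typing import Any, List


def get_token(app: Any, scopes: List[str]) -> str:
    """Return an access token for scopes, preferring the app's token cache.

    app is expected to behave like msal.ConfidentialClientApplication.
    """
    # Cache lookup first; this returns None when nothing suitable is cached.
    result = app.acquire_token_silent(scopes, account=None)
    if not result:
        # Cache miss: request a new token with the client credentials.
        result = app.acquire_token_for_client(scopes=scopes)
    if "access_token" not in result:
        raise RuntimeError(
            f"Token request failed: {result.get('error')}: "
            f"{result.get('error_description')}"
        )
    return result["access_token"]


# Hypothetical usage: create the app once at startup, then reuse it per scope.
# app = msal.ConfidentialClientApplication(client_id, authority=..., client_credential=...)
# postgres_token = get_token(app, [POSTGRES_SCOPE])
# mssql_token = get_token(app, [MSSQL_SCOPE])
```

Because both calls share one object, a repeated request for either scope can be answered from the cache instead of going back to the network.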

Finally, we are running v1.14.0 of the MSAL library.

If at all possible, please try our latest release, MSAL 1.17. There are no breaking changes within the MSAL 1.x series, and there is no known issue between 1.14 and 1.17 that would affect the behavior you observed. You will likely still see the same issue with MSAL 1.17, but that will at least give us a more recent base to work with.

nickpetzold commented 2 years ago

Hi Ray, thanks for the quick response!

So after exec'ing into the ECS container, we've managed to find out that it's not hanging after all. Instead we're actually experiencing a seg fault:

Segmentation fault (core dumped)

Is this something you've ever experienced before? Everything runs smoothly when we run locally, so it looks to be an issue with the way the server is set up, potentially.
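A segmentation fault kills the interpreter before a normal Python traceback can be printed. One way to narrow it down, sketched here assuming only the standard library's faulthandler module, is to enable the fault handler at startup so that Python dumps each thread's traceback when a fatal signal such as SIGSEGV arrives. The helper name enable_crash_tracebacks is illustrative; running with `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1` also enables the handler without code changes.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_file=sys.stderr):
    """Dump every thread's Python traceback on fatal signals such as SIGSEGV.

    log_file must be a real file object (it needs a file descriptor) and must
    stay open for as long as the handler is enabled.
    """
    faulthandler.enable(file=log_file, all_threads=True)
    return faulthandler.is_enabled()


# Hypothetical usage, as early as possible in the program's entry point:
# enable_crash_tracebacks()
```

The dump shows which Python call was active when the crash happened, which helps tell a crash inside a native extension apart from one in the code that calls it.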

We've tried updating to 1.17 as well, but unfortunately no luck.

nickpetzold commented 2 years ago

Hi Ray, just to say we've found the issue and it's actually unrelated to MSAL, so sorry for the inconvenience! We are running a multithreaded program, which is causing some issues with pyodbc, which in turn is struggling to make a connection to our MSSQL db. We incorrectly thought the issue was at the stage where we get the auth token - a good lesson to be better with our logging in the future!
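For anyone landing here with a similar symptom, a minimal sketch of the kind of stage-level logging that would have pointed at the right step sooner. It assumes only the standard library; the log_stage name and the stage labels in the usage comment are hypothetical.

```python
import contextlib
import logging
import time

LOGGER = logging.getLogger(__name__)


@contextlib.contextmanager
def log_stage(name):
    """Log when a stage starts and how it ends, so a stall can be localized."""
    LOGGER.info("START %s", name)
    started = time.monotonic()
    try:
        yield
    except Exception:
        # Log the traceback, then let the exception propagate unchanged.
        LOGGER.exception("FAILED %s after %.2fs", name, time.monotonic() - started)
        raise
    else:
        LOGGER.info("DONE %s in %.2fs", name, time.monotonic() - started)


# Hypothetical usage:
# with log_stage("acquire token"):
#     result = credential_manager.acquire_token_silent(scopes, account=None)
# with log_stage("connect to MSSQL"):
#     connection = pyodbc.connect(connection_string)
```

A hard crash such as a segfault never reaches the DONE or FAILED line, so a START without a matching end line marks the stage where the process died.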

Thanks for your help anyway and have a great weekend!

rayluo commented 2 years ago

Good to know. At least we still learn something from it. :-) I hope you will start reusing the ConfidentialClientApplication object, and enjoying MSAL 1.17. :-)

nickpetzold commented 2 years ago

Indeed we will, it works well!