AzureAD / microsoft-authentication-library-for-java

Microsoft Authentication Library (MSAL) for Java http://aka.ms/aadv2
MIT License

When Global AAD is unavailable, it will cause timeout issue with AAD authentication of native cloud. #605

Open yunbozhang-msft opened 1 year ago

yunbozhang-msft commented 1 year ago

Hi team,

I cloned the MSAL4J code sample from this repo: ms-identity-java-webapp/msal-java-webapp-sample at master · Azure-Samples/ms-identity-java-webapp (github.com)

I set the AAD configuration in the application.properties file to target the Azure China cloud. The endpoint is https://login.partner.microsoftonline.cn

I then ran the sample locally, and it ran successfully.

Next, I added an incorrect DNS mapping to the hosts file to make the global AAD endpoint inaccessible: image

After restarting the sample locally, you get a timeout error:

2023-03-06 12:08:42.147 ERROR 10572 --- [onPool-worker-1] c.m.a.m.ConfidentialClientApplication    : [Correlation ID: b4352a2f-2cbe-4bb9-82a6-ae860c0addb5] Execution of class com.microsoft.aad.msal4j.AcquireTokenByAuthorizationGrantSupplier failed.

com.microsoft.aad.msal4j.MsalClientException: java.net.SocketTimeoutException: Connect timed out
    at com.microsoft.aad.msal4j.HttpHelper.executeHttpRequest(HttpHelper.java:53) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.executeRequest(AadInstanceDiscoveryProvider.java:278) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.sendInstanceDiscoveryRequest(AadInstanceDiscoveryProvider.java:235) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.doInstanceDiscoveryAndCache(AadInstanceDiscoveryProvider.java:339) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.getMetadataEntry(AadInstanceDiscoveryProvider.java:88) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AuthenticationResultSupplier.getAuthorityWithPrefNetworkHost(AuthenticationResultSupplier.java:39) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AcquireTokenByAuthorizationGrantSupplier.execute(AcquireTokenByAuthorizationGrantSupplier.java:59) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:69) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:18) ~[msal4j-1.13.5.jar:1.13.5]
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[na:na]
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) ~[na:na]
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) ~[na:na]
Caused by: java.net.SocketTimeoutException: Connect timed out
    at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[na:na]
    at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[na:na]
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[na:na]
    at java.base/java.net.Socket.connect(Socket.java:633) ~[na:na]
    at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[na:na]
    at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178) ~[na:na]
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:532) ~[na:na]
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:637) ~[na:na]
    at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[na:na]
    at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[na:na]
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[na:na]
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[na:na]
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[na:na]
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:529) ~[na:na]
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:308) ~[na:na]
    at com.microsoft.aad.msal4j.DefaultHttpClient.readResponseFromConnection(DefaultHttpClient.java:105) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.DefaultHttpClient.executeHttpGet(DefaultHttpClient.java:47) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.DefaultHttpClient.send(DefaultHttpClient.java:35) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.HttpHelper.executeHttpRequestWithRetries(HttpHelper.java:96) ~[msal4j-1.13.5.jar:1.13.5]
    at com.microsoft.aad.msal4j.HttpHelper.executeHttpRequest(HttpHelper.java:49) ~[msal4j-1.13.5.jar:1.13.5]
    ... 15 common frames omitted

Why does a client configured for a national cloud (such as Azure China) access global AAD endpoints? The global AAD service has also had problems before, and when global AAD is unavailable, clients of the national-cloud AAD (such as Azure China AAD clients) are affected as well.

Thanks!

siddhijain commented 1 year ago

Adding Bogdan's comment from the Incident

This is a good point. There are 2 issues here:

  1. If instance discovery fails with any error other than "invalid_instance", MSAL should ignore it

  2. Once instance discovery fails, MSAL should not re-attempt to perform instance discovery on that environment

I suggest we track this via a bug, as it will require a fix in the library.
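The two points above can be sketched in plain Java. This is only an illustration of the suggested behavior, not the MSAL4J implementation or the actual fix: the class `InstanceDiscoverySketch`, the `DiscoveryCall` interface, and `DiscoveryErrorException` are all hypothetical names introduced here, and the fallback to the configured host is an assumption about what "ignore it" would mean in practice.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names are MSAL4J APIs.
class InstanceDiscoverySketch {

    /** Hypothetical stand-in for the HTTP call to the instance discovery endpoint. */
    interface DiscoveryCall {
        String fetchPreferredHost(String host) throws IOException;
    }

    /** Hypothetical error for a discovery response that carries an error code. */
    static class DiscoveryErrorException extends RuntimeException {
        final String errorCode;

        DiscoveryErrorException(String errorCode) {
            super("instance discovery error: " + errorCode);
            this.errorCode = errorCode;
        }
    }

    private final DiscoveryCall call;
    private final Map<String, String> preferredHosts = new ConcurrentHashMap<>();
    private final Set<String> failedHosts = ConcurrentHashMap.newKeySet();

    InstanceDiscoverySketch(DiscoveryCall call) {
        this.call = call;
    }

    /** Returns the preferred network host, falling back to the configured host on failure. */
    String resolvePreferredHost(String host) {
        String cached = preferredHosts.get(host);
        if (cached != null) {
            return cached;
        }
        // Point 2: discovery already failed for this host, so do not attempt it again.
        if (failedHosts.contains(host)) {
            return host;
        }
        try {
            String preferred = call.fetchPreferredHost(host);
            preferredHosts.put(host, preferred);
            return preferred;
        } catch (DiscoveryErrorException e) {
            // "invalid_instance" is the one error that should still surface to the caller.
            if ("invalid_instance".equals(e.errorCode)) {
                throw e;
            }
            failedHosts.add(host);
            return host;
        } catch (IOException e) {
            // Point 1: timeouts and other transport failures are ignored.
            failedHosts.add(host);
            return host;
        }
    }
}
```

With a `DiscoveryCall` that throws `java.net.SocketTimeoutException`, as in the stack trace above, `resolvePreferredHost("login.partner.microsoftonline.cn")` would return the configured host unchanged, and a second call for the same host would not reach the network again.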

siddhijain commented 1 year ago

This issue is fixed and the fix should be available in the next msal4j release.

siddhijain commented 1 year ago

Released version 1.13.6 of the library to take care of this. Please reopen this if the issue persists.

yunbozhang-msft commented 1 year ago

Thanks team


yunbozhang-msft commented 1 year ago

Hi @siddhijain, I verified this issue locally and the error is still reported. I think we still need a PR to fix it, because some users' networks are restricted and may block parts of the global Azure network. Also, I do not see any PR linked to this issue, so please check whether the PR was never merged or committed. Thanks!

FYI: Error stack: image

yunbozhang-msft commented 1 year ago

Also, I found that I do not have permission to reopen this issue. Could you please reopen it? Thanks! @siddhijain

Avery-Dunn commented 1 year ago

Hello @zhangyunbo1994 : It's been some time since you first reported this issue, so just to clarify: is this a problem that started happening for some new users/scenarios, or was the original issue completely unresolved (and you only tested it recently)? Just trying to figure out if there's an edge case we didn't cover, or if we may have misunderstood the root cause.

Also, I believe this was the PR with the fix: https://github.com/AzureAD/microsoft-authentication-library-for-java/pull/606

yunbozhang-msft commented 1 year ago

Hi @Avery-Dunn, the original issue is completely unresolved. I also tested it recently, and it is still not resolved in the latest SDK.

bgavrilMS commented 1 year ago

If instance discovery fails with 404, MSAL should ignore this. We do not guarantee that MSAL won't call public cloud.

As a workaround:

yunbozhang-msft commented 1 year ago

Hi @bgavrilMS thanks!

I tried the workaround locally, but it still times out, so I think the back end still tries to connect to the AAD public endpoint even though instanceDiscovery is set to false.

image