AzureAD / microsoft-authentication-library-common-for-android

Common code used by both the Active Directory Authentication Library (ADAL) and the Microsoft Authentication Library (MSAL)
MIT License
41 stars 35 forks source link

ANR Fix Option 1: Decouple perform cloud discovery from class lock #2285

Closed fadidurah closed 9 months ago

fadidurah commented 9 months ago

Relevant Bugs: https://identitydivision.visualstudio.com/Engineering/_workitems/edit/2502611 https://identitydivision.visualstudio.com/Engineering/_workitems/edit/2502610

What We received some bugs from Teams regarding users experiencing Android Not Responding crashes while using teams (Bugs linked above). During PCA creation, the foreground thread would get blocked while waiting on a lock to open in AzureActiveDirectory.

In MSAL, we have a usage of AzureActiveDirectory.setEnvironment() as part of initializeApplication() which is called at the end of the PCA creation process. initializeApplication() (unlike the rest of the PCA creation process in which the configuration file is read in and device mode is determined) is ran on the main, foreground thread. setEnvironment() is a synchronized method and thus shares a lock with every other synchronized method in AzureActiveDirectory. The reason we get an ANR is because AzureActiveDirectory.performCloudDiscovery is also a synchronized method, and setEnvironment must wait for the network call in that method to complete before the lock becomes available. performCloudDiscovery doesn't set the environment, only reads it from the static variable.

How To avoid the ANR, I've seperated the locks between setEnvironment and performCloudDiscovery. Remove synchronized from performCloudDiscovery and give it it's own lock. This way, setEnvironment will not be blocked while performCloudDiscovery is ran.

Why This fix will remove the need to wait for the network call to finish to unlock the class lock, which will greatly reduce the possibility of the ANR.

Testing It's difficult to test directly as the ANR is very network dependent and intermittent, will need teams to update with telemetry in the future.