aws-amplify / amplify-android

The fastest and easiest way to use AWS from your Android app.
https://docs.amplify.aws/lib/q/platform/android/
Apache License 2.0
249 stars 117 forks source link

Amplify.Auth.fetchAuthSession never ends #2816

Closed Andrew0000 closed 6 months ago

Andrew0000 commented 6 months ago

Before opening, please confirm:

Language and Async Model

Kotlin, Kotlin - Coroutines, RxJava

Amplify Categories

Authentication

Gradle script dependencies

```groovy amplify = "2.14.12" amplify = { module = "com.amplifyframework:core-kotlin", version.ref = "amplify" } amplifyCognito = { module = "com.amplifyframework:aws-auth-cognito", version.ref = "amplify" } amplifyPushPinpoint = { module = "com.amplifyframework:aws-push-notifications-pinpoint", version.ref = "amplify" } amplifyAnalyticsPinpoint = { module = "com.amplifyframework:aws-analytics-pinpoint", version.ref = "amplify" } amplifyRx = { module = "com.amplifyframework:rxbindings", version.ref = "amplify" } implementation(libs.amplify) implementation(libs.amplifyCognito) implementation(libs.amplifyPushPinpoint) implementation(libs.amplifyAnalyticsPinpoint) implementation(libs.amplifyRx) ```

Environment information

``` # Put output below this line ------------------------------------------------------------ Gradle 8.4 ------------------------------------------------------------ Build time: 2023-10-04 20:52:13 UTC Revision: e9251e572c9bd1d01e503a0dfdf43aedaeecdc3f Kotlin: 1.9.10 Groovy: 3.0.17 Ant: Apache Ant(TM) version 1.10.13 compiled on January 4 2023 JVM: 11.0.10 (JetBrains s.r.o. 11.0.10+0-b96-7281165) OS: Mac OS X 14.4.1 x86_64 ```

Please include any relevant guides or documentation you're referencing

https://docs.amplify.aws/gen1/android/build-a-backend/auth/accessing-credentials/

Describe the bug

Amplify.Auth.fetchAuthSession never ends.

Configuration

Amplify.addPlugin(AWSCognitoAuthPlugin())
Amplify.addPlugin(AWSPinpointAnalyticsPlugin())
Amplify.addPlugin(AWSPinpointPushNotificationsPlugin())
Amplify.configure(app.applicationContext)

typealias AmplifyKt = com.amplifyframework.kotlin.core.Amplify

Flow

...
RxAmplify.Auth.confirmSignUp(username, code) // SMS Confirmation
RxAmplify.Auth.signOut()
RxAmplify.Auth.signIn(username, password)
Completable.fromAction({ refreshTokens(forceRefresh = true) })
...
// Just one of the variants I tried
fun refreshTokens(forceRefresh: Boolean = false) {
    runBlocking {
        try {
            // This line is invoked. But nothing below.
            AmplifyKt.Auth
              .fetchAuthSession(
                AuthFetchSessionOptions.builder().forceRefresh(forceRefresh).build()
              )
              // This never happens
              ...
        } catch (e: Exception) {
           // This never happens
            ...
        }
    }
}

As you can see from the comments, fetchAuthSession() never returns a result. It just hangs there forever.

An interesting observation: If I add something like Rx .delay(1, TimeUnit.MICROSECONDS, Schedulers.io()) before Completable.fromAction({ refreshTokens(forceRefresh = true) }) then everything works.

With this observation it seems like the problem related to multithreading / synchronization / maybe a deadlock.

Reproduction steps (if applicable)

No response

Code Snippet

No response

Log output

No response

amplifyconfiguration.json

No response

GraphQL Schema

No response

Additional information and screenshots

No response

tylerjroach commented 6 months ago

Hi @Andrew0000,

Can you show how you are calling Completable.fromAction({ refreshTokens(forceRefresh = true) }). Wanting to make sure you are subscribing on an IO thread and not attempting to call fetchAuthSession runBlocking on main thread.

Is there a reason that you are mixing RxJava and Coroutines? While I don't see any reason that it couldn't work, why not just use RxAmplify.Auth.fetchAuthSession() if you are using in an RxJava context?

Can you

  1. Post a full snippet in how the Completable.fromAction is subscribed.
  2. Add AndroidLoggingPlugin(LogLevel.VERBOSE) as the first configured Amplify plugin and then provide the logs displayed when fetchAuthSession is called.
Andrew0000 commented 6 months ago

Hi @tylerjroach , Wanting to make sure you are subscribing on an IO thread - Why are you asking? Is hanging acceptable if it's called not from a background thread (we are not talking about UI freeze)?

Is there a reason that you are mixing RxJava and Coroutines - As I specified in the comment it's a latest variation while I was doing the testing. I tried coroutines as well. Is hanging acceptable if fetchAuthSession called via that kind of transformation?

In general it's called in a blocking way for a compatibility with different consumers. Please tell me if this is something that is not obviously prohibited.

I tried with AndroidLoggingPlugin(LogLevel.VERBOSE), but see no logs from Amplify.

May be I will be able to provide more snippets later. Meanwhile, your answers will be helpful.

tylerjroach commented 6 months ago

Hi @Andrew0000

The mixing and matching of RxJava and Coroutines should not cause any compatibility issues, although it isn't an ideal implementation. Can you post a longer code snippet that shows how Completable.fromAction({ refreshTokens(forceRefresh = true) }) is being called (ex: how you are subscribing and listening to success/error cases.)

Please check and make sure the logging plugin is added like this. You should see a large amount of logs from Amplify by doing so.

Amplify.addPlugin(AndroidLoggingPlugin(LogLevel.VERBOSE))
Amplify.addPlugin(AWSCognitoAuthPlugin())
Amplify.addPlugin(AWSPinpointAnalyticsPlugin())
Amplify.addPlugin(AWSPinpointPushNotificationsPlugin())
Amplify.configure(app.applicationContext)
Andrew0000 commented 6 months ago

I'm asking because an RxJava completable won't do anything until its subscribed to. Calling a network bound task on the main thread is not an allowed use case. Something may be blocking this call from being made, since if it were made on the main thread, it would likely cause a NetworkingOnMainThreadException. So where is the error? Where is documentation that describes that it's forbidden? And most importantly, the question remains: Is hanging acceptable if it's called not from a background thread (we are not talking about UI freeze)? I believe not.

it isn't an ideal implementation - totally agree. The underlying reason is: we have several modules, some use Rx, some use Coroutines. And it's just convenient to have a framework-independent code. We can refactor it to some single technology every 5-10 years or we can use the code that works regardless of the usage of blocking operators.

By the way, it was really the problem, I just tried it without blocking calls and seems like it helped. But it really requires a lot of time to refactor the codebase to make it work in my case. This is why I'm asking why a legitimate blocking operator may freeze the Amplify library (not UI).

tylerjroach commented 6 months ago

Networking on the main thread is never permitted and the Android OS explicitly blocks it: https://developer.android.com/reference/android/os/NetworkOnMainThreadException. This is not an Amplify limitation, this is not allowed by Android.

This isn't actually what is happening though, as internally, Amplify launches its own IO context to make the network request. What is happening, however, is that your example is calling runBlocking on the main thread to sleep until all asynchronous work is completed.

CoroutineScope(Dispatchers.Main).launch {
    runBlocking {
        val session = AmplifyKt.Auth.fetchAuthSession()
        Log.d("session", "$session")
    }
}

I tested this which did run to completion, however, it locked the UI for nearly 2 seconds, which is definitely something you cannot allow to happen in production.

You haven't posted enough code to how you are subscribing to the Rx Completable to get a full picture for what is not allowing the call to happen. I don't believe the call is even being made.

Is hanging acceptable if it's called not from a background thread (we are not talking about UI freeze)?

This isn't happening. If it were hanging from the main thread the UI would be locked. There is only 1 main thread in an application and if Amplify were locking it, you would not be getting any UI updates.

I understand you intention to write code in a synchronous way for easier implementation. RxJava (through chaining) and Kotlin Coroutines (through contexts) both allow you to write code in a more synchronous manner. All of these technologies still require that any blocking code is done off of the main thread.

If your end goal is to update the UI after doing work on an IO thread, you should use a pattern that allows your UI to subscribe to Observer updates (ex: LiveData, Flow, etc)

Andrew0000 commented 6 months ago

@tylerjroach As I mentioned there is no UI freeze. The blocking invocation happens in the background thread. You told that an exception should be expected if it's on the main. I responded: where is the Exception then? I'm not sure why are you talking about all these things.

Can we now return to the question "Why a blocking invocation can hang Amplify"? Thread doesn't matter as far as I see.

Andrew0000 commented 6 months ago

Here's a simpler reproduction of the bug:

RxAmplify.Auth
    .confirmSignUp(username, code)
    .subscribeOn(Schedulers.io())
    .flatMap {
        RxAmplify.Auth.signIn(username, password)
    }
    .flatMapCompletable {
        Timber.d("test_ start 1")
        Completable.fromAction {
            Timber.d("test_ start 2")
            val result = RxAmplify.Auth
                .fetchAuthSession(
                    AuthFetchSessionOptions.builder().forceRefresh(false).build()
                )
                .doOnSubscribe { Timber.d("test_ doOnSubscribe") }
                .doOnSuccess { Timber.d("test_ doOnSuccess") }
                .doOnError { Timber.d("test_ doOnError") }
                .blockingGet()
            Timber.d("test_ result: $result")
        }
    }
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe({ onSignInSuccess() }, ::onError)
    .also(::addDisposable)

Output:

2024-05-14 16:08:48.788 12495-12550  D  test_ start 1
2024-05-14 16:08:48.788 12495-12550  D  test_ start 2
2024-05-14 16:08:48.789 12495-12550  D  test_ doOnSubscribe

Amplify hangs.

Andrew0000 commented 6 months ago

The same with runBlocking inside:

RxAmplify.Auth
    .confirmSignUp(username, code)
    .subscribeOn(Schedulers.io())
    .flatMap {
        RxAmplify.Auth.signIn(username, password)
    }
    .flatMapCompletable {
        Timber.d("test_ start 1")
        Completable.fromAction {
            runBlocking {
                try {
                    Timber.d("test_ start 2")
                    val result = AmplifyKt.Auth
                        .fetchAuthSession(
                            AuthFetchSessionOptions.builder().forceRefresh(false).build()
                        )
                    Timber.d("test_ result: $result")
                } catch (e: Exception) {
                    Timber.d(e, "test_ exception")
                    ensureActive()
                }
            }
        }
    }
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe({ onSignInSuccess() }, ::onError)
    .also(::addDisposable)

Output:

2024-05-14 16:20:01.628 13224-13316  D  test_ start 1
2024-05-14 16:20:01.629 13224-13316  D  test_ start 2

Amplify hangs.

tylerjroach commented 6 months ago

Thank you for sending a full code block example. With this, I'm able to see a full picture of how you are using RxAmplify and AmplifyKt. I've been able to track exactly what you are running into. With this full example, we can see that none of your blocking call requests are blocking the main thread.

Internally, Amplify Auth uses Coroutines to ensure that Auth calls are not executed in parallel and are queued to be run 1 at a time.

override fun fetchAuthSession(
    options: AuthFetchSessionOptions,
    onSuccess: Consumer<AuthSession>,
    onError: Consumer<AuthException>
) {
    queueChannel.trySend(
        pluginScope.launch(start = CoroutineStart.LAZY) {
            try {
                val result = queueFacade.fetchAuthSession(options)
                onSuccess.accept(result)
            } catch (e: Exception) {
                onError.accept(e.toAuthException())
            }
        }
    )
}

When the onSuccess/onError response is provided, the result is returned from within the plugin's coroutine scope that only allows for a single execution at a time.

The code in your example that uses runBlocking or blockingGet is causing a deadlock. The response callbacks are returned from within the single execution queue, which means that the secondary blocking calls are being requested while still inside the signIn call's single execution scope.

This explains why .delay(1, TimeUnit.MICROSECONDS, Schedulers.io()) prevents the deadlock. It allows the onSuccess() callback from signIn to completely return.

To avoid the lockup, we would recommend not mixing Amplify Coroutines and Amplify RxJava together, or use the workaround with the delay as you suggested.

We will internally take an action item to look into preventing the deadlocks that can be produced by mixing our Coroutine and RxJava methods together.

An Amplify internal change to the below code should avoid the deadlock.

override fun fetchAuthSession(
    options: AuthFetchSessionOptions,
    onSuccess: Consumer<AuthSession>,
    onError: Consumer<AuthException>
) {
    queueChannel.trySend(
        pluginScope.launch(start = CoroutineStart.LAZY) {
            try {
                val result = queueFacade.fetchAuthSession(options)
                pluginScope.launch {
                    onSuccess.accept(result)
                }
            } catch (e: Exception) {
                pluginScope.launch {
                    onError.accept(e.toAuthException())
                }
            }
        }
    )
}
Andrew0000 commented 6 months ago

@tylerjroach Thanks a lot for the explanation.

tylerjroach commented 6 months ago

I'm going to go ahead and close this issue since you are no longer blocked. #2822 addresses the scenario that was causing the deadlock. The next version of Amplify should prevent this scenario from occurring.

github-actions[bot] commented 6 months ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.