aws-amplify / amplify-android

The fastest and easiest way to use AWS from your Android app.
https://docs.amplify.aws/lib/q/platform/android/
Apache License 2.0

Fatal SocketException on InputStream.read - "Software caused connection abort" #2861

Open JornR94 opened 1 week ago

JornR94 commented 1 week ago

Language and Async Model

Java

Amplify Categories

GraphQL API, DataStore

Gradle script dependencies

```groovy
implementation 'com.amplifyframework:aws-api:2.16.1'
implementation 'com.amplifyframework:aws-datastore:2.16.1'
```

Environment information

```
------------------------------------------------------------
Gradle 8.0
------------------------------------------------------------

Build time:   2023-02-13 13:15:21 UTC
Revision:     62ab9b7c7f884426cf79fbedcf07658b2dbe9e97

Kotlin:       1.8.10
Groovy:       3.0.13
Ant:          Apache Ant(TM) version 1.10.11 compiled on July 10 2021
JVM:          17.0.6 (JetBrains s.r.o. 17.0.6+0-b2043.56-10027231)
OS:           Windows 10 10.0 amd64
```

Please include any relevant guides or documentation you're referencing

No response

Describe the bug

In our production app, I'm seeing a crash happen occasionally with the Amplify AWS SDK for Android. The crash is a Fatal SocketException. Full stack trace below:

```
Fatal Exception: ag.g
The exception could not be delivered to the consumer because it has already canceled/disposed the flow or the exception has nowhere to go to begin with. Further reading: https://github.com/ReactiveX/RxJava/wiki/What's-different-in-2.0#error-handling | DataStoreException{message=Failure performing sync query to AppSync., cause=ApiException{message=Could not retrieve the response body from the returned JSON, cause=java.net.SocketException: Software caused connection abort, recoverySuggestion=Sorry, we don’t have a recovery suggestion for this error.}, recoverySuggestion=Sorry, we don’t have a recovery suggestion for this error.}

io.reactivex.rxjava3.plugins.RxJavaPlugins.onError (RxJavaPlugins.java:367)
io.reactivex.rxjava3.internal.operators.single.SingleCreate$Emitter.onError (SingleCreate.java:82)
com.amplifyframework.datastore.appsync.AppSyncClient.lambda$sync$0 (AppSyncClient.java:115)
com.amplifyframework.api.aws.AppSyncGraphQLOperation$OkHttpCallback.onResponse (AppSyncGraphQLOperation.java:138)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:539)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)

Caused by com.amplifyframework.datastore.DataStoreException
Failure performing sync query to AppSync.

com.amplifyframework.datastore.appsync.AppSyncClient.lambda$sync$0 (AppSyncClient.java:115)
com.amplifyframework.api.aws.AppSyncGraphQLOperation$OkHttpCallback.onResponse (AppSyncGraphQLOperation.java:138)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:539)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)

Caused by com.amplifyframework.api.ApiException
Could not retrieve the response body from the returned JSON

com.amplifyframework.api.aws.AppSyncGraphQLOperation$OkHttpCallback.onResponse (AppSyncGraphQLOperation.java:138)
okhttp3.internal.connection.RealCall$AsyncCall.run (RealCall.kt:539)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)

Caused by java.net.SocketException
Software caused connection abort

java.net.SocketInputStream.socketRead0 (SocketInputStream.java)
java.net.SocketInputStream.socketRead (SocketInputStream.java:118)
java.net.SocketInputStream.read (SocketInputStream.java:173)
java.net.SocketInputStream.read (SocketInputStream.java:143)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.readFromSocket (ConscryptEngineSocket.java:983)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.processDataFromSocket (ConscryptEngineSocket.java:947)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.readUntilDataAvailable (ConscryptEngineSocket.java:862)
com.android.org.conscrypt.ConscryptEngineSocket$SSLInputStream.read (ConscryptEngineSocket.java:835)
okio.InputStreamSource.read (InputStreamSource.java:93)
okio.AsyncTimeout$source$1.read (AsyncTimeout.kt:128)
okio.RealBufferedSource.request (RealBufferedSource.kt:209)
okio.RealBufferedSource.require (RealBufferedSource.kt:202)
okhttp3.internal.http2.Http2Reader.nextFrame (Http2Reader.kt:90)
okhttp3.internal.http2.Http2Connection$ReaderRunnable.invoke (Http2Connection.kt:618)
okhttp3.internal.http2.Http2Connection$ReaderRunnable.invoke (Http2Connection.kt:609)
okhttp3.internal.concurrent.TaskQueue$execute$1.runOnce (TaskQueue.kt:102)
okhttp3.internal.concurrent.TaskRunner.runTask (TaskRunner.kt:117)
okhttp3.internal.concurrent.TaskRunner.access$runTask (TaskRunner.kt:42)
okhttp3.internal.concurrent.TaskRunner$runnable$1.run (TaskRunner.kt:66)
java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:644)
java.lang.Thread.run (Thread.java:1012)
```

It looks to me like a try/catch block might be missing somewhere along the path this error takes, either in DataStore (com.amplifyframework.datastore) or in the GraphQL API (com.amplifyframework.api). I have just filed another issue that seems related.

This exception occurred for about 2% of users of our app, which is having a significant impact on the crash rate of our app. I would love your help -- please let me know if I can provide further details to help with solving this.

As with the other issue, I'm not sure this is helpful, but 63% of the exceptions happen on Samsung phones, which is well above Samsung's share of our user base. So the crash seems to occur disproportionately on Samsung devices, although it also happens on phones without OEM customization such as Google Pixels, which account for 8% of the crashes.

Reproduction steps (if applicable)

No response

Code Snippet

// I'm pretty sure it's happening in the AWS Amplify SDK 

Log output

```
// Put your logs below this line
```

amplifyconfiguration.json

No response

GraphQL Schema

```graphql
// Put your schema below this line
```

Additional information and screenshots

No response

mattcreaser commented 1 week ago

Thanks for the report @JornR94. As per the linked RxJava documentation, the likely cause here is that a socket exception (which often just means the network dropped) occurred after the emitter that would have delivered it was already disposed.
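To make that mechanism concrete, here is a minimal JDK-only toy model. It is not RxJava code: `ToyEmitter`, `downstream`, and `globalHandler` are hypothetical names standing in for the emitter, the subscriber's error callback, and the process-wide error hook. It only illustrates that an error raised after disposal has no subscriber left to receive it and is diverted to the global hook instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Toy stand-in for an Rx-style emitter; NOT the real RxJava class. */
final class ToyEmitter {
    private final AtomicBoolean disposed = new AtomicBoolean(false);
    private final Consumer<Throwable> downstream;    // the subscriber's error callback
    private final Consumer<Throwable> globalHandler; // stand-in for the process-wide hook

    ToyEmitter(Consumer<Throwable> downstream, Consumer<Throwable> globalHandler) {
        this.downstream = downstream;
        this.globalHandler = globalHandler;
    }

    /** The consumer cancels, for example because a sync was stopped. */
    void dispose() {
        disposed.set(true);
    }

    /** Deliver to the subscriber if it is still listening; otherwise divert. */
    void onError(Throwable error) {
        if (disposed.compareAndSet(false, true)) {
            downstream.accept(error);
        } else {
            globalHandler.accept(error);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        ToyEmitter emitter = new ToyEmitter(
                e -> log.add("subscriber: " + e.getMessage()),
                e -> log.add("global: " + e.getMessage()));

        emitter.dispose(); // the consumer goes away first
        emitter.onError(new java.net.SocketException("Software caused connection abort"));

        System.out.println(log); // prints [global: Software caused connection abort]
    }
}
```

In the real trace the same hand-off is visible as `SingleCreate$Emitter.onError` calling `RxJavaPlugins.onError`.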

While it's possible a bug could be fixed here on Amplify's side, this can also be worked around on the application side by ignoring such errors. The RxJava documentation has a good example; the relevant part is the branch that ignores SocketException:

```java
import android.util.Log;
import io.reactivex.rxjava3.exceptions.UndeliverableException;
import io.reactivex.rxjava3.plugins.RxJavaPlugins;
import java.io.IOException;
import java.net.SocketException;

// Install once, from application code (not from a library).
RxJavaPlugins.setErrorHandler(e -> {
    // Unwrap RxJava's wrapper to look at the original error.
    if (e instanceof UndeliverableException && e.getCause() != null) {
        e = e.getCause();
    }
    if ((e instanceof IOException) || (e instanceof SocketException)) {
        // fine, irrelevant network problem or API that throws on cancellation
        return;
    }
    if (e instanceof InterruptedException) {
        // fine, some blocking code was interrupted by a dispose call
        return;
    }
    if ((e instanceof NullPointerException) || (e instanceof IllegalArgumentException)) {
        // that's likely a bug in the application
        Thread.currentThread().getUncaughtExceptionHandler()
            .uncaughtException(Thread.currentThread(), e);
        return;
    }
    if (e instanceof IllegalStateException) {
        // that's a bug in RxJava or in a custom operator
        Thread.currentThread().getUncaughtExceptionHandler()
            .uncaughtException(Thread.currentThread(), e);
        return;
    }
    // "RxJava" is an arbitrary log tag chosen for this example.
    Log.w("RxJava", "Undeliverable exception received, not sure what to do", e);
});
```

JornR94 commented 1 week ago

Hi @mattcreaser, thanks for the quick reply! That makes a lot of sense, let me implement that myself to prevent this exception from crashing my app then.

As a side note, I did file this other issue that's pretty similar, but it seems like that one is throwing a StreamResetException, which extends IOException, so that should also be covered by adding this error handler for RxJava 👍

Are there any plans for integrating this into the AWS Amplify SDK? Thanks Matt!
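
One caveat may be worth checking before relying on a handler like the one above for this particular crash: in the trace, the SocketException is not the direct cause of the undeliverable exception. It sits two levels down, under DataStoreException and ApiException, so a handler that unwraps only one level with `getCause()` may see the DataStoreException rather than an IOException. Below is a minimal sketch of walking the whole cause chain using only the JDK. `CauseChain` and `hasIoCause` are hypothetical names, not Amplify or RxJava API, and the RuntimeExceptions in `main` merely stand in for the Amplify exception types.

```java
import java.io.IOException;
import java.net.SocketException;

/** Hypothetical helper; not part of Amplify or RxJava. */
final class CauseChain {
    private static final int MAX_DEPTH = 32; // guards against cyclic cause chains

    private CauseChain() {}

    /** Returns true if the throwable itself, or any of its causes, is an IOException. */
    static boolean hasIoCause(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            if (t instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // RuntimeExceptions stand in for DataStoreException and ApiException here.
        Throwable nested = new RuntimeException(
                "Failure performing sync query to AppSync.",
                new RuntimeException(
                        "Could not retrieve the response body from the returned JSON",
                        new SocketException("Software caused connection abort")));

        System.out.println(hasIoCause(nested));                           // prints true
        System.out.println(hasIoCause(new IllegalStateException("bug"))); // prints false
    }
}
```

In an application-level error handler, the `instanceof IOException` test could then be replaced by a call such as `CauseChain.hasIoCause(e)`.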

mattcreaser commented 1 week ago

We'll need to do a little more investigation to see if we can catch these errors internally so that they don't propagate out. I took a quick look, but it wasn't immediately obvious where to do so.

We won't be adding a call to RxJavaPlugins.setErrorHandler, however, as that is incorrect for a library to do; it is only appropriate in the end application's code.

JornR94 commented 1 week ago

Makes sense! I can't recall 100%, but I don't think I saw this mentioned anywhere in the implementation docs for Amplify on Android. I think this would be very valuable information to add to the GraphQL/DataStore implementation docs, to prevent unexpected crashes like these after adding the Amplify SDK.