aws-amplify / amplify-android

The fastest and easiest way to use AWS from your Android app.
https://docs.amplify.aws/lib/q/platform/android/
Apache License 2.0

AWS Amplify Auth v1 to v2 migration fails 5-10% of the time, logs user out #2929

Open camhart opened 1 month ago

camhart commented 1 month ago

Before opening, please confirm:

Language and Async Model

Java

Amplify Categories

Authentication

Gradle script dependencies

```groovy
// Put output below this line
implementation 'com.amplifyframework:aws-auth-cognito:2.21.0'
```

Environment information

```
# Put output below this line
C:\Users\Cam\projects\project-android>gradlew --version

------------------------------------------------------------
Gradle 8.7
------------------------------------------------------------

Build time: 2024-03-22 15:52:46 UTC
Revision: 650af14d7653aa949fce5e886e685efc9cf97c10

Kotlin: 1.9.22
Groovy: 3.0.17
Ant: Apache Ant(TM) version 1.10.13 compiled on January 4 2023
JVM: 20.0.2 (Oracle Corporation 20.0.2+9-78)
OS: Windows 10 10.0 amd64
```

Please include any relevant guides or documentation you're referencing

No response

Describe the bug

I've updated my Android app to use AWS Amplify V2. I deployed it to beta users, and roughly 5-10% of them had issues with the data migration: they ended up logged out of the app after it updated and migrated from v1 to v2. This shouldn't happen. If I have those customers uninstall and reinstall the Android app and log in again, everything works from then on, but this isn't an acceptable solution.

I created a ticket with AWS support and they told me to create a GitHub issue. See case 172444220700816.

Here's an example log output when the app attempts to make API calls but is unable to due to being logged out.

D/ 09-23 15:31:15.551 BackendCallTask( 5715): AUTH fetchAuthSessionRequest
D/ 09-23 15:31:16.729 BackendCallTask( 5715): AUTH fetchAuthSessionRequest result, isSignedIn=true
D/ 09-23 15:31:16.729 BackendCallTask( 5715): AUTH exception: SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:16.732 System.err( 5715): SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:16.732 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1.execute(SourceFile:48)
W/ 09-23 15:31:16.732 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1$1.invokeSuspend(Unknown Source:12)
W/ 09-23 15:31:16.733 System.err( 5715): Caused by: NotAuthorizedException(message=Invalid Refresh Token.)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.model.NotAuthorizedException$Builder.a(SourceFile:4)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.NotAuthorizedExceptionDeserializer.c(SourceFile:27)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.d(SourceFile:344)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.b(SourceFile:1)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.c(SourceFile:43)
W/ 09-23 15:31:16.733 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.b(SourceFile:1)
D/ 09-23 15:31:28.709 BackendCallTask( 5715): AUTH fetchAuthSessionRequest
D/ 09-23 15:31:28.963 BackendCallTask( 5715): AUTH fetchAuthSessionRequest result, isSignedIn=true
D/ 09-23 15:31:28.963 BackendCallTask( 5715): AUTH exception: SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:28.963 System.err( 5715): SessionExpiredException{message=Your session has expired., cause=NotAuthorizedException(message=Invalid Refresh Token.), recoverySuggestion=Please sign in and reattempt the operation.}
W/ 09-23 15:31:28.963 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1.execute(SourceFile:48)
W/ 09-23 15:31:28.963 System.err( 5715):  at com.amplifyframework.auth.cognito.actions.FetchAuthSessionCognitoActions$refreshUserPoolTokensAction$$inlined$invoke$1$1.invokeSuspend(Unknown Source:12)
W/ 09-23 15:31:28.963 System.err( 5715): Caused by: NotAuthorizedException(message=Invalid Refresh Token.)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.model.NotAuthorizedException$Builder.a(SourceFile:4)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.NotAuthorizedExceptionDeserializer.c(SourceFile:27)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.d(SourceFile:344)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializerKt.b(SourceFile:1)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.c(SourceFile:43)
W/ 09-23 15:31:28.963 System.err( 5715):  at aws.sdk.kotlin.services.cognitoidentityprovider.serde.InitiateAuthOperationDeserializer.b(SourceFile:1)
D/ 09-23 15:31:3

I'd like to request a feature addition to this library, where the migration creates persistent migration logs that the app developer can request to help troubleshoot issues like this. Also, it'd be nice to be able to retry the migration. Right now it seems to destroy all the old v1 data and assume everything worked, even when it didn't. The migration fails sporadically and I have no clue why, with no recourse for troubleshooting. I have to wait for a customer support ticket complaining about the problem in order to get logs, and those aren't very helpful because they only show that the user was signed out for some reason. I've been using AWS Amplify Auth v1 for several years without any issue keeping users logged in.

Reproduction steps (if applicable)

I've been unable to reproduce the issue myself.

Code Snippet

// Put your code below this line.

Log output

``` // Put your logs below this line ```

amplifyconfiguration.json

{
  "auth": {
    "plugins": {
      "awsCognitoAuthPlugin": {
        "IdentityManager": {
          "Default": {}
        },
        "CredentialsProvider": {
          "CognitoIdentity": {
            "Default": {
              "PoolId": "us-west-2:xxxxxxxxxxxx",
              "Region": "us-west-2"
            }
          }
        },
        "CognitoUserPool": {
          "Default": {
            "PoolId": "us-west-2_xxxxxxxxx",
            "AppClientId": "xxxxxxxxx",
            "AppClientSecret": "xxxxxxxxx",
            "Region": "us-west-2"
          }
        },
        "Auth": {
          "Default": {
            "OAuth": {
              "WebDomain": "cognitoauth.xxxxxxxxx.io",
              "AppClientId": "xxxxxxxx",
              "AppClientSecret": "xxxxxxxxx",
              "SignInRedirectURI": "xxxxxxxx://callback/",
              "SignOutRedirectURI": "xxxxxxxx://signout/",
              "Scopes": [
                "email",
                "openid",
                "profile",
                "aws.cognito.signin.user.admin"
              ]
            },
            "authenticationFlowType": "USER_SRP_AUTH"
          }
        }
      }
    }
  }
}

GraphQL Schema

```graphql // Put your schema below this line ```

Additional information and screenshots

One more detail. V1 of the amplify auth library has code that Google Play throws big warnings about and claims it'll stop accepting app updates that use it. Fixing this issue with the v1 -> v2 migration should be a top priority, as continuing to use v1 in the interim isn't an option. I essentially can't update my app unless it's using amplify v2.

mattcreaser commented 1 month ago

Sorry to hear you're having issues @camhart. Can you please confirm that you updated directly to 2.21.1 and did not first try to use an older version of v2? There was a known issue in the migration code that was fixed in version 2.16.1.

Is reinstalling the app the only solution? What about calling Amplify.Auth.fetchAuthSession with options specifying forceRefresh = true?

Are there any obvious similarities between the affected users?

camhart commented 1 month ago

> Can you please confirm that you updated directly to 2.21.1 and did not first try to use an older version of v2? There was a known issue in the migration code that was fixed in version 2.16.1.

Yes, we went direct from v1 to v2.21.1.

> Is reinstalling the app the only solution? What about calling Amplify.Auth.fetchAuthSession with options specifying forceRefresh = true?

I haven't tried this, but I didn't think it would be needed. The SDK is supposed to detect when credentials are expired and refresh them automatically, isn't it?

mattcreaser commented 1 month ago

That's correct, it should - I only suggested trying to force refresh the tokens as a way to gather more information about what is going wrong. Another thought is to try catching the exception and invoking signOut.
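A minimal sketch of that diagnostic path: force a refresh, and if the tokens still cannot be refreshed, record the failure and sign out. `AuthClient` and `SessionResult` are hypothetical stand-ins so the example compiles on its own; in a real app they would map to `Amplify.Auth.fetchAuthSession` with `forceRefresh = true` and `Amplify.Auth.signOut`. The result type assumes, as the log above suggests, that a session can report `isSignedIn=true` while carrying a token error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical result type: signed in, but the token refresh may still have failed. */
final class SessionResult {
    final boolean signedIn;
    final Exception tokenError; // null when tokens were obtained successfully

    SessionResult(boolean signedIn, Exception tokenError) {
        this.signedIn = signedIn;
        this.tokenError = tokenError;
    }
}

/** Hypothetical stand-in for the auth calls discussed above; this is not the Amplify API. */
interface AuthClient {
    void fetchSession(boolean forceRefresh, Consumer<SessionResult> onSuccess, Consumer<Exception> onError);

    void signOut(Runnable onComplete);
}

final class ForceRefreshSketch {
    /** Diagnostic events; a real app would persist or upload these. */
    final List<String> events = new ArrayList<>();

    void refreshOrSignOut(AuthClient auth) {
        auth.fetchSession(true, session -> {
            if (session.tokenError == null) {
                events.add("refresh ok");
                return;
            }
            // Refresh failed even when forced: record why, then clear the stale local session.
            events.add("refresh failed: " + session.tokenError.getMessage());
            auth.signOut(() -> events.add("signed out"));
        }, error -> events.add("fetch failed: " + error.getMessage()));
    }
}
```

Recording the failure before signing out keeps the error message available for a support ticket, which is the information this thread is missing.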

We will need to investigate this issue to see what's going on - unfortunately it sounds like it will be difficult to reproduce. Any additional details about the affected users would be beneficial.

camhart commented 1 month ago

> unfortunately it sounds like it will be difficult to reproduce

Ideally you can add more tools to the library so I can better troubleshoot the issue and provide more info. I'm confident that if I release the app to another 1% of my customers, I'll get a few emails about it. But I don't want to do that until there's some ability to troubleshoot. We need some sort of migration record that indicates what happened during the migration and why it failed. I'm not asking for you to solve it immediately, but adding support for troubleshooting migration issues seems like low-hanging fruit that moves things forward.
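To make the request concrete, here is a rough sketch of what such a migration record might look like. Everything in it is hypothetical: `MigrationRecord`, its step names, and `needsRetry` are illustrative only and do not exist in Amplify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-device record of what the v1 to v2 migration did. */
final class MigrationRecord {
    enum Outcome { SUCCEEDED, FAILED }

    private final List<String> lines = new ArrayList<>();
    private boolean failed;

    /** Records one migration step; detail may be empty. */
    void record(String step, Outcome outcome, String detail) {
        lines.add(step + " " + outcome + (detail.isEmpty() ? "" : " (" + detail + ")"));
        if (outcome == Outcome.FAILED) {
            failed = true;
        }
    }

    /** True if any step failed, so the old v1 data should be kept for a retry. */
    boolean needsRetry() {
        return failed;
    }

    /** Copy of the recorded lines, suitable for attaching to a support ticket. */
    List<String> lines() {
        return new ArrayList<>(lines);
    }
}
```

With a record like this persisted on the device, an app could ask an affected user for it, and the library could avoid deleting v1 data while `needsRetry()` is true.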

harsh62 commented 1 month ago

@camhart Can you please share code snippets so that we can try to reproduce the issue in a local environment? Snippets showing how the Auth category is used in both V1 and V2 would be really helpful for narrowing down the investigation. Please share any other details you think will help us isolate the issue.

camhart commented 1 month ago

@harsh62 I don't have code snippets to share that can reproduce the issue. I've tried multiple times with my entire app to replicate the problem and can't replicate it locally, but it is happening. This is why I'm arguing for better tools to investigate/troubleshoot problems relating to the migration.

Here are all the Amplify method calls I use:

V1 used the same method calls but adjusted for the api changes between the two. I don't use Amplify for anything else--only Auth.

> Please share any other details you think will help us isolate the issue.

My app is a long running background app that stays running 24/7 in the background on the device (it's a parental control app). It automatically launches itself after an app update has occurred.

harsh62 commented 1 month ago

Are you able to isolate if the issue is happening with customers using Amplify.Auth.signInWithWebUI compared to Amplify.Auth.signIn? Another follow up to that would be, if your customers are able to use Amplify.Auth.signIn and Amplify.Auth.signInWithWebUI interchangeably? i.e. customer could be using Amplify.Auth.signInWithWebUI in Amplify V1 and decided to use Amplify.Auth.signIn in Amplify V2.

If you could answer this, it would greatly narrow down our reproduction codepath.

camhart commented 1 month ago

> Are you able to isolate if the issue is happening with customers using Amplify.Auth.signInWithWebUI compared to Amplify.Auth.signIn?

Not easily. If the problem is happening to customers logged in via one of those calls, it's not happening 100% of the time. I can release the app to another 1% of customers and wait for the support tickets to come in, but I'm really hoping to avoid doing that without having better tools in place to troubleshoot the migration.

> Another follow up to that would be, if your customers are able to use Amplify.Auth.signIn and Amplify.Auth.signInWithWebUI interchangeably?

They can use one or the other, but not both. Once logged in one way, we don't give them the option to log in again without signing out first.

> i.e. customer could be using Amplify.Auth.signInWithWebUI in Amplify V1 and decided to use Amplify.Auth.signIn in Amplify V2.

We don't give customers the ability to log out once the device is set up (there are additional steps they have to take after logging in to set the device up with my app). There's only a very brief window in which they can log out: after the customer has logged in but before the device is set up. Once the device is set up, if they want to log out they need to uninstall and reinstall the app. The customers who've reported the issue to me have all had their device fully set up, so they no longer had an option to log out. Long story short, it's not possible for them to use Amplify.Auth.signIn and then use Amplify.Auth.signInWithWebUI (or vice versa). Does that make sense?

harsh62 commented 1 month ago

@camhart This is good information. Another question: has your amplifyconfiguration.json file changed in any way from Amplify V1 to V2?

From the issues reported, are you able to see if there is anything in common among the affected users, such as device types, OS versions, manufacturer, or anything else?

camhart commented 1 month ago

> Another question: has your amplifyconfiguration.json file changed in any way from Amplify V1 to V2?

No it hasn't changed.

> From the issues reported, are you able to see if there is anything in common among the affected users, such as device types, OS versions, manufacturer, or anything else?

I haven't kept track of this. However, I do recall that one of the devices was a Samsung running OS version 13. I have multiple Samsung test devices, though, and I haven't been able to replicate the issue on any of them. When I release the app update to more customers, we get reports of customers having issues, but I can guarantee many have the issue and never report it. They just cancel their subscription with us or try to resolve it on their own.

harsh62 commented 1 month ago

Thanks for providing all the information. One of our engineers will try to reproduce this issue locally by trying out different code paths. We will get back to you when we have more updates.

tylerjroach commented 2 weeks ago

@camhart One more question that would help in our research. Can you post all of the AWS dependencies you are using in Gradle, e.g. Amplify as well as any other AWS SDKs?

camhart commented 2 weeks ago

    implementation 'com.amplifyframework:aws-auth-cognito:2.21.0'
    coreLibraryDesugaring 'com.android.tools:desugar_jdk_libs:2.0.3'

    implementation 'com.amazonaws:aws-android-sdk-apigateway-core:2.16.1'

Those are the only dependencies being used. Let me know if you need anything else!

tylerjroach commented 2 weeks ago

API Gateway core is likely pulling in the AWS Android SDK MobileClient transitively, which will overwrite Amplify v2 credentials. Please make sure to use a custom credentials provider, such as the one shown in our migration guide: https://docs.amplify.aws/gen1/android/sdk/configuration/amplify-compatibility/#android-sdk-generated-by-api-gateway-aws-android-sdk-apigateway-core
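The linked guide has the actual provider implementation. As a rough illustration of the general shape only, the sketch below bridges an asynchronous session fetch to the blocking `getCredentials()` style that a credentials provider needs. `Credentials`, `AsyncSessionSource`, and `BlockingCredentialsProvider` are hypothetical stand-ins so the example compiles without any AWS dependency, and the timeout value is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical stand-in for the SDK's temporary credentials type. */
final class Credentials {
    final String accessKey;
    final String secretKey;
    final String sessionToken;

    Credentials(String accessKey, String secretKey, String sessionToken) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.sessionToken = sessionToken;
    }
}

/** Hypothetical stand-in for a callback-based session fetch. */
interface AsyncSessionSource {
    void fetch(Consumer<Credentials> onSuccess, Consumer<Exception> onError);
}

/** Blocks the calling thread until the async source delivers credentials or fails. */
final class BlockingCredentialsProvider {
    private final AsyncSessionSource source;
    private final long timeoutSeconds;

    BlockingCredentialsProvider(AsyncSessionSource source, long timeoutSeconds) {
        this.source = source;
        this.timeoutSeconds = timeoutSeconds;
    }

    Credentials getCredentials() {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<Credentials> result = new AtomicReference<>();
        AtomicReference<Exception> failure = new AtomicReference<>();
        source.fetch(
            credentials -> { result.set(credentials); latch.countDown(); },
            error -> { failure.set(error); latch.countDown(); });
        try {
            if (!latch.await(timeoutSeconds, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for credentials");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for credentials", e);
        }
        if (failure.get() != null) {
            throw new IllegalStateException("Failed to fetch credentials", failure.get());
        }
        return result.get();
    }
}
```

Because every call goes through the single auth source, nothing else writes to credential storage, which is the point of replacing the MobileClient-backed provider.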

camhart commented 2 weeks ago

I've replaced the CognitoCredentialsProvider with the AmplifyCredentialsProvider from the article you linked. I'm still doing some testing, but it's functioning as expected so far. I won't be able to do the full test until I release to a new percentage of production users.

However, I use Api Gateway's auto-generated SDK. Do I need to figure out how to remove com.amazonaws:aws-android-sdk-apigateway-core as a dependency completely? These are classes my app currently uses that come from that library I believe. Here are a few:

import com.amazonaws.AmazonClientException;
import com.amazonaws.mobileconnectors.apigateway.ApiClientException;
import com.amazonaws.mobileconnectors.apigateway.ApiClientFactory;

Edit: Also thanks for the help. Very much appreciated.

tylerjroach commented 2 weeks ago

That looks OK. As long as you aren't using MobileClient, you should not run into any issues.