aws-amplify / amplify-flutter

A declarative library with an easy-to-use interface for building Flutter applications on AWS.
https://docs.amplify.aws
Apache License 2.0

datastore mutation missed on expired token #4131

Open stevegaunt opened 10 months ago

stevegaunt commented 10 months ago

Description

If the device is connected to a network but has no actual connectivity, and this overlaps with the time the auth token expires, the token cannot be refreshed. When the app eventually regains connectivity, the first mutation it attempts to sync fails with ApiException(message: Failed to retrieve authorization token).

It seems that the mutation which hits this error is not retried once the token has been refreshed.

Subsequent mutations sync afterwards; only the first one fails.

We've only seen this with iOS users so far.

Categories

Steps to Reproduce

No response

Screenshots

No response

Platforms

Flutter Version

3.13.9

Amplify Flutter Version

1.4.0

Deployment Method

Amplify CLI

Schema

No response

stevegaunt commented 10 months ago

What makes this worse is that our schema has a parent model with a one-to-many relationship to children. When the parent mutation fails like this, the children can still get synced. If DataStore then does a delta sync (because the app has been restarted), it will always go into a local-only state, because it can't start the sync engine when a child has a null parent.

khatruong2009 commented 10 months ago

Hi @stevegaunt, I am currently looking into this issue now and will get back to you when I have updates.

khatruong2009 commented 10 months ago

Hi @stevegaunt, I'm currently trying to recreate this issue and am unable to. I connected the app to a network without any connection. Then, I let the refresh token time out and when I came back to the app, restored connection to the app. Then, I tried to create an item. I got a different error than you did. Is this along the lines of how this issue shows up for you?

stevegaunt commented 10 months ago

Hi @khatruong2009, yes, along those lines. I would also do quite a few mutations while offline. Then, when the connection is restored and the app tries to sync the mutations, the first one or first few get the "Failed to retrieve authorization token" error and fail to sync. I have only really seen this on iOS. What's the error you are getting?

khatruong2009 commented 10 months ago

Hi @stevegaunt, I guess I'm confusing one of the steps to reproduce the issue. I set all of my auth tokens in the Cognito user pool to expire at 60 minutes. If I simulate a bad connection and then leave the app for an hour, I am logged out of the app because the tokens expired, rather than seeing a failed mutation. Is this what's happening with you?

stevegaunt commented 10 months ago

hi @khatruong2009

This is our live Cognito setup:

Refresh token expiration: 365 day(s)
Access token expiration: 1 day(s)
ID token expiration: 1 day(s)

For our staging environment, to make this more reproducible, we have it set to:

Refresh token expiration: 365 day(s)
Access token expiration: 5 minutes
ID token expiration: 5 minutes

We are definitely not signed out: after the first (or maybe two or three) failed mutations, the rest of the pending mutations succeed, as long as none of the failed mutations is a parent of children. That's when it gets messy with DataStore.

khatruong2009 commented 9 months ago

Hi @stevegaunt, I'm unable to reproduce the exact issue you're having. I am seeing an issue where, after my app gets a connection again and I send some mutations, those mutations sometimes won't sync to DynamoDB (in this case I have to restart the app to get the sync working again), but I haven't been able to get the ApiException.

stevegaunt commented 9 months ago

Hi @khatruong2009, I find it easiest to reproduce using a real iOS device with the Network Link Conditioner set to 100% loss.

I have also raised this in the past on the Swift channel:

https://github.com/aws-amplify/amplify-swift/issues/3356

khatruong2009 commented 9 months ago

Hi @stevegaunt, I just tried this with a real iOS device and the Network Link Conditioner, and these are the steps I followed:

  1. Signed into the app
  2. Set Network Link Conditioner to 100% loss
  3. Sent a few mutations
  4. Waited 6 minutes until the tokens expired
  5. Changed Network Link Conditioner back to Wi-Fi

What happened was that the mutations I made in step 3 never got synced to DynamoDB. When I tried another mutation with the connection back up after step 5, that mutation synced to the cloud. Although the step 3 mutations never synced, which is an issue in itself, I never got the ApiException(message: Failed to retrieve authorization token).

stevegaunt commented 9 months ago

Hi @khatruong2009, that is similar to the behaviour we see with the missing mutations.
In order to get the API message, we set up a callback when configuring Amplify:

final AmplifyDataStore datastorePlugin = AmplifyDataStore(
  modelProvider: ModelProvider.instance,
  // Sync configuration.
  syncInterval: 86400,
  // Override the default syncMaxRecords of 10000 with 20000.
  syncMaxRecords: 20000,
  syncPageSize: 1000,

  // Delegate conflict resolution to the app's own handler.
  conflictHandler: (ConflictData data) {
    return ModelConflictHandler.instance.onConflictHandler(data);
  },
  // Log every error DataStore reports; this is where the token error shows up.
  errorHandler: (error) {
    Log.instance.error('[Amplify Datastore] ErrorHandler received: $error');
  },
);

It's in the error handling callback that we see the error response.

The following message was thrown: [Amplify Datastore] ErrorHandler received: DataStoreException { "message": "Failed to retrieve authorization token.", "recoverySuggestion": "", "underlyingException": "The operation couldn’t be completed. (Amplify.APIError error 3.)" }

However, the mutations not getting synced is still the exact behaviour we are seeing.

We have a lot of users using our app with poor connections, which results in frequent data loss for these users.

The biggest issue with this is that if the mutation that failed is a parent and subsequent child mutations get synced, DataStore will get into a bad state on the next sync interval cycle because it will have orphaned children.

stevegaunt commented 9 months ago

As another observation, restarting the app might replay some of the missing mutations (I can't say it happens every time, but I have seen it eventually sync the missed mutations, sometimes the following day). But if the app has managed to sync a child mutation before the parent, the restart will fail, because the sync engine will go into local-only mode when there is a child without a parent.

khatruong2009 commented 9 months ago

Hi @stevegaunt, I see what you mean and since I've been able to reproduce the missing mutations on expired token issue, I'm going to bring it to my team and we will discuss a potential fix to this bug from there. I'll keep you updated, thanks!

stevegaunt commented 8 months ago

Hi @khatruong2009 , @Equartey

do you know if there has been any progress with this?

thanks

khatruong2009 commented 7 months ago

Hi @stevegaunt, it turns out this behavior is expected on iOS devices. We are in contact with the amplify-swift team to discuss how we can better handle this issue. Thank you for your patience.

stevegaunt commented 7 months ago

Hi @khatruong2009, thanks for the update. Please keep us posted on any progress.

stevegaunt commented 6 months ago

Hi @khatruong2009

Has there been any update from your discussions with the amplify-swift team?

Having data loss is not ideal in production environments.

thanks