aws-amplify / amplify-flutter

A declarative library with an easy-to-use interface for building Flutter applications on AWS.
https://docs.amplify.aws
Apache License 2.0
1.31k stars 247 forks source link

Api GraphQL subscription stops working properly in release mode #3865

Closed jamilsaadeh97 closed 7 months ago

jamilsaadeh97 commented 1 year ago

Description

Hi everyone!

My users from my app in production are complaining from an issue and it's been one week I'm trying to debug it to know if it's a problem in my code or a bug in Amplify.

I'm subscribing to a stream and the issue happens when I swipe the app away, switch the phone off, turn back on again and go back to the app. The subscription is still connected (I'm also listening to the Api Hub Channel) but stops receiving any event whatsoever.

I found #2918 mentioning that "the underlying websocket connection is not able to stay alive" when the app goes to the background. To deal with that I'm using AppLifecycleState to stop the subscription on paused and re-listen on resumed.

I gathered some info:

  1. The issue is only happening in release mode in iOS on a physical device (prod users, ad hoc or TestFlight)
  2. The issue is not consistently happening from the first try. Out of 20 times, I was able to reproduce it 17 times on the first try(background-phone off-phone on).
  3. I noticed it happens fairly regularly when I switch the phone off, send an event (in my test, the subscription is onCreate), turn the phone back on and go back to the app.
  4. No need to wait an amount of time in the background for the issue to arise.
  5. If it didn't happen on the first try, it will on the second or third.
  6. Now the most important thing to note: I noticed, while debugging it in debug mode (and using AmplifyLogger.verbose), that when I do the steps to reproduce it, the underlying WebSocketBloc receives a "UnsubscribeEvent" but it doesn't go all the way to "Closed subscription...." and I think this is where something is going wrong. Let's say I switched the phone off and on 6 times. When I filter the logs for "subscription event", I get 6 events(expected). When I filter the logs for "closed subscription", I get 4(unexpected). It should be 5. A subscription was beginning to unsubscribe but stopped somehow.

Please find bellow a minimum reproducible example.

Minimum reproducible example ```dart void main() async { WidgetsFlutterBinding.ensureInitialized(); AmplifyLogger().logLevel = LogLevel.verbose; await configureAmplify(); runApp(child: const DebugApp()); } Future _configureAmplify() async { final apiPlugin = AmplifyAPI(modelProvider: ModelProvider.instance); final authPlugin = AmplifyAuthCognito(); await Amplify.addPlugins([ apiPlugin, authPlugin, ]); try { await Amplify.configure(amplifyconfig); } on AmplifyAlreadyConfiguredException { safePrint( 'Tried to reconfigure Amplify; this can occur when your app restarts on Android.'); } on Exception catch (e) { safePrint("Exception when configuring amplify: $e"); } } class DebugApp extends StatelessWidget { const DebugApp({super.key}); @override Widget build(BuildContext context) { return MaterialApp( scaffoldMessengerKey: scaffoldMessengerKey, debugShowCheckedModeBanner: false, title: 'Debug App', theme: ThemeData.dark(), home: const MyList(), ); } } class MyList extends StatefulWidget { const MyList({super.key}); @override State createState() => _MyListState(); } class _MyListState extends State with WidgetsBindingObserver { @override void initState() { super.initState(); WidgetsBinding.instance.addObserver(this); subscribe(); listenToApiHub(); } @override void didChangeAppLifecycleState(AppLifecycleState state) { switch (state) { case AppLifecycleState.resumed: subscribe(); case AppLifecycleState.paused: unsubscribe(); default: break; } } @override void dispose() { WidgetsBinding.instance.removeObserver(this); unsubscribe(); super.dispose(); } StreamSubscription>? _itemsSubscription; List itemsList = []; String _hubConnectionStatus = ""; void subscribe() { final subscriptionRequest = ModelSubscriptions.onCreate(Item.classType); final Stream> operation = Amplify.API.subscribe( subscriptionRequest, onEstablished: () => safePrint('Subscription established'), ); _itemsSubscription ??= operation.listen( (event) { safePrint('Subscription event data received: ${event.data}'); if (event.data != null) { if (mounted) { setState(() { itemsList.insert(0, event.data!); }); } } }, onError: (Object e) => scaffoldMessengerKey.currentState?.showSnackBar(SnackBar( content: Text( "Error in stream: ${e.toString()}", style: const TextStyle(color: Colors.white), ), backgroundColor: Colors.red, )), ); } void listenToApiHub() { Amplify.Hub.listen(HubChannel.Api, (event) { if (event is SubscriptionHubEvent) { if (mounted) { setState(() { _hubConnectionStatus = event.status.name; }); } } }); } void unsubscribe() { _itemsSubscription?.cancel(); _itemsSubscription = null; } @override Widget build(BuildContext context) { return SafeArea( child: Scaffold( body: Column( children: [ Text( "Connection Status: $_hubConnectionStatus", style: const TextStyle(color: Colors.white), ), const SizedBox( height: 20, ), Expanded( child: ListView.builder( itemBuilder: (context, index) => ListTile( title: Text( itemsList[index].id, style: const TextStyle(color: Colors.white), ), ), itemCount: itemsList.length, )) ], ), ), ); } } ```

What's fascinating is that when I do the steps to reproduce, if I see the Connection Status in the example above, whenever I open the app back, go from "connecting" to "connected", I can receive new events. If I go back and it's directly "connected", no events are received anymore (hence my theory that a stream subscription was starting to cancel in the WebSocketBloc but stopped midway). Also if it stops receiving events and I stay in the app, I get the Timeout exception after some time from the stream and only after this exception that the status goes from "connected" to "disconnected".

Finally, in debug mode, I still see the WebSocketBloc not cancelling one of the subscriptions all the way but I still receive events no matter how many times I close the app.

This issue is affecting the most critical and core part of my app (this is the only place I'm using subscriptions) and more and more users are complaining from it since it will require them to close the app completely and reopen it again (a very bad user experience).

I tried to give as much details of my findings as possible. If something is not clear or not explained properly please let me know.

Thank you!

Categories

Steps to Reproduce

  1. Have a stream subscription listening to onCreate or onUpdate or both.
  2. Listen to the AppLifecycle changes. (subscribe on resumed, unsubscribe on paused)
  3. Swipe the app away to go back to the Home Screen of iOS.
  4. Switch the phone off.
  5. OPTIONAL (I felt that the issue happens more often with this step): Send an event depending on the stream subscribed to
  6. Switch the phone on and go back to the app
  7. No more new events are received after this point. (If events are received, repeat from step 3 to 6 until the issue arises)

Screenshots

No response

Platforms

Flutter Version

3.13

Amplify Flutter Version

1.4.2

Deployment Method

Amplify CLI

Schema

type Item @model{
  id: ID!
}
jamilsaadeh97 commented 1 year ago

I'm also able to reproduce the issue when I swipe the app off and on quickly.

Equartey commented 1 year ago

@jamilsaadeh97 sorry you're facing this issue.

I'll look into this. Thank you for the very detailed write up.

Have you been able to reproduce the issue on Android?

jamilsaadeh97 commented 1 year ago

Hi @Equartey

So far no Android users complained about this. I will try to see if I get the same results on my side.

I did some additional debugging and got more info. Please find bellow the updated minimum reproducible example:

Minimum reproducible example ```dart class DebugApp extends StatelessWidget { const DebugApp({super.key}); @override Widget build(BuildContext context) { return MaterialApp( scaffoldMessengerKey: scaffoldMessengerKey, debugShowCheckedModeBanner: false, title: 'Debug App', theme: ThemeData.dark(), home: const MyList(), ); } } class MyList extends StatefulWidget { const MyList({super.key}); @override State createState() => _MyListState(); } class _MyListState extends State with WidgetsBindingObserver { @override void initState() { super.initState(); WidgetsBinding.instance.addObserver(this); subscribe(); listenToApiHub(); } @override void didChangeAppLifecycleState(AppLifecycleState state) { switch (state) { case AppLifecycleState.resumed: subscribe(); case AppLifecycleState.paused: unsubscribe(); default: break; } } @override void dispose() { WidgetsBinding.instance.removeObserver(this); unsubscribe(); super.dispose(); } StreamSubscription>? _itemsSubscription; List itemsList = []; String _hubConnectionStatus = ""; void subscribe() { final subscriptionRequest = ModelSubscriptions.onCreate(Item.classType); final Stream> operation = Amplify.API.subscribe( subscriptionRequest, onEstablished: () => safePrint('Subscription established'), ); _itemsSubscription ??= operation.listen( (event) { safePrint('Subscription event data received: ${event.data}'); if (event.data != null) { if (mounted) { setState(() { itemsList.insert(0, event.data!); }); } } }, onError: (Object e) => scaffoldMessengerKey.currentState?.showSnackBar(SnackBar( content: Text( "Error in stream: ${e.toString()}", style: const TextStyle(color: Colors.white), ), backgroundColor: Colors.red, )), ); } void listenToApiHub() { Amplify.Hub.listen(HubChannel.Api, (event) { if (event is SubscriptionHubEvent) { if (mounted) { setState(() { _hubConnectionStatus = event.status.name; }); } } }); } Future unsubscribe() async { await _itemsSubscription?.cancel(); _itemsSubscription = null; } void mimicAppOffAndOn({required int count}) async { for (var i = 0; i < count; i++) { unsubscribe(); subscribe(); } } @override Widget build(BuildContext context) { return SafeArea( child: Scaffold( persistentFooterButtons: [ FloatingActionButton( onPressed: () => mimicAppOffAndOn(count: 3), backgroundColor: Colors.blue, child: const Text("Mimic"), ), FloatingActionButton( onPressed: unsubscribe, backgroundColor: Colors.red, child: const Text("Unsubscribe"), ), ], body: Column( children: [ Text( "Connection Status: $_hubConnectionStatus", style: const TextStyle(color: Colors.white), ), const SizedBox( height: 20, ), Expanded( child: ListView.builder( itemBuilder: (context, index) => ListTile( title: Text( itemsList[index].id, style: AppTheme.headline, ), ), itemCount: itemsList.length, )) ], ), ), ); } } ```

I've added to the Scaffold two FloatingActionButton: The blue button when pressed will mimic the app going on and off quickly a specified number of times. The red button will close the subscription.

When I click the blue button followed by the red button. The expected behavior should be that the websocket bloc shutting down successfully. That's not the case. Also in the console logs I get 4 "Subscription event"(expected) and 1 "Closed subscription"(Unexpected). If I hit the red button to unsubscribe, the websocket bloc does not shut down. It stays alive and receives events (SubscriptionDataEvent) but the stream subscription does not receive any (since it's closed). I was able to achieve this in debug mode.

Hope this will help digging into it.

diepvicuong commented 1 year ago

I also have this problem. Do you have any solutions?

jamilsaadeh97 commented 1 year ago

Hi @Equartey , do you have any updates concerning this issue?

I tried adding a delay before re-subscribing, adding a delay after cancelling the subscription and adding a debouncer. Same results. Something is making the underlying bloc get stuck somewhere. Is there a way to get the logs from AmplifyLogger in release mode? Maybe I can save them somewhere to check what is happening and if any errors or inconsistencies show up.

jamilsaadeh97 commented 1 year ago

Also I can confirm that every time the subscription gets stuck, after 5 mins, the hub send a disconnected event and I receive a Timeout Exception. And only then, if I lock the phone and reopen the app, it gets reconnected and data are received (Until it gets stuck again by locking the phone and reopening the app and so on...).

Equartey commented 1 year ago

Hey @jamilsaadeh97, sorry for the delay. I'm actively looking at this today.

However, I'm still trying to reproduce the described behavior on my physical iOS device.

Were you able to reproduce on Android?

jamilsaadeh97 commented 1 year ago

I will try to reproduce on Android right now and get back to you

Equartey commented 1 year ago

@jamilsaadeh97 One thing that I noticed while using your sample code is multiple subscriptions are created when using the mimic button and sometimes when swapping fore/back-ground states. This can lead to extra subscriptions not unsubscribing.

One way to fix this is to only declare the subscription request once since every time a request is created an additional request ID is generated (ie, every time subscription() is called an additional request with a new ID is created). Which is used by the underlying bloc to determine if the request exists. For example, moving final subscriptionRequest = ModelSubscriptions.onCreate(Item.classType); to be a class property on _MyListState produces consistent results in my testing.

Since I'm unable to repro your exact issue, can you make that change and let me know what you observe? Sorry this is a ticky one to trouble shoot.

jamilsaadeh97 commented 1 year ago

Hi @Equartey , I tried your workaround but still the same issue is happening. After locking the phone and re-open it again, no data is received.

Also I can confirm on Android everything is working without any workarounds.

Did you try reproducing it with ad hoc ?

jamilsaadeh97 commented 1 year ago

Please find attached a screen recording of the issue. Note that the blue snack bar ("New Item Added") is a push notification received on foreground whenever a new item is created and is not related to the stream. It will only show you that indeed an item was added and should be received by onCreate.

Also I did not press any of the buttons.

https://github.com/aws-amplify/amplify-flutter/assets/85061274/216c6dd0-0f95-4a3f-a2ab-7199528db529

damir-fell commented 1 year ago

I can add that we are also now experiencing this issue with our production app for iOS that uses Amplify subscriptions.

Issue happens as described by the OP when app goes background/foregrounding the app and sometimes the subscription are not established correctly again and we get this error 5 minutes after trying to establish the subscriptions.

I do not have any logs or other things to share at the moment, I wanted just to chime in that this is a problem we are seeing.

Also, regarding the SubscriptionHubEvents, they seem to only trigger when the last subscription is disconnected and the first subscription is connected which makes it not usefull for our app as we have many subscriptions active at the same time and we cannot monitor each one of them using these events

Equartey commented 1 year ago

@jamilsaadeh97 can you run flutter doctor and paste the output? Can you also confirm the versions of amplify packages in your pubspec.yaml?

jamilsaadeh97 commented 1 year ago

Of course @Equartey ,

Flutter Doctor ``` [✓] Flutter (Channel stable, 3.13.0, on macOS 14.0 23A344 darwin-arm64, locale en-US) [✓] Android toolchain - develop for Android devices (Android SDK version 34.0.0) [✓] Xcode - develop for iOS and macOS (Xcode 15.0) [✗] Chrome - develop for the web (Cannot find Chrome executable at /Applications/Google Chrome.app/Contents/MacOS/Google Chrome) ! Cannot find Chrome. Try setting CHROME_EXECUTABLE to a Chrome executable. [✓] Android Studio (version 2022.3) [✓] VS Code (version 1.82.3) [✓] Connected device (4 available) [✓] Network resources ```
Pubspec.yaml ``` cupertino_icons: ^1.0.2 responsive_sizer: ^3.1.0 flutter_riverpod: ^2.3.0 fl_chart: <1.0.0 flutter_calendar_carousel: ^2.4.0 intl_phone_number_input: ^0.7.1 fpdart: ^0.5.0 amplify_flutter: ^1.4.0 amplify_api: ^1.4.0 amplify_auth_cognito: ^1.4.0 amplify_push_notifications_pinpoint: ^1.4.0 amplify_analytics_pinpoint: ^1.4.0 intl: ^0.18.0 flutter_rating_bar: ^4.0.1 url_launcher: ^6.1.11 flutter_slidable: ^3.0.0 stream_transform: ^2.1.0 rxdart: ^0.27.7 ```
Equartey commented 1 year ago

@jamilsaadeh97 my apologies, I'm still having difficulties reproducing your issue.

Could you include a recording of the logs or a raw log dump from when you've observed the issue?

jamilsaadeh97 commented 1 year ago

That's really weird because I can reproduce it on different devices from the very simple code example above.

Since it's in release mode with ad hoc or Test Filght, is there a way to get the logs of AmplifyLogger?

jamilsaadeh97 commented 1 year ago

Also is it possible to send me a screen recording from your side? Just to compare the behavior of both.

Equartey commented 1 year ago

@jamilsaadeh97 apologies, I was testing on debug thinking you could repro it there. I swapped to release mode with ad hoc and could reproduce it very easily there.

I'll investigate further. Thank for your patience.

jamilsaadeh97 commented 1 year ago

Hi @Equartey , that's one step forward since I've been wrapping my head around it and trying to figure out why you were not able to reproduce it on your side.

More and more users are complaining about this issue so please let me know if I can assist you in any way to try to resolve this quickly.

Thank you!

Equartey commented 1 year ago

@jamilsaadeh97 still trying to identify a good solution on our end, but I think I found a work around for you to try in the meantime.

First, it appears the unsubscribe on AppLifecycleState.pause is not being called in all cases we need. I'm seeing consistent recovery when I add the AppLifecycleState.inactive to the disconnect case.

Second, we need to ensure when we call unsubscribe, we unsubscribe from all active subscriptions so the underlying web socket can close the connection. We either need to keep a list of active subscriptions or avoid tossing away subscription requests after we create them. For your example, code I chose to reuse the same request.

Please review the revisions I made to your sample code and tell me how it goes for you.

  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    switch (state) {
      case AppLifecycleState.resumed:
        // Todo: fetch data for missed messages
        subscribe();
        break;
      case AppLifecycleState.paused:
      case AppLifecycleState.inactive:
        // unsubscribe anytime when the app is not in the foreground
        unsubscribe();
        break;
      default:
        break;
    }
  }

...

String _hubConnectionStatus = "";
final subscriptionRequest = ModelSubscriptions.onCreate(Item.classType); // moved out of subscribe function

  void subscribe() {
    final Stream<GraphQLResponse<Item>> operation = Amplify.API.subscribe(
      subscriptionRequest,
      onEstablished: () => safePrint('Subscription established'),
    );
    ....
  }
jamilsaadeh97 commented 1 year ago

I will try this out right now.

Second, we need to ensure when we call unsubscribe, we unsubscribe from all active subscriptions so the underlying web socket can close the connection

In my prod app, the stream subscriptions are added to a list and when a dispose happens, I loop over the list and cancel each one of them.

Also did it happen to you that every time you came back to the app from the background, if the hub sends a disconnected/connecting event and then "connected", data are received normally, but if you came back and it shows directly "connected" then this is where no more data is received?

Afterwards, if you wait 5 mins in the foreground, the hub sends a "disconnected" event with a Timeout Exception.

Equartey commented 1 year ago

Great, sounds like you're handling your streams correctly.

Also did it happen to you that every time you came back to the app from the background, if the hub sends a disconnected/connecting event and then "connected", data are received normally, but if you came back and it shows directly "connected" then this is where no more data is received?

Yes, that is what I was observing too. It's as if the connection wasn't disconnecting when the app went away when the phone is locked. I think there is a more graceful solution as my work around rather aggressively closes the subscription, but that will take more investigation.

Afterwards, if you wait 5 mins in the foreground, the hub sends a "disconnected" event with a Timeout Exception.

This is expected behavior, I didn't explicitly test it.

jamilsaadeh97 commented 1 year ago

Just did a lot of testing using your approach (unsubscribing on inactive and paused) and I think out of 40 times it got stuck 4 or 5 times. Better than before but like you said, there should be a more graceful solution especially that it might not be as performant in real life scenarios (received phone calls, switching quickly between apps, anything really).

suryaishnavi commented 1 year ago

DataStore Sync: Data Not Syncing to Cloud Automatically; Requires App Restart:

The issue below is occurring in release mode Android.

In 90% of cases, when the network is available, data is synchronized to the cloud. However, in certain situations, such as when the app resumes from a detached app lifecycle state and is unused for an extended period, the app model cloud subscriptions may go into an ideal state. At this time, data mutations are stored in the datastore outbox and do not synchronize to the cloud, even if the network is available.

I have created a singleton class that listens to DataStore HubEvents throughout the app. When I create or modify data, it gets stuck in the OutboxMutationEnqueuedState and doesn't get processed.

The below code is a singleton class to listen to datastore events:

class DataStoreEventHandler {
  StreamSubscription<DataStoreHubEvent>? _subscription;
  static final DataStoreEventHandler _instance =
      DataStoreEventHandler._internal();

  factory DataStoreEventHandler() => _instance;

  DataStoreEventHandler._internal();

  ///* dataStoreEvent
  final _dataStoreEvent = StreamController<DataStoreHubEvent>.broadcast();

  ///* networkEvent
  final _networkEvent = StreamController<NetworkStatusEvent>.broadcast();

  ///* syncQueriesEvent
  final _syncQueriesEvent =
      StreamController<SyncQueriesStartedEvent>.broadcast();

  ///* modelSyncEvent
  final _modelSyncEvent = StreamController<ModelSyncedEvent>.broadcast();

  ///! outboxMutationEnqueuedEvent
  final _outboxMutationEnqueuedEvent =
      StreamController<OutboxMutationEvent>.broadcast();

  ///! outbox status
  final _outboxStatusEvent = StreamController<OutboxStatusEvent>.broadcast();

  ///! outboxMutation processed
  final _outboxMutationProcessedEvent =
      StreamController<OutboxMutationEvent>.broadcast();

  ///! outboxMutation failed
  final _outboxMutationFailedEvent =
      StreamController<OutboxMutationEvent>.broadcast();

  void initialize() {
    _subscription = Amplify.Hub.listen(
      HubChannel.DataStore,
      (DataStoreHubEvent hubEvent) {
        _dataStoreEvent.add(hubEvent);
        switch (hubEvent.type) {
          case DataStoreHubEventType.networkStatus:
            final networkStatusEvent = hubEvent.payload as NetworkStatusEvent;
            _networkEvent.add(networkStatusEvent);
            break;
          case DataStoreHubEventType.subscriptionsEstablished:
            safePrint('****Subscriptions established****');
            break;
          case DataStoreHubEventType.syncQueriesStarted:
            final payload = hubEvent.payload as SyncQueriesStartedEvent;
            _syncQueriesEvent.add(payload);
            // final List<String> models = payload.models;
            // print('Sync queries started for models: $models');
            safePrint('Sync queries started');
            break;
          case DataStoreHubEventType.modelSynced:
            final payload = hubEvent.payload as ModelSyncedEvent;
            _modelSyncEvent.add(payload);
            // final String modelName = payload.modelName;
            // final bool isFullSync = payload.isFullSync;
            // final bool isDeltaSync = payload.isDeltaSync;
            // final int modelInstanceAdded = payload.added;
            // print('Model synced: ** $modelName **');
            // print('Is full sync: ** $isFullSync **');
            // print('Is delta sync: ** $isDeltaSync **');
            // print('Model instances added: ** $modelInstanceAdded **');
            safePrint('Model synced');
            break;
          case DataStoreHubEventType.syncQueriesReady:
            safePrint('Sync queries ready');
            break;
          case DataStoreHubEventType.ready:
            safePrint('DataStore is ready');
            break;
          case DataStoreHubEventType.outboxMutationEnqueued:
            final payload = hubEvent.payload as OutboxMutationEvent;
            _outboxMutationEnqueuedEvent.add(payload);
            safePrint('Outbox mutation enqueued');
            break;
          case DataStoreHubEventType.outboxMutationProcessed:
            final payload = hubEvent.payload as OutboxMutationEvent;
            _outboxMutationProcessedEvent.add(payload);
            safePrint('Outbox mutation processed');
            break;
          case DataStoreHubEventType.outboxMutationFailed:
            final payload = hubEvent.payload as OutboxMutationEvent;
            _outboxMutationFailedEvent.add(payload);
            safePrint('Outbox mutation failed');
            break;
          case DataStoreHubEventType.outboxStatus:
            final payload = hubEvent.payload as OutboxStatusEvent;
            _outboxStatusEvent.add(payload);
            safePrint('Outbox status');
            break;
          case DataStoreHubEventType.subscriptionDataProcessed:
            safePrint('Subscription data processed');
            break;
        }
      },
    );
  }

  Stream<NetworkStatusEvent> get networkEvent => _networkEvent.stream;
  Stream<DataStoreHubEvent> get dataStoreEvent => _dataStoreEvent.stream;
  Stream<SyncQueriesStartedEvent> get syncQueriesEvent =>
      _syncQueriesEvent.stream;
  Stream<ModelSyncedEvent> get modelSyncEvent => _modelSyncEvent.stream;
  Stream<OutboxMutationEvent> get outboxMutationEnqueuedEvent =>
      _outboxMutationEnqueuedEvent.stream;
  Stream<OutboxStatusEvent> get outboxStatusEvent => _outboxStatusEvent.stream;
  Stream<OutboxMutationEvent> get outboxMutationProcessedEvent =>
      _outboxMutationProcessedEvent.stream;
  Stream<OutboxMutationEvent> get outboxMutationFailedEvent =>
      _outboxMutationFailedEvent.stream;

  void dispose() {
    _subscription?.cancel();
    _networkEvent.close();
    _dataStoreEvent.close();
    _syncQueriesEvent.close();
    _modelSyncEvent.close();
    _outboxMutationEnqueuedEvent.close();
    _outboxStatusEvent.close();
    _outboxMutationProcessedEvent.close();
    _outboxMutationFailedEvent.close();
  }
}
Equartey commented 1 year ago

@suryaishnavi this issue is unrelated to the discussion here. DataStore uses a different implementation for the underlying subscription compared to the API category. Could you please open a new issue so we can track that separately?

Equartey commented 1 year ago

@jamilsaadeh97 we are currently investigating how we can best address this. I will provide updates when we have them. Thanks for bringing this to our attention.

jamilsaadeh97 commented 1 year ago

Hi @Equartey that's great! Thank you for your efforts.

suryaishnavi commented 1 year ago

Hi @Equartey,

Thank you for your response and suggestion. I understand that DataStore uses a different implementation for the underlying subscription compared to the API category. However, I believe this issue is related to the API plugin's functionality, which is essential for DataStore synchronization with the cloud.

The problem I'm encountering is that the API plugin fails to establish the necessary subscriptions to models, resulting in outbox mutations getting stuck and data not syncing from local storage to the cloud. I wanted to bring this to your attention in the context of DataStore's reliance on the API plugin.

Would you recommend any specific steps to address this issue or any additional information I should provide for a new issue?

Thank you for your assistance.

Equartey commented 1 year ago

@suryaishnavi I understand your concern. This is a separate issue.

DataStore relies on the Amplify-Android and Amplify-Swift API categories as native implementation. So any issues there will be independent of the Dart API category which is responsible for GraphQL subscriptions.

The GitHub issue template should be sufficient to collect information on the issue. Appreciate your help.

suryaishnavi commented 1 year ago

Hello @Equartey,

Thank you for your clarification and guidance. I will proceed by creating a new GitHub issue using the appropriate template, ensuring all relevant details are provided to highlight the issue effectively.

suryaishnavi commented 1 year ago

Hello @Equartey

Upon careful consideration, I have come to the conclusion that my issue is related to the utilization of a GraphQL schema within my app. Despite DataStore having its distinct API, its cloud synchronization functionality relies on the underlying API category. Hence, integrating the API plugin remains essential when working with DataStore (source: https://docs.amplify.aws/lib/datastore/sync/q/platform/flutter/#setup-cloud-sync).

I have encountered a similar issue within my app while listening to all DataStore events. The problem manifests when I swipe the app away, turn off my phone, then turn it back on, and subsequently return to the app. Data syncing halts at the "Outbox Mutation Processed" state, indicating a problem when interacting with the GraphQL API to send data for synchronization (resulting in a cessation of cloud syncing due to GraphQL API issues).

damir-fell commented 1 year ago

Hi @Equartey, Any news on this and possibly some ETA? :)

damir-fell commented 1 year ago

@Equartey @jamilsaadeh97 Any updates or have you identified a way to mitigate this issue? Any help is appreciated as we are facing this issue and when it occurs it completely breaks our app experience. Its urgent for us to get this resolved.

Can we check if the underlying websocket is disconnected? Can we force something ourselves?

In our case we get these errors alongside "failed: HttpException: Bad file descriptor" errors, not sure if this gives you any help. I see there is a long thread about this in the official Flutter repo: https://github.com/dart-lang/http/issues/197#issuecomment-1428812398

damir-fell commented 1 year ago

@Equartey I can add that this seems also to happen while app has been running for some time. It just happened for one of our users and looking at the session the app was active for 30 mins and suddenly websockets are broken.

Is the Appsync service breaking TCP connections randomly?

Can we please have some updates to this issue, last comment was 2 weeks ago and this issue is not recoverable without backgrounding/foregrounding app again or killing it.

adplant commented 11 months ago

Any update on this? I'm having similar issues and am definitely interested in a fix.

999Rub commented 11 months ago

@jamilsaadeh97 you need your stream to be subscribed while your app is in background or doing something else on the phone ? Or your need the stream only when the app is in foreground and the user is using it normally ? I've seen the workaround using the AppLifeCycleState. Have you tried to unsubscribe and then subscribe to your stream on app resume only ? Have you tried to stop and start datastore on app resume too ? Just trying to help, best regards.

jamilsaadeh97 commented 11 months ago

Hi @999Rub , I really appreciate your help. Currently I'm unsubscribing on paused and subscribing on resume. It got better results than any other approaches, but this is still far away from a clean workaround since I'm receiving daily errors in Crashlytics and users complaints.

Have you tried to stop and start datastore on app resume too ?

I'm only using the API category in my app.

@Equartey Do you have any updates on this matter?

Thank you!

damir-fell commented 11 months ago

@Equartey @Jordan-Nelson @khatruong2009 Does anybody have any information or update? I tagged you since I see you are assigned to recent issues. There are no feedbacks here.

What do you think is the problem, could we try a commit or branch or something that might fix the issue? To me this seems like a deep issue in SDK that all subscriptions are not closed causing the underlying TCP connection to always stay active, which gets freed by the OS and then we are stuck with a broken TCP connection that does not recover.

There are suggestions to use http_cupertino as a workaround, but I would like to have a root cause fix.

damir-fell commented 10 months ago

Will there be any updates in 2023?

NikaHsn commented 10 months ago

sorry that you are facing this issue, we are investigating it and will be sending updates here as we have more info.

khatruong2009 commented 10 months ago

Hi @jamilsaadeh97, I have been trying to reproduce this issue now but am unable to. Attached is a screen recording that replicates the video you sent. This is on my physical iPhone with the app in release mode with ad hoc. The same thing happens when I follow the reproduction steps by turning the phone off and back on again.

https://github.com/aws-amplify/amplify-flutter/assets/64438356/bf254187-6a58-4067-97b4-1cd1e88724b7

jamilsaadeh97 commented 10 months ago

Hi @khatruong2009 , that's weird, I can still reproduce it on my end. Most of the time the issue happens on locking the phone and opening the app again. It was also easily reproduced by @Equartey . Are you unsubscribing from tbe stream on any lifecycleState ?

khatruong2009 commented 10 months ago

Hi @jamilsaadeh97, I can reproduce the issue but after implementing the fix that @Equartey suggested, I am no longer able to reproduce the issue. I know that you said you had around 4-5 reproductions out of 40 tries and I have tried about 30 times and have been unable to reproduce it a single time after implementing the fix. I also spoke to @Equartey and he is no longer able to reproduce it now as well once the fix is implemented. Could you try reproducing it again with @Equartey's fix? It's possible something was updated on the iOS side that might have fixed this. Also, could you post your amplify_flutter and amplify_api versions that you have in your pubspec.lock file?

jamilsaadeh97 commented 10 months ago

I will try to reproduce it and send another screen recording in a bit @khatruong2009 . This is my pubspec.lock:

  amplify_flutter:
    dependency: "direct main"
    description:
      name: amplify_flutter
      sha256: "1210b57e81fd20ee06ee3feb62bcfb5a67225d1b76c91c2326156e8e8f8462a8"
      url: "https://pub.dev"
    source: hosted
    version: "1.6.0"

  amplify_api:
    dependency: "direct main"
    description:
      name: amplify_api
      sha256: "2b34a3fbbbe33f2027be11ff18070a4660295012beb5cd52a55a0ae71fa5d7f1"
      url: "https://pub.dev"
    source: hosted
    version: "1.6.1"
damir-fell commented 10 months ago

@khatruong2009 Just my 5 cents, you are not going to reproduce this problem running a single subscription on a emulator with one app.

You need to setup on a real device with lots of apps and use the phone as a regular user, put app in background, going in and out of other apps like youtube, tiktok etc so that system reclaims TCP connections and other things to make the error appear.

Running one subscription and putting app into background 5 seconds and back into foreground wont cut it, at least on our side here its not reproducible like that. It happens randomly and usually not right away.

I dont know if its possible for you but having a more "crowded" app and with extended usage it will definitely make the problem appear. Trying to put app in background, locking screen, switch to other apps for some time then back, or switch very fast back and forward and after some time it will occur. You should also set up a test with multiple different subscriptions

khatruong2009 commented 10 months ago

Hi @damir-fell, I have run this on my personal phone while running 10+ apps that use internet in the background while also using those apps. I still haven't been able to reproduce with extensive testing. Are you working together with @jamilsaadeh97? If so, is there any way to reproduce the issue consistently that you've found?

damir-fell commented 10 months ago

@khatruong2009 No, we are not working together, we are just experiencing the same issue separately.

Unfortunately I do not have a consistent way to reproduce it, I just get the same kinds of errors reported as @jamilsaadeh97 from our users of the app. We also cancel subscriptions and resume them on app lifecycle state changes.

What can I provide to you to help you reproduce the issue?

Equartey commented 9 months ago

Thanks everyone for the discussion and bringing this to our attention thus far.

I'm updating issue status to reflect we can no longer reproduce the problem.

While the issue may still be unresolved, we need more precise steps to reproduce it accurately and identify the root cause.

If anyone observes this issue and has concrete reproduction steps, please provide that information and we will resume our investigation.

bitsplash-andy commented 8 months ago

I have been frustrated by this not working as anticipated, but the advice here seems correct.

If you are experiencing the above issue and are not receiving disconnected event from the Amplify.Hub on app pause, you almost certainly have a stream subscription still open. One approach without putting stream subscriptions everywhere is to wrap the amplify library and force all socket streams to close on app pause. You will of course need to ensure subscriptions are re-initiated on app resume, but the following code will lift the issues out of the websocket layer at least.

import 'dart:async';

import 'package:amplify_flutter/amplify_flutter.dart';
import 'package:flutter/cupertino.dart';

class AmplifyAPIWrapper extends WidgetsBindingObserver
    implements APICategory {
  final List<StreamSubscription> _subscriptions = [];

  AmplifyAPIWrapper() {
    WidgetsBinding.instance.addObserver(this);
  }

  @override
  Future<void> addPlugin(
      APIPluginInterface plugin, {
        required AmplifyAuthProviderRepository authProviderRepo,
      }) {
    return Amplify.API.addPlugin(
      plugin,
      authProviderRepo: authProviderRepo,
    );
  }

  @override
  Category get category => Amplify.API.category;

  @override
  Set<Category> get categoryDependencies => Amplify.API.categoryDependencies;

  @override
  APIPluginInterface get defaultPlugin => Amplify.API.defaultPlugin;

  @override
  RestOperation delete(
      String path, {
        Map<String, String>? headers,
        HttpPayload? body,
        Map<String, String>? queryParameters,
        String? apiName,
      }) {
    return Amplify.API.delete(
      path,
      headers: headers,
      body: body,
      queryParameters: queryParameters,
      apiName: apiName,
    );
  }

  @override
  RestOperation get(
      String path, {
        Map<String, String>? headers,
        Map<String, String>? queryParameters,
        String? apiName,
      }) {
    return Amplify.API.get(
      path,
      headers: headers,
      queryParameters: queryParameters,
      apiName: apiName,
    );
  }

  @override
  RestOperation head(
      String path, {
        Map<String, String>? headers,
        Map<String, String>? queryParameters,
        String? apiName,
      }) {
    return Amplify.API.head(
      path,
      headers: headers,
      queryParameters: queryParameters,
      apiName: apiName,
    );
  }

  @override
  GraphQLOperation<T> mutate<T>({required GraphQLRequest<T> request}) {
    return Amplify.API.mutate(request: request);
  }

  @override
  RestOperation patch(
      String path, {
        Map<String, String>? headers,
        HttpPayload? body,
        Map<String, String>? queryParameters,
        String? apiName,
      }) {
    return Amplify.API.patch(
      path,
      headers: headers,
      body: body,
      queryParameters: queryParameters,
      apiName: apiName,
    );
  }

  @override
  List<APIPluginInterface> get plugins => Amplify.API.plugins;

  @override
  RestOperation post(
      String path, {
        Map<String, String>? headers,
        HttpPayload? body,
        Map<String, String>? queryParameters,
        String? apiName,
      }) {
    return Amplify.API.post(
      path,
      headers: headers,
      body: body,
      queryParameters: queryParameters,
      apiName: apiName,
    );
  }

  @override
  RestOperation put(
      String path, {
        Map<String, String>? headers,
        HttpPayload? body,
        Map<String, String>? queryParameters,
        String? apiName,
      }) {
    return Amplify.API.put(
      path,
      headers: headers,
      body: body,
      queryParameters: queryParameters,
      apiName: apiName,
    );
  }

  @override
  GraphQLOperation<T> query<T>({required GraphQLRequest<T> request}) {
    return Amplify.API.query(request: request);
  }

  @override
  Future<void> reset() {
    throw UnimplementedError();
  }

  @override
  Stream<GraphQLResponse<T>> subscribe<T>(
      GraphQLRequest<T> request, {
        void Function()? onEstablished,
      }) {
    final gqlStream = Amplify.API.subscribe(request, onEstablished: onEstablished);
    final newStream = StreamController<GraphQLResponse<T>>();
    final streamSubscription = gqlStream.listen((event) {
      newStream.add(event);
    });
    _subscriptions.add(streamSubscription);
    return newStream.stream;
  }

  @override
  Future<void> closeAllConnections() async {
    print("closing all connections for API");
    for (var subscription in _subscriptions) {
      await subscription.cancel();
    }
    _subscriptions.clear();
  }

  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    super.didChangeAppLifecycleState(state);

    switch (state) {
      case AppLifecycleState.paused:
        closeAllConnections();
        break;
      case AppLifecycleState.resumed:
      case AppLifecycleState.detached:
      case AppLifecycleState.inactive:
      case AppLifecycleState.hidden:
    }
  }
}
Jordan-Nelson commented 8 months ago

Hello folks - While we were able to reproduce this at one point, we have not been able to reproduce this following those same reproduction steps since November. We were never able to determine a root cause so it was unclear if this was a bug in Amplify Flutter or an issue with the Flutter SDK, or even a bug in iOS.

It is possible that this has been resolved by a new version of Flutter/Dart or changes in new versions of iOS.

For anyone still facing this issue can you please upgrade to the latest version of Flutter and the latest version of all Amplify libraries and let us know if you are still able to reproduce it? If you are, please provide details on the device and OS where this is observed. Also please provide as detailed of reproduction steps as possible.