aws-amplify / amplify-js

A declarative JavaScript library for application development using cloud services.
https://docs.amplify.aws/lib/q/platform/js
Apache License 2.0

DataStore - DeltaSync Very Slow on SQLite #8699

Closed - sacrampton closed this issue 2 years ago

sacrampton commented 3 years ago

JavaScript Framework

React Native

Amplify APIs

DataStore

Amplify Categories

api

Environment information

```
npx: installed 1 in 1.481s

  System:
    OS: Linux 4.14 Amazon Linux 2
    CPU: (2) x64 Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
    Memory: 6.75 GB / 7.79 GB
    Container: Yes
    Shell: 4.2.46 - /bin/bash
  Binaries:
    Node: 10.23.2 - ~/.nvm/versions/node/v10.23.2/bin/node
    npm: 6.14.10 - ~/.nvm/versions/node/v10.23.2/bin/npm
  npmGlobalPackages:
    @aws-amplify/cli: 5.2.0
    cdk: 1.87.1
    coffeescript: 2.5.1
    esformatter: 0.11.3
    js-beautify: 1.13.5
    npm: 6.14.10
    prettier: 2.2.1
    typescript: 3.7.5
```

Describe the bug

I am creating this as a separate GitHub issue, split out from #8405

https://github.com/aws-amplify/amplify-js/issues/8405#issuecomment-891634004

Hi @iartemiev - I want to dig further into the slowness we are seeing with DeltaSync.

DataStore creates a separate DynamoDB table, "AmplifyDataStore-ENV", to manage the DeltaSync.

There are no secondary indexes on this table - just the partition key (table/date) and the sort key (time/id/version).

Our database is multi-tenanted, and we deal with assets in industrial plants. Hundreds of users in other plants could be making massive numbers of changes while no users are working in my plant. As I see it, the DeltaSync has to sort through everyone else's changes just to work out that zero of them apply to me.
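To make that concrete: since the key only encodes the model and a date bucket (not the tenant), every tenant's changes for a model land in the same partitions. Roughly, a read against the delta table looks something like the sketch below - the ds_pk/ds_sk attribute names and the values are my assumption from what I see in the table, and the query is only an illustration, not actual DataStore or AppSync code.

```js
// Illustrative only - not DataStore or AppSync resolver code.
// Shows that the delta table is keyed by model + date bucket, so a read for one
// model returns recent changes from ALL tenants before any filtering can happen.
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function readDeltaBucket() {
  const result = await docClient.query({
    TableName: 'AmplifyDataStore-ENV',              // the delta sync table mentioned above
    KeyConditionExpression: 'ds_pk = :pk AND ds_sk > :since',
    ExpressionAttributeValues: {
      ':pk': 'Asset-2021-08-03',                    // model + date - nothing tenant-specific
      ':since': '1628000000000',                    // timestamp prefix of time/id/version
    },
  }).promise();
  // Tenant-level filtering can only happen after these items have been read.
  return result.Items;
}
```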

When we initially hydrate the cache we do a base query which uses GSIs to get a quick response.

At the moment I'm seeing DeltaSync take about the same amount of time as the full sync (20 minutes). Today I was doing a lot of bulk updating of data in a few different plants through our web back end - not an unusually large workload. But the slowness I'm seeing, combined with what I see in the DeltaSync table in DynamoDB, has me worried that this is not scalable for a multi-tenanted environment.

You've been really good at coming up with solutions to get us moving - hopefully someone else has already come up with a solution to make the DeltaSync run in seconds rather than 20+ minutes.

https://github.com/aws-amplify/amplify-js/issues/8405#issuecomment-891797469

@sacrampton, I think this behavior likely warrants a separate GitHub issue, unless this is somehow related to the on-device database on React Native specifically (AsyncStorage or SQLite).

To better understand what's going on, I have some follow up questions:

1. How many total records are in the delta sync table in DynamoDB at the time that you're seeing the 20-minute delta sync time?
2. How many of those records are being synced down to the app?
3. Are you using DataStore.configure to change any of the sync-related settings (e.g., syncPageSize, fullSyncInterval, etc.)? If so, which settings are you using?
4. Are you seeing roughly the same delta sync performance if you test this in a web app?
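For reference, these settings live in DataStore.configure; a minimal sketch with the documented default values (illustrative only, not the settings in use in this app):

```js
// Default values shown for illustration - not the settings used in this app.
import { DataStore } from 'aws-amplify';

DataStore.configure({
  // minutes between full (base) syncs; inside this window DataStore attempts delta syncs
  fullSyncInterval: 24 * 60,
  // number of records requested per page in each sync query
  syncPageSize: 1000,
});
```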

Expected behavior

DeltaSync processes very quickly - seconds, not minutes

Reproduction steps

DataStore.start()

Code Snippet

No response

Log output

No response

aws-exports.js

No response

Manual configuration

No response

Additional configuration

No response

Mobile Device

No response

Mobile Operating System

No response

Mobile Browser

No response

Mobile Browser Version

No response

Additional information and screenshots

No response

sacrampton commented 3 years ago

Hi @iartemiev - I've created the new GitHub issue as requested.

I checked the AmplifyDataStore-ENV table and there are currently ZERO records in it, so I thought the DeltaSync should be almost instantaneous - yet it still took 20 minutes to process everything. I can't make sense of why that is the case.

When there was a large number of records in the AmplifyDataStore table, zero of them were relevant to my login and zero were downloaded to the device.

I have not been able to test this in a web app, as we discontinued development of ours and I don't know of an easy way to test it.

I don't believe we are changing any sync-related settings.

Another observation: I am testing this with a large data set. If we log in with a smaller data set (our development environment) we see a base sync of 5 minutes and a delta sync of 1 minute - so it is noticeably faster on the smaller data set - but 1 minute to download zero changes is still too long.

iartemiev commented 3 years ago

Thank you for opening the issue, @sacrampton. It sounds to me like DataStore is performing another base sync instead of a delta. Let me try to reproduce this behavior and I'll know for sure. Should be an easy fix if that's the case.

iartemiev commented 3 years ago

I did some testing today and did not find any issues with the SQLite adapter's behavior here, i.e., if I reload the app inside of the full sync interval (by default this is 1 day), DataStore will perform a delta sync. If it's outside of the full sync interval, it performs a base sync.

Just to give you some context on how delta sync works: when DataStore starts, it retrieves the last sync timestamp for each model, checks whether that timestamp is inside the full sync interval, and if it is, passes it along in the GraphQL sync query network request (the lastSync variable in the request payload). The AppSync request resolver then passes that lastSync argument to DynamoDB, which in turn fetches any matching records from the delta table based on that timestamp. The records are then returned to the client in the network response.

All of that is to say that DataStore does not do anything special on its end other than passing along that lastSync value; all of the "heavy lifting" is done in AppSync/DynamoDB. DataStore will still send a separate sync query request to AppSync for each model in your schema, just like it does with a base sync. Depending on network connectivity, that may take a couple of hundred milliseconds per request (i.e., per model), but taking 20 minutes to retrieve no results doesn't add up for me.
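To make that concrete, here is roughly what one of those sync queries looks like if you issue it by hand. The Asset model name and the selected fields are just an example based on this thread - the generated sync queries in your project will differ - and this is an illustration of the request shape, not DataStore's internal code.

```js
// Rough illustration of the per-model sync query DataStore sends - not its internals.
import { API, graphqlOperation } from 'aws-amplify';

const syncAssets = /* GraphQL */ `
  query SyncAssets($lastSync: AWSTimestamp, $limit: Int, $nextToken: String) {
    syncAssets(lastSync: $lastSync, limit: $limit, nextToken: $nextToken) {
      items {
        id
        _version
        _deleted
        _lastChangedAt
      }
      startedAt
      nextToken
    }
  }
`;

async function deltaSyncAssets(lastSync) {
  // With a lastSync inside the full sync interval, the resolver reads the delta table;
  // with no lastSync (or one that is too old), AppSync falls back to the base table.
  const result = await API.graphql(
    graphqlOperation(syncAssets, { lastSync, limit: 1000 })
  );
  const { items, startedAt, nextToken } = result.data.syncAssets;
  // DataStore keeps the returned startedAt and uses it as lastSync on the next sync,
  // paging with nextToken until it is exhausted.
  return { items, startedAt, nextToken };
}
```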

I'll do some more targeted testing on my end, but in the meantime, I have a few more follow-up questions:

  1. Are you testing delta sync inside the full sync interval? I.e., within a day since you performed the base sync?
  2. Have you attempted testing sync query performance via the AppSync console? You would want a lot of recent changes in place - that is, the delta sync table should be populated with roughly the maximum number of changes you would realistically expect for your use case.

sacrampton commented 3 years ago

Hi @iartemiev - firstly, everything we are doing is within the full sync interval.

We've run a bunch more tests in debug and can report as follows.

So when we first log in with an empty database we are downloading these quantities of records from these 20 models

Total time taken in the sync: approx. 19 minutes. The majority of the time is spent in these 3 models.

I think there is an issue with the base sync truncating. If I do a count of records in the database (we use Elasticsearch for that) I get the following quantities. The fact that the photo/assetVisit models are nice round numbers is a worry:

After the database is populated (i.e. we don't call DataStore.clear()) we restart the app (DataStore.start()). We do this within minutes of the initial sync, so it's still well within the full sync window.

What we observe is that all 19 models other than Asset complete a delta sync in a few seconds, while the Asset model gets a full sync, again taking around 14 minutes and showing 0 new records and 30,996 updated records.

Our use case has the customer at a site sharing devices among staff as shifts change - so we want to swap users (i.e. not clear the database) rather than log out. If we swap users and do a delta sync we get the following:

Then if we swap back to the original user we see the following:

If we keep swapping users back and forth the delta sync eventually gets to a consistent 30 seconds.

We have also been noticing that in many cases records are reported as updated during the delta sync when there has not actually been an update, and the number of records updated is equal to the number of records originally synced.

sacrampton commented 3 years ago

Hi @iartemiev - separately from this, I still think multi-tenanted delta sync is a problem. If there are 1,000 customers that have each made 1,000 changes in the last hour, that is a million changes to sort through. If another customer has made zero changes, their users still have to sort through the million changes just to find out that none of them affect them.

sacrampton commented 3 years ago

Hi @iartemiev - we changed the max records to sync from 50K to 100K and this has resolved it.

Still have concerns for multi-tenant at scale, but the immediate issue is resolved.

sacrampton commented 3 years ago

Hi @iartemiev - looks like I spoke too soon. The testing we did was on a virtual device against a development environment. When we put it on a real device with the production database, increasing the max records did not resolve the problem.

sacrampton commented 3 years ago

Hi @iartemiev - it seems like there is some issue with maxRecordsToSync. When we had the limit set to 50K it was downloading 61,907 records (see above) - but there are actually 88,899 records in the data set. Changing the limit to 100K does not seem to make any difference. The delta sync issues seem to be related to the entire data set not being downloaded initially.
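For reference, if the limit in question is the client-side maxRecordsToSync option, the change would look something like this (a sketch under that assumption, not a confirmed description of where the limit was set):

```js
// Sketch of the limit change described above, assuming it is the client-side
// maxRecordsToSync option; 100000 mirrors the "100K" value mentioned.
import { DataStore } from 'aws-amplify';

DataStore.configure({
  maxRecordsToSync: 100000,
});
```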

iartemiev commented 2 years ago

@sacrampton, I believe you were able to resolve this issue by increasing the DeltaTTL for certain tables. If so, are you fine with us closing this issue? Or is there more to this that has yet to be addressed?

sacrampton commented 2 years ago

Thanks @iartemiev - yes. For tables we are not syncing to DataStore I set the DeltaTTL to 1 minute, and for tables we are syncing to DataStore I set it to 720 minutes (12 hours). This has made an immediate and dramatic impact for our users. With the 30-minute DeltaTTL it was essentially syncing all the time, and after 5-6 hours of constant usage the memory would become so bogged down that you had to kill the app and restart it. That has gone away now.
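For anyone finding this later: those TTLs live on the AppSync DynamoDB data source's delta sync configuration and can be edited per data source in the AppSync console. A scripted equivalent might look roughly like the sketch below - the API ID, data source name, table names and region are placeholders, and this is an assumption about how such a change could be automated, not the exact change made in this thread.

```js
// Placeholder values throughout - a sketch of adjusting DeltaSyncTableTTL, not the
// exact change made in this thread.
const AWS = require('aws-sdk');
const appsync = new AWS.AppSync({ region: 'us-east-1' });

async function setDeltaTtl() {
  await appsync.updateDataSource({
    apiId: 'YOUR_APPSYNC_API_ID',
    name: 'AssetTable',                             // data source for the model
    type: 'AMAZON_DYNAMODB',
    dynamodbConfig: {
      tableName: 'Asset-YOUR_API_ID-ENV',           // the model's base table
      awsRegion: 'us-east-1',
      versioned: true,
      deltaSyncConfig: {
        deltaSyncTableName: 'AmplifyDataStore-YOUR_API_ID-ENV',
        deltaSyncTableTTL: 720,                     // minutes changes stay in the delta table
        baseTableTTL: 43200,                        // minutes - placeholder value
      },
    },
  }).promise();
}
```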

nubpro commented 2 years ago

> Thanks @iartemiev - yes. For tables we are not syncing to DataStore I set the DeltaTTL to 1 minute, and for tables we are syncing to DataStore I set it to 720 minutes (12 hours). This has made an immediate and dramatic impact for our users. With the 30-minute DeltaTTL it was essentially syncing all the time, and after 5-6 hours of constant usage the memory would become so bogged down that you had to kill the app and restart it. That has gone away now.

This sounds like a hacky workaround and, it seems to me, will likely come back to bite you.

sacrampton commented 2 years ago

Hi @nubpro - I'm always open to suggestions that help us move forward, so if you have any ideas on better ways to proceed, please share. Thanks.

iartemiev commented 2 years ago

@nubpro, can you elaborate on why increasing the DeltaSyncTableTTL is “hacky” or a “workaround”? This is a well-documented property of sync-enabled AppSync APIs. We default it to 30 minutes because we think this is a sensible default for most customers, but a longer TTL is certainly valid for certain use cases, such as @sacrampton’s.

nubpro commented 2 years ago

My bad, I read it incorrectly. I thought that if you increase the deltaTTL to a large number, you are simply delaying the memory usage issue to a later time.

Instead: increasing the deltaTTL will force the client to sync using the delta table instead of doing a full base sync.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs.

Looking for a help forum? We recommend joining the Amplify Community Discord server *-help channels or Discussions for those types of questions.