aws-amplify / amplify-js

A declarative JavaScript library for application development using cloud services.
https://docs.amplify.aws/lib/q/platform/js
Apache License 2.0
9.42k stars 2.11k forks source link

DataStore Query Performance Very Slow #8405

Closed sacrampton closed 3 years ago

sacrampton commented 3 years ago

Before opening, please confirm:

JavaScript Framework

React Native

Amplify APIs

DataStore

Amplify Categories

api

Environment information

``` # Put output below this line ```

Describe the bug

We are porting our AppSync/GraphQL based app to DataStore and what we are finding is that queries on the cached database are very slow. We are also finding that pagination of the queries makes no difference to the performance of the queries.

There are many tables with the largest table having about 8,000 records and most other tables having less than 1000 records. There are only 5 tables/models that have data.

Whether we are connected to the network or working offline it makes no difference to the performance.

In issue #6994 it talks of a first time slowing in performance, but we see consistently slow performance.

I have recorded a video of the performance we are seeing - both online and offline - to give you an idea of the problem we are facing.

https://user-images.githubusercontent.com/6362888/120909860-f74b7180-c6bc-11eb-97dc-300f8d19ab4d.mp4

Expected behavior

The database is local so we should be seeing almost instant responses to our queries. The performance of our existing AppSync GraphQL cache gives us almost instant responses so we expect the same or better performance than the existing AppSync GraphQL database.

Reproduction steps

Not sure how to replicate this as the database I have has 48 models and 200 GSI's and takes about 6 hours of incremental (manual) pushes to deploy into a new environment.

Code Snippet

// Put your code below this line.

Log output

``` // Put your logs below this line ```

aws-exports.js

No response

Manual configuration

No response

Additional configuration

No response

Mobile Device

iPhone 7

Mobile Operating System

14.5

Mobile Browser

Not relevant

Mobile Browser Version

No response

Additional information and screenshots

No response

nubpro commented 3 years ago

I closed my original issue as I could no longer replicate the problem anymore

Do u happen to be using the Storage module in your app?

sacrampton commented 3 years ago

Hi @nubpro - we are using the Storage module in the app for upload of photographs.

sacrampton commented 3 years ago

New Video - here is exactly the same application - with a few hundred records loaded versus over 7,000 in the previous video.

Seems loading more than a trivial amount of data is grinding it to a halt. What you see below is my expectation for what we should be getting with a larger data set - even 100K records.

https://user-images.githubusercontent.com/6362888/120977328-8e482480-c7b6-11eb-9c10-61daf9a3076a.mp4

sacrampton commented 3 years ago

Hi @manueliglesias / @iartemiev

Is there anything I can provide you to move this along? Its on a critical time line for us to get this resolved.

Thanks

Stephen

iartemiev commented 3 years ago

@sacrampton is the performance comparable between Simulator and physical devices on both iOS and Android? Or is one worse than the other? Does query performance degrade gradually as the number of records increases? Or does it tend to be acceptable up to a certain point and then rapidly drops?

Since you said only 5 of your models have data in them, I'll attempt to reproduce this with 5 models and 10k+ records in each. Will let you know what I find.

iartemiev commented 3 years ago

Also, which version of aws-amplify are you currently using?

sacrampton commented 3 years ago

Hi @iartemiev

I went through the list of models and data counts in more detail. The difference between the data in each model between the first and second videos is shown here.

Thanks...

Data By Model

Name High Count Low Count
Asset 7,409 359
AssetVisit 1,127 100
Photo 4,667 352
ClientPickList 1,233 1,233
FailureMode 25 25
FieldList 2,540 2,540
FieldSet 147 147
GlobalPickList 2,970 2,970
OperatingUnit 1 1
Package 34 34
PhotoButton 13 13
Plant 1 1
Risk 37 37
Scale 53 53
Uom 1,099 1,099
User 1 1
  21,357 8,965
sacrampton commented 3 years ago

Using aws-amplify version = 3.3.27

sacrampton commented 3 years ago

Hi @iartemiev - I had another thought on this. On the Asset object I have potentially 400+ attributes, and potentially similar on the AssetVisit object. In DynamoDB there would rarely be more than 30 attributes on any individual record so the size is quite small. And in GraphQL we just download the attributes we need for a specific app so we are never trying to download 400+ attributes.

But I'm starting to think that DataStore might is creating 400+ attributes on every single record on the mobile. In my case they would be hundreds of empty attributes taking up space. Is that what might be going on? Can you test by creating a couple of models with hundreds of empty attributes and see if that is the issue?

iartemiev commented 3 years ago

Thanks for the additional info. Are you returning all of the records for those models when you're querying Asset, AssetVisit, etc.? Or are you using predicates to return a subset?

sacrampton commented 3 years ago

Hi @iartemiev - sorry I'm probably not understanding the question. Our app is multi-tenanted. So we have a base query to return only the data that belongs to that client. But in our existing GraphQL app we would only return the specific attributes from say the Asset that the App needs (say 30 attributes and not 400+ attributes).

Also, I notice in the GraphQL when it is a BELONGS_TO association that the GraphQL query is returning all the associated data and I guess it is storing that against the record. For example, and Asset in most cases has a ParentAsset - so it seems that not only is the Asset getting all 400+ blank attributes, it is storing the 400+ blank attributes of the ParentAsset. The Asset has a lot of HAS_MANY connections, so each of those objects is also getting the 400+ blank attributes of the connected Asset in its GraphQL query.

iartemiev commented 3 years ago

I meant when you're calling DataStore.query, are you passing it a predicate, e.g., DataStore.query(Asset, c => c.someField('eq', 'someValue') or perhaps using pagination or are you retrieving all of the records from the local store by calling DataStore.query(Asset) (no predicates or pagination)?

In response to your question about fields with empty values: all of the model fields in your schema will be part of the selection sets for the GraphQL operations that populate the local store. Any empty values will come back as null from AppSync and will be persisted as null in the local store.

For example, if the records have empty values in fields field1-field5.

type Post @model {
  id: ID!
  title: String!
  field1: String
  field2: String
  field3: String
  field4: String
  field5: String
}

They would be stored as:

{
  "id": "id1",
  "title": "someTitle",
  "field1": null,
  "field2": null,
  "field3": null,
  "field4": null,
  "field5": null
}

Do you know which fields your app will need ahead of time? Or is that determined programmatically? In the former case, you could modify your schema.js and remove the fields that you don't want to be synced down.

sacrampton commented 3 years ago

Thanks @iartemiev. I have a further question related objects - whether the data from the related object is embedded in the original object inside DataStore. The queries tend to suggest that might be the case.

So for example, I might have an optional connection to a parent Post

type Post @model {
  id: ID!
  parentPost: Post @connection (fields: "postParentPostId")
  postParentPostId: ID
  title: String!
  field1: String
  field2: String
  field3: String
  field4: String
  field5: String
}

I am wondering if the Parent record is embedded in the current record like something like this.

{
  "id": "id1",
  "title": "someTitle",
  "postParentPostId" : "id2",
  "field1": null,
  "field2": null,
  "field3": null,
  "field4": null,
  "field5": null
  "parentPost" {
     "id": "id2",
     "title": "someTitle",
     "postParentPostId" : "id3",
     "field1": null,
     "field2": null,
     "field3": null,
     "field4": null,
     "field5": null
      }
}

Also, editing the schema.js is fine - I can do that. Do I need to edit any of the other files (ie. index.d.ts, etc.)

And I guess I can have a different schema.js for each application I have so each application can get to just the attributes it needs.

sacrampton commented 3 years ago

Hi @iartemiev - to answer your question "...I meant when you're calling DataStore.query, are you passing it a predicate, e.g., DataStore.query(Asset, c => c.someField('eq', 'someValue') or perhaps using pagination or are you retrieving all of the records from the local store by calling DataStore.query(Asset) (no predicates or pagination)?..."

We are doing the base query with a predicate as you describe and we are using pagination (500 records per cycle) because 1000 causes a resource exceeded error.

sacrampton commented 3 years ago

Hi @iartemiev - if we expand your type Post model above and have 100 x "field01": null fields and 2 fields as you show. In DynamoDB you would just have the id/title fields and this would be 32b in size. But if I look at the size of adding 100 null fields it swells to 1,532b. On a mobile where we are trying to save memory then it seems counterintuitive to be storing null fields. Is there any opportunity to store like DynamoDB? I guess not since I think you are using SQLite..

sacrampton commented 3 years ago

Hi @iartemiev - Some more results of testing with the original data set of 21,357 records detailed above. Querying the asset list before was taking 12 seconds. I have created a new schema.js that reduced the number of attributes on the records down significantly for the 3 models below.

  Original New
Asset 371 121
AssetVisit 352 43
Photo 112 60

The result was that by reducing the number of attributes on the records as shown above halves the response time (6 seconds to retrieve the assets list versus 12 seconds with the the previous one. To confirm, 100% of the attributes I removed were null. The null attributes are used by other (backend) parts of the solution - but not required in mobile app.

I think this shows that syncing null attributes to a record can dramatically impact performance. Given that there is a direct connection between storing null attributes and performance it follows that if we can get rid of all null attributes (like DynamoDB does) then we are going to be maximizing performance. The only way I can think of to get rid of all null values at all times is if you were able to use some sort of a NoSQL repository on the device that only stores non-null attributes, thus optimizing space and performance?

The beauty of GraphQL and DynamoDB is that I can create a comprehensive schema, but only store minimal data - so I have huge flexibility with economy and performance. But if we are storing a null value for every single possible attribute that takes away the flexibility.

sacrampton commented 3 years ago

Hi @iartemiev - another test result to share with you. Used another larger dataset for testing. This data set had these quantities in these 3 models. All the other models are approximately the same quantities - they don't vary by project as they are essentially reference data objects.

  Records
Asset 30,998
AssetVisit 9,315
Photo 25,776

I tried with the Original schema.js and the New schema.js defined above.

Note that when we query a model the response is irrelevant to the amount of data we are retrieving. From the 30K assets I could do a query that retrieves 1 record and it will take 1 minute and 4 seconds. I could conduct another query that returns 113 records and it take 1 minute and 4 seconds.

sacrampton commented 3 years ago

Hi @iartemiev - one final update from my testing. On the desktop you ship a storage adapter for IndexedDB which is a NoSQL database. So we did a test on the desktop with the large 30K+ Asset data set described above using the full schema.

It took about 11 minutes for the full dataset to download - but after that a query of the Asset table took 2 seconds. This compares with 64 seconds on Mobile App (React Native).

iartemiev commented 3 years ago

I'll try to respond to the questions/points in order

I have a further question related objects - whether the data from the related object is embedded in the original object inside DataStore.

For Has One and Belongs To relationships we resolve the connection at run time when you call DataStore.query on that model, but they are not saved that way to the local store.

We are doing the base query with a predicate as you describe and we are using pagination (500 records per cycle) because 1000 causes a resource exceeded error.

By "base query" are you referring to the initial base sync query between AppSync and the local store? Or calling DataStore.query to retrieve records from the local store?

On a mobile where we are trying to save memory then it seems counterintuitive to be storing null fields. Is there any opportunity to store like DynamoDB? I guess not since I think you are using SQLite..

That's not possible at the moment, but it's certainly something we will explore in the future. On React Native, DataStore utilizes Async Storage for the local store implementation. Async Storage in turn uses different underlying persistence layers depending on the platform. On iOS it uses a serialized dictionary/key-value store, not SQLite.

Note that when we query a model the response is irrelevant to the amount of data we are retrieving. From the 30K assets I could do a query that retrieves 1 record and it will take 1 minute and 4 seconds. I could conduct another query that returns 113 records and it take 1 minute and 4 seconds.

Could you please share the query you're using here? Specifically, what predicate are you passing?

sacrampton commented 3 years ago

Hi @iartemiev - thanks for answering my long chain of rambling "input" / observations.

type Asset @model
  @key(name: "byAssetPackageId", fields: ["assetPackageId"]) 
  @key(name: "byAssetPlantId", fields: ["assetPlantId"]) 
{
  id: ID!
  name: String!
  plant: Plant @connection (fields: ["assetPlantId"])
  assetPlantId: ID!
  package: Package @connection (fields: ["assetPackageId"])
  assetPackageId: ID
  auditCompleted: Boolean
}

type Plant @model
{
  id: ID!
  name: String!   
}

type Package @model
{
  id: ID!
  name: String!   
}

The base query would be to download all assets in a Plant

query baseQuery{
  syncAssets(filter:{
    assetPlantId:{eq:"xxxxxxxxxxxx"}
  }){items{id _version}}
}

Then in DataStore with the 30K assets we do a query to find Assets in a package that have not yet been audited. This query takes 1 minute 4 seconds to return the 2 records. If we remove the limit it returns 113 records and takes the same time.

const data = await DataStore.query(
 Asset,
 (asset) =>
 asset
 .assetPackageId('eq', userData.package.id)
 .auditCompleted('eq', false),
 {
 page: 0,
 limit: 2,
 },
 );
iartemiev commented 3 years ago

Thank you, that is helpful. We'll be doing additional library benchmarking today in order to pinpoint the issue and will provide an update with our findings.

iartemiev commented 3 years ago

Quick update: I have been able to repro performance issues similar to those described above when using a model with 10 fields containing values + 100 fields with null values and > 30k records in the local store. Will be investigating causes and potential solutions.

sacrampton commented 3 years ago

Hi @iartemiev - that is fantastic news that you were able to reproduce. Hopefully you can also reproduce it working perfectly with IndexedDB/Browser too. Await your feedback.

iartemiev commented 3 years ago

We've pinpointed the bottleneck to the multiGet method in Async Storage, it seems that we're brushing up against that library's performance limitations here.

We'll be brainstorming what improvements we could make in DataStore in order to mitigate this as much as possible. For your use case specifically, even with the reduced number of fields, are a considerable number of values in your records still null? If so, we have a couple of ideas on how we may be able to optimize for this.

As an aside, when you're seeing this poor performance do you have the React Native debugger enabled on your app? We're seeing significantly better performance (still suboptimal, but 2-3x faster) with the debugger disabled, so just want to point that out as well.

sacrampton commented 3 years ago

@iartemiev - looks like good progress.

Ultimately, I'd like to retain access to all fields (like we do with IndexedDB) - but for now we can run with the reduced schema. And even in the reduced schema I would estimate that 80% of those fields are null in the dataset.

I don't believe we have debugger enabled - but I will check.

sacrampton commented 3 years ago

@iartemiev - checked performance with and without debugger enabled. The 64 second response was with React Native Debugger enabled. With Debugger switched off the response is 22 seconds. As you said - its about a 3X improvement.

Worth remembering that 22 seconds was with the stripped down schema and is still 11X slower than the Browser/IndexedDB running with the full schema. Good to know we'll see that sort of improvement when we go to production, but 22 seconds with cut-down schema still isn't a solution.

Thanks for working this.

iartemiev commented 3 years ago

To level-set on expectations a bit, I don't think we'll be able to get the same performance out of AsyncStorage as we do from IndexedDB, no matter what optimizations we add. The former is an intrinsically slower persistence layer when dealing with large amounts of data. I'm hoping we can see another 2-3x improvement, but 10x is quite unlikely. In order to achieve an order of magnitude improvement, we'll probably need to leverage a faster local database for React Native. We'll certainly explore that option, but it's not something we'd be able to release in a matter of a day or two.

sacrampton commented 3 years ago

Hi @iartemiev - just checking back in to see whether you have made any progress on this.

iartemiev commented 3 years ago

Hey @sacrampton, we've identified the optimization we're planning on implementing and it's what you suggested above, i.e., not persisting null values to the local databases. We're testing this solution now to make sure that it doesn't have any unwanted side effects.

If you would like to test the effect this change will have on your queries before we release the fix, you can make the following temporary change in your Amplify dependencies manually:

Note you'll want to delete your app and re-install it after you make this change, i.e., re-sync the local database in order for the change to take effect

  1. Open ./node_modules/@aws-amplify/datastore/lib-esm/datastore/datastore.js

  2. Change this code (should start at line 279)

    var initializeInstance = function (init, modelDefinition, draft) {
    var modelValidator = validateModelFields(modelDefinition);
    Object.entries(init).forEach(function (_a) {
        var _b = __read(_a, 2), k = _b[0], v = _b[1];
        modelValidator(k, v);
        draft[k] = v;
    });
    };

    To:

    var initializeInstance = function (init, modelDefinition, draft) {
    var modelValidator = validateModelFields(modelDefinition);
    Object.entries(init).forEach(function (_a) {
        var _b = __read(_a, 2), k = _b[0], v = _b[1];
        modelValidator(k, v);
        if (v !== null) {
            draft[k] = v;
        }
    });
    };
  3. If your app happens to be using CJS modules, instead of ESM, you'll want to change the same code in ./node_modules/@aws-amplify/datastore/lib/datastore/datastore.js instead (lib subdirectory as opposed to lib-esm)

sacrampton commented 3 years ago

Hi @iartemiev - that sounds great - will get testing immediately and let you know how we go.

I just created another issue (#8450) also dealing with null values and subscriptions. Will the changes you made here deal with that?

sacrampton commented 3 years ago

Hi @iartemiev - we did the testing and can indeed see that the null values are removed from the data. However, it didn't change the performance we were seeing. Namely 64 seconds in debug and 24 seconds without debug.

iartemiev commented 3 years ago

Hmm that's odd, as we're seeing a roughly 2-2.5x improvement with the above change. Did you reinstall the app and re-sync the records after changing the code?

If so, could you please verify that the fields containing null values are definitely no longer getting persisted to AsyncStorage?

You can do that by querying AsyncStorage directly (without DataStore):

import AsyncStorage from '@react-native-async-storage/async-storage';

async function getFromAsyncStorage() {
  const model = 'Asset';

  const key = (await AsyncStorage.getAllKeys()).find(key =>
      key.includes(`@AmplifyDatastore::user_${model}`)
  );

  console.log(await AsyncStorage.getItem(key));
}
sacrampton commented 3 years ago

Hi @iartemiev - will check this out and confirm back with you asap.

If you have a chance, can you comment on this somewhat related issue of nulls in DataStore (#8450)

sacrampton commented 3 years ago

Hi @iartemiev - have double checked everything you said and the result is still the same.

Here is a screen shot of the data we see on the record after applying your fix - all the null values are removed. newDataStore

When we try the code you list above we are getting this - JSON value '' of type NSNull cannot be converted to NSString

[AsyncStorage] Using undefined type for key is not supported. This can lead to unexpected behavior/errors. Use string instead. Key passed: undefined

iartemiev commented 3 years ago

If the screenshot above is the result of a DataStore.query, that doesn't tell us definitively whether the nulls are being persisted or not to AsyncStorage, just that they're being omitted when the model class is instantiated by DataStore.

If the code I provided didn't work, you can log out all the keys with console.log(await AsyncStorage.getAllKeys()), then pass one of those keys to await AsyncStorage.getItem("some-key") and log that to the console.

It may be easier to set AsyncStorage to a global and then interact with it in real time via the RN debugger console.

In your app:

import AsyncStorage from '@react-native-async-storage/async-storage';
global.AsyncStorage = AsyncStorage;

Then in the debug console you can run e.g.,

await AsyncStorage.getAllKeys();

// pick one of the keys that starts with "@AmplifyDatastore::user_{your model}"
// call getItem on that key

await AsyncStorage.getItem("@AmplifyDatastore::user_...")
sacrampton commented 3 years ago

Hi @iartemiev - the team has been working to get this done, but not getting much progress.

MicrosoftTeams-image (1)

"I am only getting keys in which i store the data using - awaitAsyncStorage.getAllKeys()"

"I am not getting any key like - @AmplifyDatastore::user_Asset"

That was checked in the React Native debugger console.

iartemiev commented 3 years ago

Ah, I just realized that you're on an older version of aws-amplify that still uses the AsyncStorage module that's bundled with React Native, as opposed to the external module. You should be able to query it by importing from RN:

import { AsyncStorage } from 'react-native'

You should then see the DataStore-specific values in there using the code I provided above

iartemiev commented 3 years ago

(Apologies, finger slipped and I accidentally clicked "Close with comment" instead of "Comment". Re-opened the issue)

sacrampton commented 3 years ago

Hi @iartemiev - the team has checked using await AsyncStorage.getItem(key).

The code used before looks like this codeBefore

This results that come are shown below

resultBefore

After applying null validation we are getting these field in asset record. Here is the code after. codeAfter

And this is the data we are seeing

resultAfter

So it is clear that the null fields are gone after updating the code in aws-amplify module - but the speed doesn't change.

sacrampton commented 3 years ago

Hi @iartemiev - to rule out any related issues we upgraded aws-amplify to 4.1.1 - still same performance issue

sacrampton commented 3 years ago

Hi @iartemiev - just back tracking for a second. I significantly reduced number of attributes on 3 tables and assumed it was removal of all those null values which made the difference. Maybe it was the fact of reducing attributes on a table?

iartemiev commented 3 years ago

Reducing the amount of data stored in AsyncStorage will increase query performance. Whether that’s done through explicitly removing fields or by not persisting null values, the effect should be roughly the same.

iartemiev commented 3 years ago

Hey @sacrampton, wanted to share an update with you: we've decided to implement a new storage adapter option for React Native CLI apps that utilizes SQLite as the local persistence layer. We saw massive performance improvements when querying with predicates, sorting, and/or pagination, as well as processing the base/delta sync queries compared to the performance of AsyncStorage.

We still have some testing to do before we're ready to release this, but it's about 90% implementation complete as of right now. Once we're satisfied with the stability of this adapter, we'll let you know how you can utilize it in your app.

As an aside, you mentioned above that you're on version 3.3.27 of Amplify. Is there something in particular that's keeping you from upgrading to the latest version?

sacrampton commented 3 years ago

That is awesome @iartemiev - I had the guys upgrade to 4.1.1 - there was nothing keeping us from upgrading. Look forward to hearing more.

iartemiev commented 3 years ago

Hey @sacrampton, we're finishing up our testing of the SQLite adapter.

Everything is looking good so far, the only feature we haven't had a chance to implement is cascading delete functionality on relations. Is this a vital feature for your use case? Or would it be acceptable to omit it for now but add it in the near future? If we omit it, you would need to manually delete related entities via a separate call to DataStore.delete.

sacrampton commented 3 years ago

Hi @iartemiev - sounds great. Cascading deletes don't matter to me at this time.

iartemiev commented 3 years ago

Update: ran into a hiccup when testing many-to-many relations. Got a solution in mind, but will need just a bit more time to implement it and test. I will provide another update tomorrow

sacrampton commented 3 years ago

Hi @iartemiev - appreciate you keeping me updated. Thanks.

iartemiev commented 3 years ago

Hello again, @sacrampton! We’ve released a special version of AmplifyJS (aws-amplify@rn-sqlite) containing the new SQLite adapter. We’re still considering this to be an experimental feature, hence the special tag/version.

We’ve tested this adapter thoroughly on our end with up to 350k records. Querying on a predicate that returns a subset of data or using pagination and/or sorting is blazing fast (a fraction of a second in our testing). If you want to return the entire contents of a large table (over 10k records), i.e., querying without a predicate, that still takes a while (10+ seconds), but that is expected.

There is currently a bit of occasional funky behavior and console warnings when deleting relational records. This is because cascading deletes are not yet implemented. You would need to manually delete from the last descendant up to avoid warnings. During my testing, these did not break the application. We will be adding cascading delete functionality shortly to bring this to feature parity with the other adapters.

Please let us know in this thread if you run into issues, and we will get them resolved ASAP. We will include this adapter in the stable release soon but wanted to give you a chance to test and benchmark early to ensure that your performance needs are met.

Here are the installation instructions:

(note: AsyncStorage is still a required dep, as other parts of Amplify continue to rely on it)

yarn add aws-amplify@rn-sqlite amazon-cognito-identity-js@rn-sqlite aws-amplify-react-native@rn-sqlite react-native-sqlite-storage @react-native-async-storage/async-storage

# to ensure we're not left with any of the previous versions
rm -rf node_modules yarn.lock package-lock.json
yarn

npx pod-install

In order to restore the current/async-storage version:

yarn remove react-native-sqlite-storage
yarn add aws-amplify@latest amazon-cognito-identity-js@latest aws-amplify-react-native@latest

# to ensure we're not left with any of the previous versions
rm -rf node_modules yarn.lock package-lock.json
yarn

npx pod-install
sacrampton commented 3 years ago

Much appreciated @iartemiev - will start building asap and let you know our results. Thanks again