aws-amplify / amplify-js

A declarative JavaScript library for application development using cloud services.
https://docs.amplify.aws/lib/q/platform/js
Apache License 2.0
9.43k stars 2.13k forks source link

DataStore Query Performance Very Slow #8405

Closed sacrampton closed 3 years ago

sacrampton commented 3 years ago

Before opening, please confirm:

JavaScript Framework

React Native

Amplify APIs

DataStore

Amplify Categories

api

Environment information

``` # Put output below this line ```

Describe the bug

We are porting our AppSync/GraphQL based app to DataStore and what we are finding is that queries on the cached database are very slow. We are also finding that pagination of the queries makes no difference to the performance of the queries.

There are many tables with the largest table having about 8,000 records and most other tables having less than 1000 records. There are only 5 tables/models that have data.

Whether we are connected to the network or working offline it makes no difference to the performance.

In issue #6994 it talks of a first time slowing in performance, but we see consistently slow performance.

I have recorded a video of the performance we are seeing - both online and offline - to give you an idea of the problem we are facing.

https://user-images.githubusercontent.com/6362888/120909860-f74b7180-c6bc-11eb-97dc-300f8d19ab4d.mp4

Expected behavior

The database is local so we should be seeing almost instant responses to our queries. The performance of our existing AppSync GraphQL cache gives us almost instant responses so we expect the same or better performance than the existing AppSync GraphQL database.

Reproduction steps

Not sure how to replicate this as the database I have has 48 models and 200 GSI's and takes about 6 hours of incremental (manual) pushes to deploy into a new environment.

Code Snippet

// Put your code below this line.

Log output

``` // Put your logs below this line ```

aws-exports.js

No response

Manual configuration

No response

Additional configuration

No response

Mobile Device

iPhone 7

Mobile Operating System

14.5

Mobile Browser

Not relevant

Mobile Browser Version

No response

Additional information and screenshots

No response

sacrampton commented 3 years ago

Hi @iartemiev - I want to outline precisely what we did and what we are seeing in case there is anything we overlooked.

These were the installation instructions that we followed.

yarn add aws-amplify@rn-sqlite amazon-cognito-identity-js@rn-sqlite aws-amplify-react-native@rn-sqlite react-native-sqlite-storage @react-native-async-storage/async-storage
# to ensure we're not left with any of the previous versions
rm -rf node_modules yarn.lock package-lock.json
yarn
npx pod-install

After the above installation the libraries installed are as shown here.

libraries

We then delete the old app and reinstall the app again.

We still see the same 22 seconds when querying on 30K asset records and 35 seconds for 42K asset records.

We are importing DataStore like this

import {DataStore} from 'aws-amplify';

Appreciate any thoughts you have on why we might not be seeing the same as you are seeing.

sacrampton commented 3 years ago

Hi @iartemiev - some more updates here.

We deleted node modules and pod files, and follow all the steps above, but when we check the code it is still getting data from AsyncStorage.

const model = 'Asset';

 const key = (await AsyncStorage.getAllKeys()).find((key) =>
 key.includes(`@AmplifyDatastore::user_${model}`),
 );

 console.log('Item from aync storage >>', await AsyncStorage.getItem(key));

So clearly we are not doing something right here. Can you see if there is anything obvious that we are omitting.

Suspect something is being cached in existing project so will create a new project from scratch, but that will take several hours, an see if that works for us.

sacrampton commented 3 years ago

Hi @iartemiev - getting errors with our schema. We have a modified schema.js that works fine with AsyncStorage when running Datastore.start()

error: no such table: Setting Possible unhandle promise rejection "code": 5, "message": "duplicate column name: assetOperatingUnitId",

So went back to the standard codegen model and got this error.

Error >> no such table: Setting backend.js:12506 Possible Unhandled Promise Rejection (id: 0): Object { "code": 5, "message": "near \"group\": syntax error", }

iartemiev commented 3 years ago

Ok, as of the last comment, it does look like you're now utilizing the SQLite adapter but it's having trouble generating the local SQL schema from your AppSync schema.

Could you please share your entire schema.graphql file with me? If you don't feel comfortable posting it here, you could add it to a private repo and then invite me as a collaborator.

sacrampton commented 3 years ago

Hi @iartemiev - I can't share it here - but @manueliglesias has a copy of it he could provide to you.

iartemiev commented 3 years ago

@sacrampton I was able to test with your schema and identified the issue. We were not escaping column and table names when generating the SQLite schema, so reserved words in model fields names (such as group) were causing errors. Will be fixed in this PR: https://github.com/aws-amplify/amplify-js/pull/8519. We will publish a new version of aws-amplify@rn-sqlite once the PR is reviewed and merged

iartemiev commented 3 years ago

@sacrampton we just published an update: aws-amplify@4.1.3-rn-sqlite.3. It fixes the error you're seeing (I validated it against your schema).

sacrampton commented 3 years ago

That's awesome @iartemiev - will start testing immediately.

sacrampton commented 3 years ago

Hi @iartemiev - we were able to sync our model, but running into an issue with Boolean fields when querying.

We query the model and get this error - the schema for hasPhotos is Boolean. I did a scan of the DynamoDB database and there are no "rouge" non-boolean values for this attribute. Not sure what is going on with this.

     Error: Field hasPhotos should be of type boolean, number received. 0
sacrampton commented 3 years ago

Hi @iartemiev - just a question on _deleted fields. Noticed by default this is not added to the schema so we can't query to not return deleted records - or does DataStore automatically exclude those from queries?

iartemiev commented 3 years ago

We just fixed the Boolean error in aws-amplify@4.1.3-rn-sqlite.4. Please upgrade to that and let us know if you run into any other issues.

just a question on _deleted fields. Noticed by default this is not added to the schema so we can't query to not return deleted records - or does DataStore automatically exclude those from queries?

Records with _deleted: true get deleted from the local store and should not be returned in the DataStore.query results.

sacrampton commented 3 years ago

Hi @iartemiev - After upgrading we are again getting this error which was previously seen.... Appreciate any ideas.

Error during getting data from async storage. {​​​​​​​message: "no such table: Setting", code: 5}​​​​​​​
backend.js:12506 Possible Unhandled Promise Rejection (id: 0):
Object {​​​​​​​
  "code": 5,
  "message": "duplicate column name: assetOperatingUnitId",
}​​​​​​​
iartemiev commented 3 years ago

The only thing we've changed in the latest version is handling the conversion of a SQLite boolean (0, 1) to a JS representation (false, true) everything else is exactly the same.

Have you manually modified your schema.js since? A likely culprit here is a manually added foreign key, like the one in the error message above.

When we generate the SQL schema, we have to account for every field that's going to be stored in the table beforehand and add it as a column. So we make inferences such as adding join table FKs automatically when we don't expect them to be added to the generated schema via codegen. This is not required in the NoSQL dbs, because they don't have fixed schemas. So what's likely happening is when we're creating the SQL tables, your manually added keys are duplicates of what we auto-generate, hence the message.

I can add some field name de-duping logic for this (i.e., only auto-generate the FK if it's not explicitly added to the schema.js fields for that model). But in the meantime, if you take those fields out of schema.js but leave them in the type definitions (index.d.ts) that should unblock you.

iartemiev commented 3 years ago

Actually, scratch what I said re: taking those fields out. I'm not sure that will help if you need to access them in your app. Let me see if I can get an update out to you.

sacrampton commented 3 years ago

Hi @iartemiev - think it is something to do with applying update to existing project. Earlier we said we had to recreate from scratch

iartemiev commented 3 years ago

I don’t think that’s the case here. The last update made no changes to the storage adapter itself, so it should be fine to apply it to an existing app. The cause is most likely what I described above.

iartemiev commented 3 years ago

I opened a PR that addresses that by the way, but given that it’s a US holiday weekend, it might not get reviewed until Tuesday

iartemiev commented 3 years ago

@sacrampton we just released a new version - 4.1.3-rn-sqlite.5. For this update, I do recommend deleting and re-installing the app, since there are changes to the schema generation code.

sacrampton commented 3 years ago

Thanks @iartemiev - will let you know how we go

sacrampton commented 3 years ago

Hi @iartemiev - good news - SQLite now working and previously with our full schema it wouldn't even load 31K assets - and with the massively trimmed down schema it was 64 seconds to query 31K assets. Now we can use the full schema and we are seeing 2 seconds to query 31K assets. Still testing, but looking really promising.

iartemiev commented 3 years ago

That's great to hear!

iartemiev commented 3 years ago

@sacrampton by the way, are you seeing 2 second query time with the RN Debugger enabled or disabled? I was getting records back in less than half a second with the debugger disabled even with hundreds of thousands of records in the model. But this may be due to hardware differences as well.

sacrampton commented 3 years ago

Hi @iartemiev - I can confirm the 2 second query is with RN Debugger disabled.

On Android we are running into issues with doing hydrating the cache where it is just getting stuck. See the log screen shot below.

MicrosoftTeams-image (4)

Any ideas?

iartemiev commented 3 years ago

The warning you're seeing shows up when using the latest stable version of Amplify with the AsyncStorage adapter as well, it should not be a problem in and of itself. It's due to DataStore using a long setTimeout value (300,000 ms) for ensuring we're still getting subscription keep-alive messages from AppSync. So unfortunately the screenshot doesn't give me any info on what could be going wrong.

Follow up questions:

sacrampton commented 3 years ago

Hi @iartemiev - the sync stops then it does eventually start again, but it could take quite a while to restart. With the AsyncStorage adapter it had a limit of 6MB so our datasets were much smaller than those we are testing now. Getting more specific responses to your questions asap.

When we have synced all 31K assets in our test set the query is taking 5 seconds. This query previously on AsyncStorage on iOS was taking 64 seconds and now on SQLite on iOS taking about 2 seconds.

sacrampton commented 3 years ago

Hi @iartemiev - here are our more detailed answers.

_Are there any other debug messages, warnings, or errors in the console being emitted by Amplify or DataStore (particularly, after setting Amplify.Logger.LOGLEVEL = 'DEBUG'?) If so, please either post them here or email them to me.

Was your app working on Android with the AsyncStorage adapter?

Does the sync always get stuck in the same place in the process (i.e., after a particular model is synced)?

What happens when you try to query a model when the sync is stuck?

Also, we have noticed that Delta sync is taking a long time with rn-sqlite.5 version, earlier it was done in few seconds. This is happening in both iOS and Android.

sacrampton commented 3 years ago

Hi @iartemiev - we have built a signed APK to test the Android version on device and now are seeing performance comparable to what we were seeing on iOS.

iartemiev commented 3 years ago

That's awesome! Do you still have a lot of testing to do for your app on both iOS and Android before you're comfortable with the SQLite adapter? Or are you getting pretty close?

sacrampton commented 3 years ago

Hi @iartemiev - the performance of the AsyncStorage adapter basically makes it unusable on anything but exceptionally small data sets. So we are already an order of magnitude better off using the SQLite adapter. We will be going forward with the SQLite adapter and keep testing over the next week or so - but looks pretty good with where we are at now. I am very comfortable to go forward with SQLite adapter.

By the way, getting TOTAL's is really important to us. For example, to work out how many assets are in a work package - we want to be able to just query the "COUNT" of those records that meet that query. Seems to be a standard function of SQLite, but not available in DataStore's API. Is there a simple way you can enable the total number of items to be returned (not the items - just the total) given this seems to be a standard function of SQLite?

iartemiev commented 3 years ago

Very glad to hear that!

Re: retrieving total row count. I definitely see the value in having that functionality. But since it would require a public API change to DataStore (e.g., we'd need to add a new public method or extend an existing one), we likely wouldn't be able to get that implemented very quickly since we have a particular process we must follow when introducing such changes.

There is a workaround that you could leverage immediately though. You can open the database in your app in read-only mode and then execute a query against the table.

It would look something like this:

import SQLite from 'react-native-sqlite-storage';
SQLite.enablePromise(true);

const DB_NAME = 'AmplifyDatastore';

let db;
function openDB() {
  db = await SQLite.openDatabase({name: DB_NAME, readOnly: true})
}

function closeDB() {
  if (db) {
    db.close();
  }
}

function getModelRecordCount(modelName) {
  if (!db) {
    await openDB();
  }

  const statement = `SELECT COUNT(*) AS count FROM ?`;

  const [resultSet] = await db.executeSql(statement, [modelName]);

  const result =
    resultSet &&
    resultSet.rows &&
    resultSet.rows.length &&
    resultSet.rows.raw &&
    resultSet.rows.raw();

  if (result) {
    return result[0].count;
  }
}

Then when you need to retrieve the total, you can do:

const assetCount = await getModelRecordCount('Asset');
iartemiev commented 3 years ago

Btw, the above code is just a rough example. You may need to modify it a little bit to suit your needs, but it should be close.

iartemiev commented 3 years ago

I do want to emphasize the importance of opening the DB in read-only mode, otherwise, you may experience some unpredictable behavior with DataStore.

sacrampton commented 3 years ago

Hi @iartemiev - thanks so much for the suggestion - will follow this and let you know how we go.

jeremy-white commented 3 years ago

@iartemiev I wanted to share with you an error I've come across when trying this out on my project

Object {
    "code": 5,
    "message": "table BrokenExample has no column named owner",
}

I use @auth and see an instance of that error for every model I use it on except for one. The one model that seems to work is the only one that I define the ownerField as "owner" and also have "owner" defined explicitly in the model definition, so something like

type WorkingExample @model
    @auth(rules: [
        { allow: owner, ownerField: "owner", operations: [create] }
    ]) {
    id: ID!
    owner: String
}

versus

type BrokenExample @model
    @auth(rules: [
        { allow: owner, operations: [create] }
    ]) {
    id: ID!
}

Seems that explicitly defining the owner field like in the working example would be a fine workaround . . . but just wanted to make sure it's a situation you're aware of.

sacrampton commented 3 years ago

Hi @iartemiev - using the SQLite Adapter on RN we are noting something with caching and querying.

If I create a record the query will not return that record until it has been synced with the server. Its created locally, but query doesn't get anything till it has been synced. What are we doing wrong here?

Also, if we update rather than create the record we can query the response immediately and don't need it to sync first. We need the same with create.

await DataStore.save(new Asset({...assetData}));
const data = await DataStore.query(
 Asset,
 (s) => {
 if (!downloadEntirePlant) {
 return s
 .assetParentAssetId('eq', mobileActiveLocation?.id)
 .assetPackageId('eq', userPackageId)
 .assetPlantId('eq', userPlantId)
 .isDeleted('ne', true);
 } else {
 s.assetParentAssetId('eq', mobileActiveLocation?.id)
 .assetPlantId('eq', userPlantId)
 .isDeleted('ne', true);
 }
 },
 {
 sort: (s) => s.name(SortDirection.ASCENDING),
 },
 );
iartemiev commented 3 years ago

@sacrampton my guess is that it's got something to do with the predicate you're crafting.

You can test this with:

const newAsset = await DataStore.save(new Asset({...assetData}));

const data = await DataStore.query(Asset, newAsset.id)

If this returns the expected data pre-sync, then I would look into adjusting the fields and/or values in that predicate. If not, there may be a bug in DataStore, so just let me know if that's the case.

iartemiev commented 3 years ago

@jeremy-white, thank you for calling this out. We'll need to make sure to add owner to the SQL schema in cases where it's implicitly defined (i.e., omitted from the model fields).

sacrampton commented 3 years ago

Thanks @iartemiev - thanks for the feedback - can confirm your suggestion does indeed return data before it is synced to the server. Just need to work out how we get a list back now.

jeremy-white commented 3 years ago

Just a couple more things I've hit

Object {
  "code": 6,
  "message": "NOT NULL constraint failed: TestModel.items",
}

Hit this for HAS_MANY connections defined as

    items: [Item!] @connection(keyName: "byTestModel", fields: ["id"])

as well as

    items: [String!]

I've been able to get past this by changing isRequired to false in schema.js . . . this could also be my misunderstanding but I was under the impression that [String!] meant that the items in the array had to be defined but the attribute itself could be undefined, so I was only using ! to be extra cautious. I can certainly work around this one as well but just wanted to raise it as it works as I'd expect with AsyncStorage but fails against SQLite

I'm also seeing

Error: Field items should be of type string, object received.

For fields I define as AWSJSON and have values that are arbitrary JSON . . . I haven't been able to workaround this other than just removing those fields from schema.js so if you have a better workaround I would love to hear it.

I really appreciate your work on this @iartemiev as I think it will be a HUGE improvement!

sacrampton commented 3 years ago

Hi @iartemiev - does anything in our predicate above jump out at you as a possible issue that is not getting un-synced data? We know we can retrieve a single un-synced record whose id we know. But the search for a list of records that included synced and un-synced records is something we are still struggling with. Any hints you might have on what we might be doing wrong in the above predicate would be much appreciated.

We are going live with our App this weekend - we wouldn't have got here without your amazing support in this effort. Much appreciated.

iartemiev commented 3 years ago

@sacrampton, I don't think there's anything inherently wrong with it. It's hard to know for sure without being familiar with your app code and knowing what values are being passed and what data is expected to be present in the local store. DataStore doesn't use a cache. Every time you execute DataStore.query, it performs a SQL query against your local database.

If I were debugging this, I would check that each of the operands has the expected value by logging them out to the console.

You can also try removing fields from your predicates one by one to determine which one is causing all the results to be filtered out.

Try something like the following in the same place as where you're normally attempting to query the local records (and before the sync):

const newAsset = await DataStore.save(new Asset({...assetData}));
const data = await DataStore.query(Asset, newAsset.id);

console.log(data.assetParentAssetId === mobileActiveLocation?.id, data.assetParentAssetId, mobileActiveLocation?.id);
console.log(data.assetPackageId === userPackageId, data.assetPackageId, userPackageId);
console.log(data.assetPlantId === userPlantId, data.assetPlantId, userPlantId);
console.log(data.isDeleted !== true, data.isDeleted);

If you have a false in any of those 4 console logs, that should give you the problematic field.

Lastly, if you have debug logs on, you should see the SQL SELECT statement that is generated by that predicate. Does it look correct?

sacrampton commented 3 years ago

Hi @iartemiev - the count suggestion you had above works for smaller data sets in iOS (extremely slow with larger data sets - > 1 minute). But in Android we get the following error.

{​​​​​​​​"message":"no such column: false (code 1 SQLITE_ERROR): , while compiling: SELECT * FROM Asset WHERE auditCompleted = false","code":0}​​​​​​​​​​​​​​​
iartemiev commented 3 years ago

@sacrampton SQLite represents boolean values as 0 and 1. So if you're specifying a literal false in your WHERE clause it attempts to look up a column with the name "false" as the error message suggests.

You can either specify WHERE auditCompleted = 0 or use a parameterized query. The latter will coerce a JS boolean to the SQLite representation.

re: performance, I'm surprised it's running that slowly. I'll try to reproduce on my end to confirm.

sacrampton commented 3 years ago

Hi @iartemiev - replacing "false" with 0 is not working for us... The error seems to be that the column we are calling is not there. But we know it is there as it works in other queries from DataStore - just not count in SQLite.

iartemiev commented 3 years ago

What's the error you get when you query WHERE auditCompleted = 0?

By the way, are you calling SELECT COUNT(*) as count FROM Asset? or SELECT * FROM Asset? If you're doing the latter as your previous comment suggests, it will return all records from the DB, which is going to be slow. If you're just trying to get a count, there's no reason to do that.

sacrampton commented 3 years ago

Hi @iartemiev - here are a few points....

On iOS if I do the following query it gives me the correct answer and does so very quickly.

SELECT COUNT(*) as count FROM Asset WHERE auditCompleted = true

If we execute exactly the same query on Android it gives this error

Object {
"code": 0,
"message": "Cannot bind argument at index 1 because the index is out of range. The statement has 0 parameters.",
}

We were using the other SELECT statement because we can't seem to get the COUNT to work with an AND statement - the following query in iOS returns an incorrect value of 1. In Android it returns the same error.

SELECT COUNT(*) as count FROM Asset WHERE auditCompleted = true AND packageName = JONESBORO

iartemiev commented 3 years ago

Try parameterizing the query and enclose the WHERE clause expression in parentheses.

E.g.,

await db.executeSql('SELECT COUNT(*) as count FROM Asset WHERE (auditCompleted = ? AND packageName = ?)', [true, 'JONESBORO']);
sacrampton commented 3 years ago

Hi @iartemiev - thanks for that - did the trick now. So resolved all issues around the count. Thanks so much as always.

Still noticing the delta-sync on SQLite is slower than before. Not sure if you were seeing that.

iartemiev commented 3 years ago

My pleasure! Very glad to hear that things are finally coming together!

Still noticing the delta-sync on SQLite is slower than before. Not sure if you were seeing that.

Hmm, I remember you mentioning that in your email, but I haven't observed it yet myself. I'll do some more benchmarking around this specifically when I get a chance.

mjaydeep01 commented 3 years ago

Hi @iartemiev - some more updates here.

We deleted node modules and pod files, and follow all the steps above, but when we check the code it is still getting data from AsyncStorage.

const model = 'Asset';

 const key = (await AsyncStorage.getAllKeys()).find((key) =>
 key.includes(`@AmplifyDatastore::user_${model}`),
 );

 console.log('Item from aync storage >>', await AsyncStorage.getItem(key));

So clearly we are not doing something right here. Can you see if there is anything obvious that we are omitting.

Suspect something is being cached in existing project so will create a new project from scratch, but that will take several hours, an see if that works for us.

@iartemiev, @sacrampton I am facing same issue on Android, no matter what i do, data is always being stored in the AsyncStorage. Steps, I followed are as below..

  1. Delete node_modules folder, any .lock file.
  2. Remove app from the Android device.
  3. npm install
  4. npm install aws-amplify@rn-sqlite amazon-cognito-identity-js@rn-sqlite aws-amplify-react-native@rn-sqlite react-native-sqlite-storage
  5. Install app on the device.
  6. On querying the data from datastore, data downloaded from the appsync, but stored in the AsyncStorage.

Am I missing any specific setting?