Closed sacrampton closed 3 years ago
I closed my original issue as I could no longer replicate the problem anymore
Do u happen to be using the Storage module in your app?
Hi @nubpro - we are using the Storage module in the app for upload of photographs.
New Video - here is exactly the same application - with a few hundred records loaded versus over 7,000 in the previous video.
Seems loading more than a trivial amount of data is grinding it to a halt. What you see below is my expectation for what we should be getting with a larger data set - even 100K records.
https://user-images.githubusercontent.com/6362888/120977328-8e482480-c7b6-11eb-9c10-61daf9a3076a.mp4
Hi @manueliglesias / @iartemiev
Is there anything I can provide you to move this along? Its on a critical time line for us to get this resolved.
Thanks
Stephen
@sacrampton is the performance comparable between Simulator and physical devices on both iOS and Android? Or is one worse than the other? Does query performance degrade gradually as the number of records increases? Or does it tend to be acceptable up to a certain point and then rapidly drops?
Since you said only 5 of your models have data in them, I'll attempt to reproduce this with 5 models and 10k+ records in each. Will let you know what I find.
Also, which version of aws-amplify are you currently using?
Hi @iartemiev
I went through the list of models and data counts in more detail. The difference between the data in each model between the first and second videos is shown here.
Thanks...
Data By Model
Name | High Count | Low Count |
---|---|---|
Asset | 7,409 | 359 |
AssetVisit | 1,127 | 100 |
Photo | 4,667 | 352 |
ClientPickList | 1,233 | 1,233 |
FailureMode | 25 | 25 |
FieldList | 2,540 | 2,540 |
FieldSet | 147 | 147 |
GlobalPickList | 2,970 | 2,970 |
OperatingUnit | 1 | 1 |
Package | 34 | 34 |
PhotoButton | 13 | 13 |
Plant | 1 | 1 |
Risk | 37 | 37 |
Scale | 53 | 53 |
Uom | 1,099 | 1,099 |
User | 1 | 1 |
21,357 | 8,965 |
Using aws-amplify version = 3.3.27
Hi @iartemiev - I had another thought on this. On the Asset object I have potentially 400+ attributes, and potentially similar on the AssetVisit object. In DynamoDB there would rarely be more than 30 attributes on any individual record so the size is quite small. And in GraphQL we just download the attributes we need for a specific app so we are never trying to download 400+ attributes.
But I'm starting to think that DataStore might is creating 400+ attributes on every single record on the mobile. In my case they would be hundreds of empty attributes taking up space. Is that what might be going on? Can you test by creating a couple of models with hundreds of empty attributes and see if that is the issue?
Thanks for the additional info. Are you returning all of the records for those models when you're querying Asset, AssetVisit, etc.? Or are you using predicates to return a subset?
Hi @iartemiev - sorry I'm probably not understanding the question. Our app is multi-tenanted. So we have a base query to return only the data that belongs to that client. But in our existing GraphQL app we would only return the specific attributes from say the Asset that the App needs (say 30 attributes and not 400+ attributes).
Also, I notice in the GraphQL when it is a BELONGS_TO association that the GraphQL query is returning all the associated data and I guess it is storing that against the record. For example, and Asset in most cases has a ParentAsset - so it seems that not only is the Asset getting all 400+ blank attributes, it is storing the 400+ blank attributes of the ParentAsset. The Asset has a lot of HAS_MANY connections, so each of those objects is also getting the 400+ blank attributes of the connected Asset in its GraphQL query.
I meant when you're calling DataStore.query
, are you passing it a predicate, e.g., DataStore.query(Asset, c => c.someField('eq', 'someValue')
or perhaps using pagination or are you retrieving all of the records from the local store by calling DataStore.query(Asset)
(no predicates or pagination)?
In response to your question about fields with empty values: all of the model fields in your schema will be part of the selection sets for the GraphQL operations that populate the local store. Any empty values will come back as null
from AppSync and will be persisted as null
in the local store.
For example, if the records have empty values in fields field1
-field5
.
type Post @model {
id: ID!
title: String!
field1: String
field2: String
field3: String
field4: String
field5: String
}
They would be stored as:
{
"id": "id1",
"title": "someTitle",
"field1": null,
"field2": null,
"field3": null,
"field4": null,
"field5": null
}
Do you know which fields your app will need ahead of time? Or is that determined programmatically? In the former case, you could modify your schema.js
and remove the fields that you don't want to be synced down.
Thanks @iartemiev. I have a further question related objects - whether the data from the related object is embedded in the original object inside DataStore. The queries tend to suggest that might be the case.
So for example, I might have an optional connection to a parent Post
type Post @model {
id: ID!
parentPost: Post @connection (fields: "postParentPostId")
postParentPostId: ID
title: String!
field1: String
field2: String
field3: String
field4: String
field5: String
}
I am wondering if the Parent record is embedded in the current record like something like this.
{
"id": "id1",
"title": "someTitle",
"postParentPostId" : "id2",
"field1": null,
"field2": null,
"field3": null,
"field4": null,
"field5": null
"parentPost" {
"id": "id2",
"title": "someTitle",
"postParentPostId" : "id3",
"field1": null,
"field2": null,
"field3": null,
"field4": null,
"field5": null
}
}
Also, editing the schema.js is fine - I can do that. Do I need to edit any of the other files (ie. index.d.ts, etc.)
And I guess I can have a different schema.js for each application I have so each application can get to just the attributes it needs.
Hi @iartemiev - to answer your question "...I meant when you're calling DataStore.query, are you passing it a predicate, e.g., DataStore.query(Asset, c => c.someField('eq', 'someValue') or perhaps using pagination or are you retrieving all of the records from the local store by calling DataStore.query(Asset) (no predicates or pagination)?..."
We are doing the base query with a predicate as you describe and we are using pagination (500 records per cycle) because 1000 causes a resource exceeded error.
Hi @iartemiev - if we expand your type Post model above and have 100 x "field01": null fields and 2 fields as you show. In DynamoDB you would just have the id/title fields and this would be 32b in size. But if I look at the size of adding 100 null fields it swells to 1,532b. On a mobile where we are trying to save memory then it seems counterintuitive to be storing null fields. Is there any opportunity to store like DynamoDB? I guess not since I think you are using SQLite..
Hi @iartemiev - Some more results of testing with the original data set of 21,357 records detailed above. Querying the asset list before was taking 12 seconds. I have created a new schema.js that reduced the number of attributes on the records down significantly for the 3 models below.
Original | New | |
---|---|---|
Asset | 371 | 121 |
AssetVisit | 352 | 43 |
Photo | 112 | 60 |
The result was that by reducing the number of attributes on the records as shown above halves the response time (6 seconds to retrieve the assets list versus 12 seconds with the the previous one. To confirm, 100% of the attributes I removed were null. The null attributes are used by other (backend) parts of the solution - but not required in mobile app.
I think this shows that syncing null attributes to a record can dramatically impact performance. Given that there is a direct connection between storing null attributes and performance it follows that if we can get rid of all null attributes (like DynamoDB does) then we are going to be maximizing performance. The only way I can think of to get rid of all null values at all times is if you were able to use some sort of a NoSQL repository on the device that only stores non-null attributes, thus optimizing space and performance?
The beauty of GraphQL and DynamoDB is that I can create a comprehensive schema, but only store minimal data - so I have huge flexibility with economy and performance. But if we are storing a null value for every single possible attribute that takes away the flexibility.
Hi @iartemiev - another test result to share with you. Used another larger dataset for testing. This data set had these quantities in these 3 models. All the other models are approximately the same quantities - they don't vary by project as they are essentially reference data objects.
Records | |
---|---|
Asset | 30,998 |
AssetVisit | 9,315 |
Photo | 25,776 |
I tried with the Original schema.js and the New schema.js defined above.
Note that when we query a model the response is irrelevant to the amount of data we are retrieving. From the 30K assets I could do a query that retrieves 1 record and it will take 1 minute and 4 seconds. I could conduct another query that returns 113 records and it take 1 minute and 4 seconds.
Hi @iartemiev - one final update from my testing. On the desktop you ship a storage adapter for IndexedDB which is a NoSQL database. So we did a test on the desktop with the large 30K+ Asset data set described above using the full schema.
It took about 11 minutes for the full dataset to download - but after that a query of the Asset table took 2 seconds. This compares with 64 seconds on Mobile App (React Native).
I'll try to respond to the questions/points in order
I have a further question related objects - whether the data from the related object is embedded in the original object inside DataStore.
For Has One and Belongs To relationships we resolve the connection at run time when you call DataStore.query
on that model, but they are not saved that way to the local store.
We are doing the base query with a predicate as you describe and we are using pagination (500 records per cycle) because 1000 causes a resource exceeded error.
By "base query" are you referring to the initial base sync query between AppSync and the local store? Or calling DataStore.query
to retrieve records from the local store?
On a mobile where we are trying to save memory then it seems counterintuitive to be storing null fields. Is there any opportunity to store like DynamoDB? I guess not since I think you are using SQLite..
That's not possible at the moment, but it's certainly something we will explore in the future. On React Native, DataStore utilizes Async Storage for the local store implementation. Async Storage in turn uses different underlying persistence layers depending on the platform. On iOS it uses a serialized dictionary/key-value store, not SQLite.
Note that when we query a model the response is irrelevant to the amount of data we are retrieving. From the 30K assets I could do a query that retrieves 1 record and it will take 1 minute and 4 seconds. I could conduct another query that returns 113 records and it take 1 minute and 4 seconds.
Could you please share the query you're using here? Specifically, what predicate are you passing?
Hi @iartemiev - thanks for answering my long chain of rambling "input" / observations.
type Asset @model
@key(name: "byAssetPackageId", fields: ["assetPackageId"])
@key(name: "byAssetPlantId", fields: ["assetPlantId"])
{
id: ID!
name: String!
plant: Plant @connection (fields: ["assetPlantId"])
assetPlantId: ID!
package: Package @connection (fields: ["assetPackageId"])
assetPackageId: ID
auditCompleted: Boolean
}
type Plant @model
{
id: ID!
name: String!
}
type Package @model
{
id: ID!
name: String!
}
The base query would be to download all assets in a Plant
query baseQuery{
syncAssets(filter:{
assetPlantId:{eq:"xxxxxxxxxxxx"}
}){items{id _version}}
}
Then in DataStore with the 30K assets we do a query to find Assets in a package that have not yet been audited. This query takes 1 minute 4 seconds to return the 2 records. If we remove the limit it returns 113 records and takes the same time.
const data = await DataStore.query(
Asset,
(asset) =>
asset
.assetPackageId('eq', userData.package.id)
.auditCompleted('eq', false),
{
page: 0,
limit: 2,
},
);
Thank you, that is helpful. We'll be doing additional library benchmarking today in order to pinpoint the issue and will provide an update with our findings.
Quick update: I have been able to repro performance issues similar to those described above when using a model with 10 fields containing values + 100 fields with null
values and > 30k records in the local store. Will be investigating causes and potential solutions.
Hi @iartemiev - that is fantastic news that you were able to reproduce. Hopefully you can also reproduce it working perfectly with IndexedDB/Browser too. Await your feedback.
We've pinpointed the bottleneck to the multiGet
method in Async Storage, it seems that we're brushing up against that library's performance limitations here.
We'll be brainstorming what improvements we could make in DataStore in order to mitigate this as much as possible. For your use case specifically, even with the reduced number of fields, are a considerable number of values in your records still null
? If so, we have a couple of ideas on how we may be able to optimize for this.
As an aside, when you're seeing this poor performance do you have the React Native debugger enabled on your app? We're seeing significantly better performance (still suboptimal, but 2-3x faster) with the debugger disabled, so just want to point that out as well.
@iartemiev - looks like good progress.
Ultimately, I'd like to retain access to all fields (like we do with IndexedDB) - but for now we can run with the reduced schema. And even in the reduced schema I would estimate that 80% of those fields are null in the dataset.
I don't believe we have debugger enabled - but I will check.
@iartemiev - checked performance with and without debugger enabled. The 64 second response was with React Native Debugger enabled. With Debugger switched off the response is 22 seconds. As you said - its about a 3X improvement.
Worth remembering that 22 seconds was with the stripped down schema and is still 11X slower than the Browser/IndexedDB running with the full schema. Good to know we'll see that sort of improvement when we go to production, but 22 seconds with cut-down schema still isn't a solution.
Thanks for working this.
To level-set on expectations a bit, I don't think we'll be able to get the same performance out of AsyncStorage as we do from IndexedDB, no matter what optimizations we add. The former is an intrinsically slower persistence layer when dealing with large amounts of data. I'm hoping we can see another 2-3x improvement, but 10x is quite unlikely. In order to achieve an order of magnitude improvement, we'll probably need to leverage a faster local database for React Native. We'll certainly explore that option, but it's not something we'd be able to release in a matter of a day or two.
Hi @iartemiev - just checking back in to see whether you have made any progress on this.
Hey @sacrampton, we've identified the optimization we're planning on implementing and it's what you suggested above, i.e., not persisting null
values to the local databases. We're testing this solution now to make sure that it doesn't have any unwanted side effects.
If you would like to test the effect this change will have on your queries before we release the fix, you can make the following temporary change in your Amplify dependencies manually:
Note you'll want to delete your app and re-install it after you make this change, i.e., re-sync the local database in order for the change to take effect
Open ./node_modules/@aws-amplify/datastore/lib-esm/datastore/datastore.js
Change this code (should start at line 279)
var initializeInstance = function (init, modelDefinition, draft) {
var modelValidator = validateModelFields(modelDefinition);
Object.entries(init).forEach(function (_a) {
var _b = __read(_a, 2), k = _b[0], v = _b[1];
modelValidator(k, v);
draft[k] = v;
});
};
To:
var initializeInstance = function (init, modelDefinition, draft) {
var modelValidator = validateModelFields(modelDefinition);
Object.entries(init).forEach(function (_a) {
var _b = __read(_a, 2), k = _b[0], v = _b[1];
modelValidator(k, v);
if (v !== null) {
draft[k] = v;
}
});
};
If your app happens to be using CJS modules, instead of ESM, you'll want to change the same code in ./node_modules/@aws-amplify/datastore/lib/datastore/datastore.js
instead (lib
subdirectory as opposed to lib-esm
)
Hi @iartemiev - that sounds great - will get testing immediately and let you know how we go.
I just created another issue (#8450) also dealing with null values and subscriptions. Will the changes you made here deal with that?
Hi @iartemiev - we did the testing and can indeed see that the null values are removed from the data. However, it didn't change the performance we were seeing. Namely 64 seconds in debug and 24 seconds without debug.
Hmm that's odd, as we're seeing a roughly 2-2.5x improvement with the above change. Did you reinstall the app and re-sync the records after changing the code?
If so, could you please verify that the fields containing null
values are definitely no longer getting persisted to AsyncStorage?
You can do that by querying AsyncStorage directly (without DataStore):
import AsyncStorage from '@react-native-async-storage/async-storage';
async function getFromAsyncStorage() {
const model = 'Asset';
const key = (await AsyncStorage.getAllKeys()).find(key =>
key.includes(`@AmplifyDatastore::user_${model}`)
);
console.log(await AsyncStorage.getItem(key));
}
Hi @iartemiev - will check this out and confirm back with you asap.
If you have a chance, can you comment on this somewhat related issue of nulls in DataStore (#8450)
Hi @iartemiev - have double checked everything you said and the result is still the same.
Here is a screen shot of the data we see on the record after applying your fix - all the null values are removed.
When we try the code you list above we are getting this - JSON value '
[AsyncStorage] Using undefined type for key is not supported. This can lead to unexpected behavior/errors. Use string instead. Key passed: undefined
If the screenshot above is the result of a DataStore.query
, that doesn't tell us definitively whether the null
s are being persisted or not to AsyncStorage, just that they're being omitted when the model class is instantiated by DataStore.
If the code I provided didn't work, you can log out all the keys with console.log(await AsyncStorage.getAllKeys())
, then pass one of those keys to await AsyncStorage.getItem("some-key")
and log that to the console.
It may be easier to set AsyncStorage to a global and then interact with it in real time via the RN debugger console.
In your app:
import AsyncStorage from '@react-native-async-storage/async-storage';
global.AsyncStorage = AsyncStorage;
Then in the debug console you can run e.g.,
await AsyncStorage.getAllKeys();
// pick one of the keys that starts with "@AmplifyDatastore::user_{your model}"
// call getItem on that key
await AsyncStorage.getItem("@AmplifyDatastore::user_...")
Hi @iartemiev - the team has been working to get this done, but not getting much progress.
"I am only getting keys in which i store the data using - awaitAsyncStorage.getAllKeys()"
"I am not getting any key like - @AmplifyDatastore::user_Asset"
That was checked in the React Native debugger console.
Ah, I just realized that you're on an older version of aws-amplify that still uses the AsyncStorage module that's bundled with React Native, as opposed to the external module. You should be able to query it by importing from RN:
import { AsyncStorage } from 'react-native'
You should then see the DataStore-specific values in there using the code I provided above
(Apologies, finger slipped and I accidentally clicked "Close with comment" instead of "Comment". Re-opened the issue)
Hi @iartemiev - the team has checked using await AsyncStorage.getItem(key).
The code used before looks like this
This results that come are shown below
After applying null validation we are getting these field in asset record. Here is the code after.
And this is the data we are seeing
So it is clear that the null fields are gone after updating the code in aws-amplify module - but the speed doesn't change.
Hi @iartemiev - to rule out any related issues we upgraded aws-amplify to 4.1.1 - still same performance issue
Hi @iartemiev - just back tracking for a second. I significantly reduced number of attributes on 3 tables and assumed it was removal of all those null values which made the difference. Maybe it was the fact of reducing attributes on a table?
Reducing the amount of data stored in AsyncStorage will increase query performance. Whether that’s done through explicitly removing fields or by not persisting null values, the effect should be roughly the same.
Hey @sacrampton, wanted to share an update with you: we've decided to implement a new storage adapter option for React Native CLI apps that utilizes SQLite as the local persistence layer. We saw massive performance improvements when querying with predicates, sorting, and/or pagination, as well as processing the base/delta sync queries compared to the performance of AsyncStorage.
We still have some testing to do before we're ready to release this, but it's about 90% implementation complete as of right now. Once we're satisfied with the stability of this adapter, we'll let you know how you can utilize it in your app.
As an aside, you mentioned above that you're on version 3.3.27 of Amplify. Is there something in particular that's keeping you from upgrading to the latest version?
That is awesome @iartemiev - I had the guys upgrade to 4.1.1 - there was nothing keeping us from upgrading. Look forward to hearing more.
Hey @sacrampton, we're finishing up our testing of the SQLite adapter.
Everything is looking good so far, the only feature we haven't had a chance to implement is cascading delete functionality on relations. Is this a vital feature for your use case? Or would it be acceptable to omit it for now but add it in the near future? If we omit it, you would need to manually delete related entities via a separate call to DataStore.delete
.
Hi @iartemiev - sounds great. Cascading deletes don't matter to me at this time.
Update: ran into a hiccup when testing many-to-many relations. Got a solution in mind, but will need just a bit more time to implement it and test. I will provide another update tomorrow
Hi @iartemiev - appreciate you keeping me updated. Thanks.
Hello again, @sacrampton! We’ve released a special version of AmplifyJS (aws-amplify@rn-sqlite) containing the new SQLite adapter. We’re still considering this to be an experimental feature, hence the special tag/version.
We’ve tested this adapter thoroughly on our end with up to 350k records. Querying on a predicate that returns a subset of data or using pagination and/or sorting is blazing fast (a fraction of a second in our testing). If you want to return the entire contents of a large table (over 10k records), i.e., querying without a predicate, that still takes a while (10+ seconds), but that is expected.
There is currently a bit of occasional funky behavior and console warnings when deleting relational records. This is because cascading deletes are not yet implemented. You would need to manually delete from the last descendant up to avoid warnings. During my testing, these did not break the application. We will be adding cascading delete functionality shortly to bring this to feature parity with the other adapters.
Please let us know in this thread if you run into issues, and we will get them resolved ASAP. We will include this adapter in the stable release soon but wanted to give you a chance to test and benchmark early to ensure that your performance needs are met.
Here are the installation instructions:
(note: AsyncStorage is still a required dep, as other parts of Amplify continue to rely on it)
yarn add aws-amplify@rn-sqlite amazon-cognito-identity-js@rn-sqlite aws-amplify-react-native@rn-sqlite react-native-sqlite-storage @react-native-async-storage/async-storage
# to ensure we're not left with any of the previous versions
rm -rf node_modules yarn.lock package-lock.json
yarn
npx pod-install
In order to restore the current/async-storage version:
yarn remove react-native-sqlite-storage
yarn add aws-amplify@latest amazon-cognito-identity-js@latest aws-amplify-react-native@latest
# to ensure we're not left with any of the previous versions
rm -rf node_modules yarn.lock package-lock.json
yarn
npx pod-install
Much appreciated @iartemiev - will start building asap and let you know our results. Thanks again
Before opening, please confirm:
JavaScript Framework
React Native
Amplify APIs
DataStore
Amplify Categories
api
Environment information
Describe the bug
We are porting our AppSync/GraphQL based app to DataStore and what we are finding is that queries on the cached database are very slow. We are also finding that pagination of the queries makes no difference to the performance of the queries.
There are many tables with the largest table having about 8,000 records and most other tables having less than 1000 records. There are only 5 tables/models that have data.
Whether we are connected to the network or working offline it makes no difference to the performance.
In issue #6994 it talks of a first time slowing in performance, but we see consistently slow performance.
I have recorded a video of the performance we are seeing - both online and offline - to give you an idea of the problem we are facing.
https://user-images.githubusercontent.com/6362888/120909860-f74b7180-c6bc-11eb-97dc-300f8d19ab4d.mp4
Expected behavior
The database is local so we should be seeing almost instant responses to our queries. The performance of our existing AppSync GraphQL cache gives us almost instant responses so we expect the same or better performance than the existing AppSync GraphQL database.
Reproduction steps
Not sure how to replicate this as the database I have has 48 models and 200 GSI's and takes about 6 hours of incremental (manual) pushes to deploy into a new environment.
Code Snippet
Log output
aws-exports.js
No response
Manual configuration
No response
Additional configuration
No response
Mobile Device
iPhone 7
Mobile Operating System
14.5
Mobile Browser
Not relevant
Mobile Browser Version
No response
Additional information and screenshots
No response