DataStore Feedback - Githubissues

Describe the bug

As requested on Discord, and for better readability, here is our feedback on DataStore:

Two months ago, we were advised to replace our whole backend architecture, which relied on a combination of AppSync and Apollo, with DataStore. The reason? The AppSync JS library team no longer seems to be working on the SDK, and even less on the compatibility between Apollo and AppSync (the issue was more than a year old and many people were waiting on it: https://github.com/awslabs/aws-mobile-appsync-sdk-js/issues/369). We had no choice but to migrate to DataStore, because of the risk that Apollo would stop supporting its old version 2.4.

We have spent two months trying to make DataStore usable in a production environment, and frankly we still seem to be quite far from it. Here is a non-exhaustive list of the reasons why:

Limit of 50 subscriptions, which is equivalent to having 16 tables syncable to DataStore. This is very few considering that the tables generated by GraphQL Transform follow an SQL-like architecture. While this limitation also existed for AppSync alone, we at least had the freedom to choose which subscriptions to apply and to combine some of them to reduce their number (insert and update can be the same subscription). Here one model means 3 subscriptions, and we have no choice but to have them all. https://github.com/aws-amplify/amplify-js/issues/5050
Impossible to delete objects that have a one-to-many relationship. While this problem seems to have been resolved by the CLI team, we have been waiting on the JS team since March. https://github.com/aws-amplify/amplify-js/issues/5088
The subscriptions fire when syncing to the cloud. That means that if you have 5000 elements in your DynamoDB tables, on first connecting you will be welcomed by 5000 subscription events firing in your browser. This could be handled by knowing when the sync has actually finished, but this feature request was made in January and there is still no news. https://github.com/aws-amplify/amplify-js/issues/4808
DataStore syncing is slow on web and even slower on mobile. The queries are done 100 by 100, so if you must fetch 1000 elements from a table you have to perform 10 queries. This could easily be solved by allowing an override of the {limit: 100} value used when calling the GraphQL list queries. https://github.com/aws-amplify/amplify-js/issues/5592
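
To make the paging arithmetic in the last item concrete, here is a minimal sketch. The function name `syncQueriesNeeded` is made up for illustration and is not part of DataStore. It only counts how many list queries are needed for a given page size.

```javascript
// Hypothetical helper (not a DataStore API): number of list queries needed
// to fetch `totalItems` when each query returns at most `pageSize` items,
// where `pageSize` is the "limit" sent with the GraphQL list query.
function syncQueriesNeeded(totalItems, pageSize = 100) {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  return Math.ceil(totalItems / pageSize);
}

console.log(syncQueriesNeeded(1000));       // 10 queries at a limit of 100
console.log(syncQueriesNeeded(1000, 1000)); // 1 query if the limit were 1000
```

A larger page size reduces the number of round trips, which is why an overridable limit would help here.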

In isolation these problems could seem insignificant, but combined they are really getting in the way of a real-time production use of DataStore. We are even considering disabling DataStore and using only GraphQL Transform, thus sacrificing the offline capability which was the whole point of moving to DataStore.

If only we could have some ETAs for these problems or if we could actually see some progress made on DataStore this would really reassure us.

I should maybe add that we have 2 frontends: React and React Native (with expo)

(And it's not easy to use DataStore on both with only amplify pull if we want to allow editing inside just one of them, to have the codegen and models. Both frontends are in the same monorepo. But this can easily be worked around with a script that copies from one to the other.)
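
A copy script of the kind mentioned above might look like the following Node.js sketch. The function name `copyModels` and the example paths are hypothetical and depend on the monorepo layout. It assumes Node.js 16.7 or later for `fs.cpSync`.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: replace the generated models in one frontend with a
// fresh copy from the frontend where codegen is run.
function copyModels(sourceDir, targetDir) {
  // Remove any stale copy so deleted models do not linger in the target.
  fs.rmSync(targetDir, { recursive: true, force: true });
  // Make sure the parent folder of the target exists before copying.
  fs.mkdirSync(path.dirname(targetDir), { recursive: true });
  fs.cpSync(sourceDir, targetDir, { recursive: true });
}

// Example call; these paths are assumptions, adjust them to your repository:
// copyModels("packages/web/src/models", "packages/mobile/src/models");
```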

Hi @SebSchwartz

Thank you for sharing this feedback, it is definitely very useful.

For the issues you've listed I believe we have solutions coming out soon (we're aiming for the next few weeks assuming no issues come up).

Limit of 50 subscriptions

There is an in-progress rollout that raises this limit to 100 subscriptions. While some customers might still need more than 33 models, we hope that this alleviates the problem for a lot of them while we think about ways to make this configurable in DataStore (e.g. opting out of real-time updates for some models).
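
The model counts quoted in this thread (16 models at a limit of 50, 33 at a limit of 100) follow from three subscriptions per model. A minimal sketch of that arithmetic follows. The helper name is made up for illustration, and the split into create, update and delete subscriptions is an assumption.

```javascript
// Per the thread, each synced model uses 3 subscriptions
// (assumed here to be one each for create, update and delete).
const SUBSCRIPTIONS_PER_MODEL = 3;

// Hypothetical helper: how many models fit under a subscription limit.
function maxSyncableModels(subscriptionLimit, perModel = SUBSCRIPTIONS_PER_MODEL) {
  return Math.floor(subscriptionLimit / perModel);
}

console.log(maxSyncableModels(50));  // 16
console.log(maxSyncableModels(100)); // 33
```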

Impossible to delete objects that have a one-to-many relationship

We are actively working on this and will have news soon. Updates will be posted in the issue you linked: https://github.com/aws-amplify/amplify-js/issues/5088

The subscriptions are firing when syncing to the cloud

We are tracking and working on this as part of a task to make the internals of DataStore more observable via the Hub. The idea is to emit events for things like "sync_completed".
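
Until such an event ships, the idea can be sketched with a plain Node.js EventEmitter standing in for the Hub. Everything here is hypothetical: the `hub` object, the `gateUntilSynced` helper and the "sync_completed" event name are illustrations, not the Amplify API. The sketch buffers messages that arrive during the initial sync and delivers them as one batch once the sync is reported complete.

```javascript
const { EventEmitter } = require("events");

// Hypothetical stand-in for the DataStore Hub channel; NOT the Amplify API.
const hub = new EventEmitter();

// Wrap a subscription handler so that messages arriving before the
// (hypothetical) "sync_completed" event are buffered and handed over as a
// single batch; later messages are delivered one by one.
function gateUntilSynced(hub, onInitialBatch, onMessage) {
  let synced = false;
  const buffered = [];
  hub.once("sync_completed", () => {
    synced = true;
    onInitialBatch(buffered.splice(0)); // empties the buffer and passes its items
  });
  return (message) => {
    if (synced) onMessage(message);
    else buffered.push(message);
  };
}

// Usage: two messages arrive during the sync, one after it.
const batches = [];
const live = [];
const handler = gateUntilSynced(hub, (items) => batches.push(items), (m) => live.push(m));
handler({ id: "1" });
handler({ id: "2" });
hub.emit("sync_completed");
handler({ id: "3" });
console.log(batches[0].length, live.length); // 2 1
```

With a gate like this, a UI could render once from the batch instead of reacting to thousands of individual events on first connect.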

DataStore syncing is slow on web and even slower on mobile.

We've implemented some improvements in this regard (e.g. doing a "batch save" to the underlying storage mechanism when processing items coming from GraphQL, instead of saving them one by one).

Also, as of today, there are some additional optional configurations you can tweak to, for example, increase the limit parameter when querying. Appropriate documentation will be added to this section: https://docs.amplify.aws/lib/datastore/conflict/q/platform/js#optional-configurations

syncPageSize is the one that will help you to fetch more data per request.

Amplify.configure({
  DataStore: {
    fullSyncInterval: 60 * 24 * 15, // in minutes: 15 days
    syncPageSize: 1000, // "limit" sent to GraphQL
    maxRecordsToSync: 2000000,
  }
});

We appreciate the feedback, thanks! 😄

@manueliglesias Also, we want to upgrade our library version to get the latest features, but we are blocked by this issue: https://github.com/aws-amplify/amplify-js/issues/5814

As already said in https://github.com/aws-amplify/amplify-js/issues/5820, it would be great to have a changelog so we can check whether an issue is related to recent changes or to something else, and so we can dig further ourselves.

May I know what the fullSyncInterval is for?

Hi @nubpro

May I know what the fullSyncInterval is for?

Yeah, this is the number of minutes before a baseQuery runs again:

..., as well as a custom interval in minutes which is an override of the default 24 hour “base query” which runs as part of the Delta Sync process.
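
Since the interval is expressed in minutes, a small sketch may help. The helper names and the base/delta decision below are illustrative assumptions about how such an interval could be applied, not DataStore internals.

```javascript
// The default "base query" interval mentioned above: 24 hours, in minutes.
const DEFAULT_FULL_SYNC_INTERVAL_MINUTES = 24 * 60;

// Hypothetical helper: convert days to the minutes that fullSyncInterval expects.
function daysToMinutes(days) {
  return days * 24 * 60;
}

// Illustrative only: choose a full base query once the interval has elapsed
// since the last full sync, otherwise a cheaper delta query.
function nextSyncKind(lastFullSyncMs, nowMs, intervalMinutes = DEFAULT_FULL_SYNC_INTERVAL_MINUTES) {
  const elapsedMinutes = (nowMs - lastFullSyncMs) / 60000;
  return elapsedMinutes >= intervalMinutes ? "base" : "delta";
}

console.log(daysToMinutes(15));                      // 21600, same as 60 * 24 * 15
console.log(nextSyncKind(0, 2 * 60 * 60 * 1000));    // "delta" (2 hours elapsed)
console.log(nextSyncKind(0, 25 * 60 * 60 * 1000));   // "base" (25 hours elapsed)
```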

It is very unfortunate that the AppSync team cannot maintain the SDK. There is a lot of frustration online about this, since people generally don't want to rebuild an entire app structure.

Moving from graphQL apollo to datastore does not seem to be an option if you need to update resolvers to do batch mutations, query @searchable location, refetch queries etc.

I am very disappointed with the direction of abandoning the appsync sdk offline capabilities.

@SebSchwartz thank you for sharing the useful feedback. I will stick with graphql apollo and no offline capabilities even though I too was using appsync sdk for this offline capability purpose alone.

I think this type of offline-first local datastore db has become the white whale of modern app development, and I definitely think the amplify team have put too many eggs in the datastore basket. Whilst I believe it has its place for simple PARTS of a model within a complex platform, it currently takes over the whole application on the assumption that you want to use datastore for every entity, which it is just not capable of providing for. If it were this easy to create an "always local db first" platform, then most of the meteor js team wouldn't be working on apollo instead (which is going in the opposite direction to amplify).

For a complex process such as a purchase flow or an order/booking flow, I do not want to have to work around something like datastore and its models just to fit within the limits of datastore. It would be far more useful to be able to choose datastore on a per-entity basis. That way, if I have blog posts or news I can use datastore, but for more mission-critical things I don't want datastore anywhere near it.

Updating business logic on all clients and doing data migrations is a complete nightmare if you have rogue devices not updated and trying to sync an old model with bad logic. It's just not viable for a scalable platform to have to manage such deviations in behaviour because you have offloaded your business logic to live on the device prior to the write to the local datastore. You may as well use pouchdb/couchdb in such circumstances? Why bother with graphql and appsync at all? At the moment to intercept mutation logic at the point of sync is a nightmare because I cannot control which mutation is used to sync the write on the server so I am limited to overriding the VTL templates which are severely restricted in what type of logic I can run server side.

This inability to control the mutation effectively on the server kind of means you can't use it for mission-critical stuff, because without some elaborate measures in place I can't stop the user hacking their device and app (or injecting javascript on a website) and updating a datastore row directly in the local store (say, to change an order size to 100 items instead of 1). Deltasync will then dutifully send that to the server update VTL which, correct me if I am wrong, is not really powerful enough to perform proper checks on the data? The version conflict resolvers only run on a conflict between two writes, which I am pretty sure could be bypassed by setting the _version to 999999 etc. The same applies to a create mutation: if someone bypasses the client logic for the create by inserting a row directly, then how do I double-check that on the server (say with a call to a 3rd party api) with the current create mutations? I can probably think of more scenarios that mean datastore should not be a consideration (such as changing ids client side in a chat app to send unauthorized messages to other users etc.)

I think the original appsync client was a far better way to go because it allowed you more fine grained control and would serialize and replay the actual mutations (rather than disconnecting you from this via the datastore api). This meant you could call any custom mutation and sync it later. That mutation could be a lambda with proper complex logic and validation checking, at which point it could reject the row and inform the client app.
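
As an illustration of the kind of server-side check described here, the sketch below shows a validation step that a Lambda-backed mutation could run before persisting anything. All names are hypothetical, including `validateOrderInput`, the `MAX_ORDER_QUANTITY` rule and the `{ arguments: { input } }` event shape. It is not taken from any Amplify or AppSync template.

```javascript
// Hypothetical business rule, for illustration only.
const MAX_ORDER_QUANTITY = 10;

// Pure validation step: returns { ok: true } or { ok: false, reason }.
function validateOrderInput(input) {
  if (!input || typeof input.id !== "string" || input.id.length === 0) {
    return { ok: false, reason: "order id is required" };
  }
  if (!Number.isInteger(input.quantity) || input.quantity < 1 || input.quantity > MAX_ORDER_QUANTITY) {
    return { ok: false, reason: `quantity must be an integer between 1 and ${MAX_ORDER_QUANTITY}` };
  }
  return { ok: true };
}

// Sketch of a Lambda-style handler; the event shape is an assumption.
async function updateOrderHandler(event) {
  const input = event.arguments.input;
  const result = validateOrderInput(input);
  if (!result.ok) {
    // Rejecting here lets the client be informed instead of trusting local data.
    throw new Error(`Order rejected: ${result.reason}`);
  }
  // A real handler would persist the change here (omitted).
  return { id: input.id, quantity: input.quantity };
}

console.log(validateOrderInput({ id: "order-1", quantity: 1 }).ok);   // true
console.log(validateOrderInput({ id: "order-1", quantity: 100 }).ok); // false
```

The point of keeping the check on the server is that a tampered local row (quantity 100 instead of 1) is rejected no matter what the client code did.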

I think putting so much effort into datastore to the detriment of other parts of the platform, such as the appsync libraries and the woefully buggy analytics/notifications parts, is really souring my experience of what should otherwise be a brilliant platform. It kind of self-limits the platform to being only useful for toybox applications and undermines the rest of the hard work done on the platform. Datastore will never be powerful enough for anything more than the simplest of use cases. I have used nearly every type of this offline-first approach, such as realm, couchbase etc., and none of them are usable beyond simple scenarios, but you can work around them because you only put what you need into them. Appsync sdk was the first that had the best of both worlds, due to being able to simulate any mutation with an optimistic response but then still calling any mutation you like. But with datastore taking centre stage the moment you install it, applying to every entity, and being a nightmare to work around, it offers nothing above these other platforms and also means you can't pick and choose when to use it.

I guess my main point is - just make it more optional and less intrusive on the entire model (maybe add a @datastore directive?) and spend some more time giving the other parts of the platform some love. This type of offline-first database cannot cover enough real-world scenarios to fit everyone, so it should take less of a centre stage within the amplify eco-system. It has its place but really should be de-prioritised against the other functionalities of the amplify platform.

@jcbdev thanks for the feedback. DataStore is not the only part of the Amplify platform that is prioritized, and it is currently optional. We prioritize functionality based on customer reactions and demand in the community, such as on GitHub and Discord. It won't be the right choice for all apps, but we have many customers who are very successful and asking for more features, including fully running DataStore with more features in their apps as well as hybrid scenarios as you've outlined above. In fact, one of your asks is related to a syncable/cloud-only/local-only model feature we're looking at for 2021.

A request though - this thread is closed and some of your comments are a bit non-specific. As you come across new asks could you please open a new issue as a feature request with your specific use cases that you aren't able to accomplish (either in DataStore or other parts of the platform) so that they can be accounted for in roadmap discussions? Thanks.

aws-amplify / amplify-js

DataStore Feedback #5764