aws-amplify / amplify-swift

A declarative library for application development using cloud services.
Apache License 2.0

Does all the data get downloaded once you subscribe to a model? #2459

Closed: afern247 closed this issue 1 year ago

afern247 commented 1 year ago

Hey guys, I'm in the middle of an architecture decision. Let's suppose I have a model called Coins:

```graphql
type Coins @model {
  id: ID!
  name: String
  historical_data: AWSJSON
}
```
  1. There are 13,000 coins, and there will be way more in the future.
  2. Each coin's historical data is about 330 KB.
  3. Counting all of its fields, a single coin may hold more than 330 KB of data, possibly 660 KB or more.
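For scale, here is a back-of-the-envelope estimate of a full sync using the numbers above. It assumes decimal units (1 GB = 1,000,000 KB) and counts only the field data, not any sync or protocol overhead, so treat it as a rough lower bound for each case.

```swift
import Foundation

// Rough size of a full sync of the Coins model.
// Inputs are the figures from the question; decimal units (1 GB = 1,000,000 KB) are assumed.
let coinCount = 13_000
let lowKBPerCoin = 330   // historical_data alone
let highKBPerCoin = 660  // all fields, upper estimate

/// Total payload in gigabytes for `coins` records of `kbPerCoin` kilobytes each.
func totalGigabytes(coins: Int, kbPerCoin: Int) -> Double {
    Double(coins * kbPerCoin) / 1_000_000
}

let lowGB = totalGigabytes(coins: coinCount, kbPerCoin: lowKBPerCoin)
let highGB = totalGigabytes(coins: coinCount, kbPerCoin: highKBPerCoin)
print(String(format: "Full sync: roughly %.2f GB to %.2f GB", lowGB, highGB))
// prints "Full sync: roughly 4.29 GB to 8.58 GB"
```

Either way, a full sync of today's catalogue would already be several gigabytes, before the number of coins grows.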

If I subscribe to the Coins model on the client (iPhone) using DataStore (for caching), so that users have access to all 13,000 coins, will it download ALL the data for ALL 13,000 coins (which might be 4 GB+) as soon as the subscription kicks in, or will it download only the data I access at the moment, for example Coin1.historical_data?

I guess the other option would be to use a DataStore query to fetch the data for a single coin, but that would increase the cost a lot, right?

Can someone give me some insights on the matter with best practices from AWS?

Thank you

drochetti commented 1 year ago

Hi @afern247, thanks for your question.

DataStore is an offline-first data solution, so by default all data available in your backend is downloaded (synchronized) to the device. Since it's not a caching mechanism, it won't download or cache data on demand; the synchronization process runs continuously in the background.

That being said, you have quite a bit of data in the historical_data field, so I'll share a few insights with you to help you make the architectural decision:

  1. Although all data is synced by default, you can set up rules to synchronize only the subset of data you need. With the right coordination between your app state and the backend, you could synchronize only the data a particular client needs: https://docs.amplify.aws/lib/datastore/sync/q/platform/ios/#selectively-syncing-a-subset-of-your-data
  2. As mentioned, DataStore is an offline-first storage mechanism, so it really shines when your application needs offline access. If it doesn't, or if your data access patterns and scalability could suffer from the constant syncing process, you can also consider using AppSync directly via the Amplify.API category.
  3. If you go with option 2 but are concerned about fetching the historical data too often, you can implement a caching strategy yourself, based on parameters that make sense for your application.
  4. You might also consider organizing your historical data in a time-series schema, or even in a different storage mechanism, so that it isn't one big JSON blob that has to be downloaded all at once. This choice will shape how your application behaves as the historical data grows. You could even consider databases optimized for time-series data, such as Amazon Timestream, InfluxDB, etc. The DynamoDB developer guide also has an article on time-series data: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-time-series.html
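Suggestions 2 and 3 together could look roughly like a small time-to-live (TTL) cache in front of an on-demand fetch. This is a minimal sketch, not an Amplify API: `TTLCache`, the one-hour TTL, and `fakeFetch` are all hypothetical, and `fakeFetch` only stands in for a real network call (for example, a GraphQL query through the Amplify.API category). For brevity it is synchronous and not thread-safe; a real app would more likely wrap it in an actor and use an async fetch.

```swift
import Foundation

/// Minimal TTL cache sketch for data fetched on demand. Not thread-safe.
final class TTLCache<Key: Hashable, Value> {
    private struct Entry {
        let value: Value
        let storedAt: Date
    }

    private var entries: [Key: Entry] = [:]
    private let timeToLive: TimeInterval

    init(timeToLive: TimeInterval) {
        self.timeToLive = timeToLive
    }

    /// Returns the cached value if it is younger than `timeToLive`;
    /// otherwise calls `fetch`, stores the result with timestamp `now`, and returns it.
    func value(for key: Key,
               now: Date = Date(),
               fetch: (Key) throws -> Value) rethrows -> Value {
        if let entry = entries[key], now.timeIntervalSince(entry.storedAt) < timeToLive {
            return entry.value
        }
        let fresh = try fetch(key)
        entries[key] = Entry(value: fresh, storedAt: now)
        return fresh
    }
}

// Hypothetical stand-in for a network request; counts how often it is called.
var networkCalls = 0
func fakeFetch(_ coinID: String) -> String {
    networkCalls += 1
    return "historical data for \(coinID)"
}

let cache = TTLCache<String, String>(timeToLive: 60 * 60)  // one hour
let t0 = Date(timeIntervalSince1970: 0)

let first = cache.value(for: "coin-1", now: t0, fetch: fakeFetch)                               // miss: fetches
let second = cache.value(for: "coin-1", now: t0.addingTimeInterval(60), fetch: fakeFetch)       // hit: served from cache
let third = cache.value(for: "coin-1", now: t0.addingTimeInterval(2 * 60 * 60), fetch: fakeFetch) // expired: fetches again
print(networkCalls)  // prints 2
```

Passing `now` explicitly keeps the expiry logic easy to test; in the app you would normally rely on the `Date()` default. The TTL itself is the "parameter that makes sense for your application": for example, older history could use a much longer TTL than recent prices.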
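As a sketch of suggestion 4, the single `historical_data` blob per coin could be split into per-coin, per-month buckets so that a client only downloads the months it actually displays. `PricePoint`, the `coinID#yyyy-MM` key format, and monthly granularity are hypothetical choices for illustration, not an AWS-prescribed design; each bucket could map to its own record or partition in whichever store you pick.

```swift
import Foundation

/// One sample of a coin's history (hypothetical shape).
struct PricePoint {
    let coinID: String
    let timestamp: Date
    let price: Double
}

/// Gregorian calendar pinned to UTC so bucket boundaries don't depend on the device time zone.
let utcCalendar: Calendar = {
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = TimeZone(secondsFromGMT: 0)!
    return calendar
}()

/// Bucket key of the form "coinID#yyyy-MM", e.g. "btc#1970-01".
func bucketKey(for point: PricePoint) -> String {
    let parts = utcCalendar.dateComponents([.year, .month], from: point.timestamp)
    let month = parts.month!
    let monthText = month < 10 ? "0\(month)" : "\(month)"
    return "\(point.coinID)#\(parts.year!)-\(monthText)"
}

/// Groups points into monthly buckets, each sorted by time.
func monthlyBuckets(_ points: [PricePoint]) -> [String: [PricePoint]] {
    Dictionary(grouping: points, by: bucketKey(for:))
        .mapValues { $0.sorted { $0.timestamp < $1.timestamp } }
}

let day: TimeInterval = 24 * 60 * 60
let samples = [
    PricePoint(coinID: "btc", timestamp: Date(timeIntervalSince1970: 31 * day), price: 3.0),  // 1970-02-01
    PricePoint(coinID: "btc", timestamp: Date(timeIntervalSince1970: 14 * day), price: 2.0),  // 1970-01-15
    PricePoint(coinID: "btc", timestamp: Date(timeIntervalSince1970: 0), price: 1.0),         // 1970-01-01
]
let buckets = monthlyBuckets(samples)
for key in buckets.keys.sorted() {
    print(key, buckets[key]!.map(\.price))
}
// prints:
// btc#1970-01 [1.0, 2.0]
// btc#1970-02 [3.0]
```

Bucketing keeps each record small and bounded regardless of how long a coin's history gets; the right granularity (day, month, year) depends on how far back your charts usually look.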