aws-amplify / amplify-js

A declarative JavaScript library for application development using cloud services.
https://docs.amplify.aws/lib/q/platform/js
Apache License 2.0

Initial syncing take more time on the datastore #9471

Closed Dilip-Solanki-Logistic-Infotech closed 2 years ago

Dilip-Solanki-Logistic-Infotech commented 2 years ago

JavaScript Framework

React

Amplify APIs

DataStore

Environment information

```
System:
  OS: Linux 4.15 Ubuntu 18.04.6 LTS (Bionic Beaver)
  CPU: (6) x64 Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz
  Memory: 1.69 GB / 15.60 GB
  Container: Yes
  Shell: 4.4.20 - /bin/bash
Binaries:
  Node: 16.13.0 - ~/.nvm/versions/node/v16.13.0/bin/node
  npm: 8.1.0 - ~/.nvm/versions/node/v16.13.0/bin/npm
Browsers:
  Chrome: 97.0.4692.71
  Firefox: 95.0.1
npmPackages:
  @amcharts/amcharts4: ^4.10.18 => 4.10.23
  @aws-amplify/ui-react: 2.1.5 => 2.1.5
  @aws-amplify/ui-react-internal: undefined ()
  @aws-amplify/ui-react-legacy: undefined ()
  @babel/cli: ^7.12.10 => 7.16.7
  @babel/core: ^7.12.10 => 7.16.7
  @babel/preset-env: ^7.12.11 => 7.16.7
  @date-io/moment: ^1.3.13 => 1.3.13
  @material-ui/core: ^4.12.3 => 4.12.3
  @material-ui/icons: ^4.11.2 => 4.11.2
  @material-ui/lab: ^4.0.0-alpha.60 => 4.0.0-alpha.60
  @material-ui/pickers: ^3.3.10 => 3.3.10
  @material-ui/styles: ^4.11.4 => 4.11.4
  amazon-quicksight-embedding-sdk: ^1.18.0 => 1.18.0
  amcharts4-example-100%-stacked-column-chart: 0.1.0
  amcharts4-example-adding-live-data: 0.1.0
  amcharts4-example-amcharts3: 0.1.0
  amcharts4-example-animating-along-the-line-series: 0.1.0
  amcharts4-example-bar-chart-race: 0.1.0
  amcharts4-example-base: 0.1.0
  amcharts4-example-bent-gantt-chart: 0.1.0
  amcharts4-example-bubble-chart: 0.1.0
  amcharts4-example-candlestick-chart: 0.1.0
  amcharts4-example-changing-series-apppearance: 0.1.0
  amcharts4-example-changing-tree-map-data: 0.1.0
  amcharts4-example-chord-diagram: 0.1.0
  amcharts4-example-chord-diagram-non-ribbon: 0.1.0
  amcharts4-example-chord-friends-kisses: 0.1.0
  amcharts4-example-clock: 0.1.0
  amcharts4-example-column-chart-with-axis-break: 0.1.0
  amcharts4-example-column-chart-with-images-as-bullets: 0.1.0
  amcharts4-example-columns-with-pies-inside: 0.1.0
  amcharts4-example-countries-morphing-to-pie-chart: 0.1.0
  amcharts4-example-curved-column-chart: 0.1.0
  amcharts4-example-custom-shape-chart: 0.1.0
  amcharts4-example-cylinder-chart: 0.1.0
  amcharts4-example-data-grouping-50K: 0.1.0
  amcharts4-example-date-based-radar: 0.1.0
  amcharts4-example-day-night-map: 0.1.0
  amcharts4-example-donut-chart: 0.1.0
  amcharts4-example-drag-and-change-values: 0.1.0
  amcharts4-example-dragging-pie-slices: 0.1.0
  amcharts4-example-drill-down-map: 0.1.0
  amcharts4-example-drill-down-tree-map: 0.1.0
  amcharts4-example-dumbbell-plot: 0.1.0
  amcharts4-example-dumbbell-plot-horizontal: 0.1.0
  amcharts4-example-duration-axis: 0.1.0
  amcharts4-example-error-chart: 0.1.0
  amcharts4-example-fill-between-lines-chart: 0.1.0
  amcharts4-example-fishbone-timeline: 0.1.0
  amcharts4-example-force-directed-creating-links: 0.1.0
  amcharts4-example-force-directed-network: 0.1.0
  amcharts4-example-force-directed-tree: 0.1.0
  amcharts4-example-force-directed-tree-expandable: 0.1.0
  amcharts4-example-funnel-chart: 0.1.0
  amcharts4-example-funnel-chart-horizontal: 0.1.0
  amcharts4-example-gantt-chart: 0.1.0
  amcharts4-example-gauge-with-bands: 0.1.0
  amcharts4-example-geo-heat-map: 0.1.0
  amcharts4-example-heat-map: 0.1.0
  amcharts4-example-heat-map-circles: 0.1.0
  amcharts4-example-heat-map-radar: 0.1.0
  amcharts4-example-horizontally-stacked-axes: 0.1.0
  amcharts4-example-infinity-chart: 0.1.0
  amcharts4-example-lazy-loading: 0.1.0
  amcharts4-example-line-different-ups-downs: 0.1.0
  amcharts4-example-map-line-gauge-mix: 0.1.0
  amcharts4-example-morphing-countries: 0.1.0
  amcharts4-example-multiple-axes-date-based-chart: 0.1.0
  amcharts4-example-multiple-series-map-chart: 0.1.0
  amcharts4-example-non-chart-usage: 0.1.0
  amcharts4-example-ohlc-chart: 0.1.0
  amcharts4-example-packed-circles: 0.1.0
  amcharts4-example-pictorial-bar-chart: 0.1.0
  amcharts4-example-pictorial-chart: 0.1.0
  amcharts4-example-pictorial-stacked-chart: 0.1.0
  amcharts4-example-pictorial-stacked-chart-horizontal: 0.1.0
  amcharts4-example-polar-area-chart: 0.1.0
  amcharts4-example-population-pyramid: 0.1.0
  amcharts4-example-pyramid-chart: 0.1.0
  amcharts4-example-radar-chart-with-axis-break: 0.1.0
  amcharts4-example-radar-timeline-chart: 0.1.0
  amcharts4-example-radial-bar-chart: 0.1.0
  amcharts4-example-real-time-data-sorting: 0.1.0
  amcharts4-example-road-chart: 0.1.0
  amcharts4-example-rotating-globe: 0.1.0
  amcharts4-example-rotating-globe-with-circles: 0.1.0
  amcharts4-example-sankey-diagram-with-animated-bullets: 0.1.0
  amcharts4-example-semi-circle-donut-chart: 0.1.0
  amcharts4-example-serpentine-gantt-horizontal: 0.1.0
  amcharts4-example-serpentine-step-line: 0.1.0
  amcharts4-example-serpentine-timeline: 0.1.0
  amcharts4-example-simple-3D-pie-chart: 0.1.0
  amcharts4-example-simple-bar-chart: 0.1.0
  amcharts4-example-simple-column-chart: 0.1.0
  amcharts4-example-simple-gauge: 0.1.0
  amcharts4-example-simple-line-chart: 0.1.0
  amcharts4-example-simple-map-chart: 0.1.0
  amcharts4-example-simple-pie-chart: 0.1.0
  amcharts4-example-simple-radar-chart: 0.1.0
  amcharts4-example-simple-sankey-diagram: 0.1.0
  amcharts4-example-simple-tree-map: 0.1.0
  amcharts4-example-spiral-bar-chart: 0.1.0
  amcharts4-example-spiral-chart: 0.1.0
  amcharts4-example-spiral-gantt-chart: 0.1.0
  amcharts4-example-stacked-3D-column-chart: 0.1.0
  amcharts4-example-stacked-area-radar-chart: 0.1.0
  amcharts4-example-stacked-column-chart: 0.1.0
  amcharts4-example-stadium-track-chart: 0.1.0
  amcharts4-example-step-count-chart: 0.1.0
  amcharts4-example-step-line-chart: 0.1.0
  amcharts4-example-step-line-no-risers-chart: 0.1.0
  amcharts4-example-stock-chart: 0.1.0
  amcharts4-example-stock-comparing-values: 0.1.0
  amcharts4-example-sunburst: 0.1.0
  amcharts4-example-syncing-cursors-and-zoom: 0.1.0
  amcharts4-example-timeline: 0.1.0
  amcharts4-example-triangle-column-chart: 0.1.0
  amcharts4-example-variable-angle-radar-chart: 0.1.0
  amcharts4-example-variable-height-3D-pie-chart: 0.1.0
  amcharts4-example-variable-radius-pie-chart: 0.1.0
  amcharts4-example-venn-diagram: 0.1.0
  amcharts4-example-venn-diagram-with-patterns: 0.1.0
  amcharts4-example-vertical-sankey-diagram: 0.1.0
  amcharts4-example-vertically-stacked-axes: 0.1.0
  amcharts4-example-waterfall-chart: 0.1.0
  amcharts4-example-word-cloud: 0.1.0
  amcharts4-example-word-cloud-changing-data: 0.1.0
  amcharts4-example-work-hours-map: 0.1.0
  amcharts4-example-xy-error-chart: 0.1.0
  autosuggest-highlight: ^3.1.1 => 3.2.0
  aws-amplify: 4.3.11 => 4.3.11
  axios: ^0.22.0 => 0.22.0 (0.21.4)
  babel-plugin-import: ^1.13.3 => 1.13.3
  customize-cra: ^1.0.0 => 1.0.0
  cypress: ^9.0.0 => 9.2.0
  cypress-localstorage-commands: ^1.4.0 => 1.6.1
  eslint-plugin-cypress: ^2.11.2 => 2.12.1
  eslint-plugin-react-hooks: ^4.2.0 => 4.3.0
  google-libphonenumber: ^3.2.21 => 3.2.26
  i18next: ^19.8.3 => 19.9.2
  i18next-xhr-backend: ^3.2.2 => 3.2.2
  jss-rtl: ^0.3.0 => 0.3.0
  md5: ^2.3.0 => 2.3.0
  moment: ^2.29.1 => 2.29.1
  notistack: ^1.0.1 => 1.0.10
  react: ^17.0.2 => 17.0.2
  react-app-rewired: ^2.1.8 => 2.1.11
  react-csv: ^2.0.3 => 2.2.1
  react-dom: ^17.0.2 => 17.0.2
  react-draggable: ^4.4.3 => 4.4.4
  react-i18next: ^11.7.3 => 11.15.3
  react-router-dom: ^6.0.2 => 6.2.1
  react-scripts: 5.0.0 => 5.0.0
  simple-line-chart: 0.0.0
  uuid: ^8.3.1 => 8.3.2 (3.4.0, 3.3.2)
npmGlobalPackages:
  @aws-amplify/cli: 7.6.7
  corepack: 0.10.0
  firebase-tools: 10.0.1
  npm: 8.1.0
  serve: 13.0.2
```

Describe the bug

When I log in, the initial sync starts in my app. I use syncExpression for some models, but one model (OrderEvent) takes much longer to sync than the others, because a sync request sometimes returns only one or two records. I have attached some screenshots showing this.

The sync sometimes takes 5 to 10 minutes to complete.

Total Items in OrderEvent: 1,436,618

Expected behavior

I want to solve this issue, but I don't know why it happens only for one particular model.

Reproduction steps

  1. Add the syncExpression with the condition.
  2. Open the browser's developer tools (Inspect).
  3. Click the Network tab.
  4. Look at the syncOrderEvents requests.

Code Snippet

DataStore.configure({
   syncExpressions: [
        ...
        // Only sync OrderEvent records whose timestamp is on or after startDate
        syncExpression(OrderEvent, () => {
            return orderEvent => orderEvent.timestamp('ge', startDate)
        })
    ]
    ...

Additional information and screenshots

(three screenshots; images not preserved)

Dilip-Solanki-Logistic-Infotech commented 2 years ago

@iartemiev Is there any way to solve this issue?

PeteDuncanson commented 2 years ago

@Dilip-Solanki-Logistic-Infotech wow, I suspect your sync is scanning the DynamoDB table, which has to be done 1 MB at a time; with 1.4 million items that's a lot of data. Take a look at switching it to use a Query instead: https://docs.amplify.aws/lib/datastore/sync/q/platform/js/#advanced-use-case---query-instead-of-scan

Can't say I've used that myself, but it's meant to help get you out of a hole like the one you are in.
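
To illustrate the suspicion above: a DynamoDB Scan reads the table page by page (up to 1 MB per request) and applies any filter only after a page has been read, so a request can come back with very few matching records, or none at all. The toy model below is only a sketch. The function name, the page size expressed in items, and the data are made up for illustration, and nothing here calls DynamoDB.

```javascript
// Toy model of a filtered Scan: every request reads one fixed-size page of
// the table, and the filter is applied only after the page has been read.
function simulateFilteredScan(items, pageSize, predicate) {
  const matchesPerRequest = [];
  for (let start = 0; start < items.length; start += pageSize) {
    const page = items.slice(start, start + pageSize);
    matchesPerRequest.push(page.filter(predicate).length);
  }
  return matchesPerRequest;
}

// 10,000 hypothetical events, one per minute; only the last 8 hours
// (480 minutes) satisfy the filter.
const events = Array.from({ length: 10000 }, (_, i) => ({ id: i, minute: i }));
const cutoff = events.length - 480;
const perPage = simulateFilteredScan(events, 1000, e => e.minute >= cutoff);

console.log(perPage.length);    // prints 10 (requests needed to read the table)
console.log(perPage.join(',')); // prints "0,0,0,0,0,0,0,0,0,480"
```

Nine of the ten requests in this toy run return nothing, yet each still has to be made. A Query against an index avoids reading the pages that cannot match.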

Cheers

Pete

chrisbonifacio commented 2 years ago

Hi @Dilip-Solanki-Logistic-Infotech 👋 I second @PeteDuncanson's recommendation. That is a lot of data, so you'll definitely want to narrow that down to a query: see if you can index on something and then filter by the timestamp.

You can try by adjusting this example from the docs Pete linked above

DataStore.configure({
  syncExpressions: [
    // The callback must be async because it awaits a value before building the predicate
    syncExpression(User, async () => {
      const lastName = await getLastNameForSync();
      return user => user.lastName('eq', lastName).createdAt('gt', '2020-10-10')
    })
  ]
});

Dilip-Solanki-Logistic-Infotech commented 2 years ago

All I need is the records from the last 8 hours, so there is no other field, like lastName, to filter on.

PeteDuncanson commented 2 years ago

@Dilip-Solanki-Logistic-Infotech your other option is to just use GraphQL directly for that data, but then you would lose the offline goodness if you go that route. Welcome to my world :)

chrisbonifacio commented 2 years ago

It might be worth a try, but I would expect GraphQL to still be slow if the request still performs a Scan rather than a Query on the DDB table, because it would still be hitting all items in the table. Without a query expression, the request results in the same operation type on the table (a Scan, not a Query).

For example, here's the relevant part of the vtl template for a CLI-generated list query:

#if( !$util.isNull($ctx.stash.modelQueryExpression) && !$util.isNullOrEmpty($ctx.stash.modelQueryExpression.expression) )
  ## <-- taken if the query expression is not null or empty (e.g. allPostsByAuthor({ author: "Chris" }))
  $util.qr($ListRequest.put("operation", "Query"))
  $util.qr($ListRequest.put("query", $ctx.stash.modelQueryExpression))
  #if( !$util.isNull($ctx.args.sortDirection) && $ctx.args.sortDirection == "DESC" )
    #set( $ListRequest.scanIndexForward = false )
  #else
    #set( $ListRequest.scanIndexForward = true )
  #end
#else
  ## <-- otherwise, performs a Scan by default (e.g. allPosts())
  $util.qr($ListRequest.put("operation", "Scan"))
#end
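
For readers less familiar with VTL, the branching in the template above can be restated in JavaScript. This is only a sketch of the decision logic: `chooseListOperation` is a hypothetical name, the sample expression string is invented, and the real resolver builds a full DynamoDB request rather than this small object.

```javascript
// Restates the template's decision: Query when a non-empty query expression
// is present, Scan otherwise. Illustrative only, not resolver code.
function chooseListOperation(modelQueryExpression, sortDirection) {
  const hasQueryExpression =
    modelQueryExpression != null &&
    typeof modelQueryExpression.expression === 'string' &&
    modelQueryExpression.expression.length > 0;

  if (!hasQueryExpression) {
    // No key condition to work with, so the whole table is scanned.
    return { operation: 'Scan' };
  }

  return {
    operation: 'Query',
    query: modelQueryExpression,
    // Only an explicit DESC reverses the index order.
    scanIndexForward: sortDirection !== 'DESC',
  };
}

console.log(chooseListOperation(null).operation); // prints "Scan"

const byAuthor = chooseListOperation({ expression: '#author = :author' }, 'DESC');
console.log(byAuthor.operation);        // prints "Query"
console.log(byAuthor.scanIndexForward); // prints false
```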

@Dilip-Solanki-Logistic-Infotech Perhaps another route might be to paginate the results instead? You can use DataStore.query instead of the syncExpression like so:

https://docs.amplify.aws/lib/datastore/data-access/q/platform/js/#predicates

const events = await DataStore.query(OrderEvent, event => event.timestamp("ge", startDate), {
  page: 0,
  limit: 100
});
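
For reference, `page` in the call above is zero-based and `limit` is the page size. The sketch below mimics that paging over an in-memory array. `paginate` is a hypothetical helper, not part of DataStore, and it only shows how the two options select a slice of records that are already available.

```javascript
// Returns the zero-based `page` of `records`, with `limit` records per page.
function paginate(records, page, limit = 100) {
  const start = page * limit;
  return records.slice(start, start + limit);
}

// 250 hypothetical records, identified here by their position only.
const records = Array.from({ length: 250 }, (_, i) => i);

console.log(paginate(records, 0).length); // prints 100
console.log(paginate(records, 2).length); // prints 50 (the last, partial page)
console.log(paginate(records, 2)[0]);     // prints 200
```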

Otherwise, if you'd prefer to keep the syncExpression approach in the DataStore configuration, one way to work a GSI into the schema is to store a fuzzy date (for example, just the calendar day) on each record, so that the query doesn't have to hit every item in the table. This might be easier to work in, since your logic doesn't need to change as much.

example:

DataStore.configure({
  syncExpressions: [
    syncExpression(OrderEvent, () => {
      return event => event.createdAtFuzzy('eq', '2022-01-12') // or today's date, or today .or() yesterday's dates and then filter further from here
    })
  ]
});

You can add this field to your model to facilitate this:

createdAtFuzzy: String!
    @index(name: "byCreatedAtFuzzy", queryField: "orderEventByCreatedAtFuzzy")
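
A minimal sketch of how the `createdAtFuzzy` value could be derived when a record is created, assuming the field holds the UTC calendar date in `YYYY-MM-DD` form as in the example above. `toFuzzyDate` and `withFuzzyDate` are hypothetical helper names, and the sample timestamp is invented.

```javascript
// Returns the UTC calendar date of `date` as 'YYYY-MM-DD'.
function toFuzzyDate(date) {
  return date.toISOString().slice(0, 10);
}

// Returns a copy of `fields` with createdAtFuzzy set from `now`.
// The input object is not modified.
function withFuzzyDate(fields, now = new Date()) {
  return { ...fields, createdAtFuzzy: toFuzzyDate(now) };
}

const createdAt = new Date('2022-01-12T09:30:00.000Z');
const fields = withFuzzyDate({ timestamp: createdAt.toISOString() }, createdAt);

console.log(fields.createdAtFuzzy); // prints "2022-01-12"
```

The resulting object could then be passed to the model constructor before saving, for example `DataStore.save(new OrderEvent(fields))`, provided the other required fields of the model are filled in as well.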

Let me know if this helps. Otherwise, I can check with the DataStore team to get some more feedback for you. 🙏

PeteDuncanson commented 2 years ago

@chrisbonifacio I don't think he can use DataStore at all for that amount of data, even paging it within DataStore means it still needs to have it locally so still the same issue, pulling a tonne of data down up front right? The paging is just paging through local data.

A GraphQL hit with a GSI would be a good way to go in this case. As usual, work out your access pattern needs up front and build a GSI or query to match them efficiently. Ha! Listen to me, 18 months of burnt fingers and at least I've learnt something :)

chrisbonifacio commented 2 years ago

@PeteDuncanson ah, that's a good point, it won't apply to the sync. The syncExpression with a GSI might be the best approach in a DataStore context then because the expression would apply to base and delta syncs and incoming subs. Otherwise, if that doesn't help to speed up the sync then GQL and GSI would be the way to go.

Dilip-Solanki-Logistic-Infotech commented 2 years ago

@chrisbonifacio @iartemiev Can you please give a simple example that fits my requirement? I just need the last 8 hours of records with DataStore and syncExpression. My whole project depends on DataStore only.

chrisbonifacio commented 2 years ago

@Dilip-Solanki-Logistic-Infotech I would suggest trying what I mentioned in my previous comment: https://github.com/aws-amplify/amplify-js/issues/9471#issuecomment-1017905677

  1. Add the fuzzy date field/index to your schema and start adding the date to each record when it is created:

     type OrderEvent @model {
       # ...other fields
       createdAtFuzzy: String! @index(name: "byCreatedAtFuzzy", queryField: "orderEventByCreatedAtFuzzy")
     }

  2. Set your sync expression to this or similar:

     DataStore.configure({
       syncExpressions: [
         syncExpression(OrderEvent, () => {
           return event => event
             .createdAtFuzzy('eq', '2022-02-2') // or today's date, or today .or() yesterday's dates and then filter further from here
             .timestamp('ge', startDate)
         })
       ]
     });
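
The `startDate` and the fuzzy dates for an 8-hour window could be computed along these lines. This is a sketch under the assumption that `createdAtFuzzy` stores UTC dates as `YYYY-MM-DD` and that `timestamp` is an ISO 8601 string. `syncWindow` is a hypothetical helper and is not part of DataStore.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Returns the ISO start of the window and every UTC date the window touches.
function syncWindow(now, hours = 8) {
  const start = new Date(now.getTime() - hours * HOUR_MS);
  const fuzzyDates = [];
  // Step through UTC midnights from the start day up to the current day.
  let day = Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate());
  for (; day <= now.getTime(); day += DAY_MS) {
    fuzzyDates.push(new Date(day).toISOString().slice(0, 10));
  }
  return { startDate: start.toISOString(), fuzzyDates };
}

const afternoon = syncWindow(new Date('2022-01-12T15:00:00.000Z'));
console.log(afternoon.startDate);            // prints "2022-01-12T07:00:00.000Z"
console.log(afternoon.fuzzyDates.join(',')); // prints "2022-01-12"

const earlyMorning = syncWindow(new Date('2022-01-12T03:00:00.000Z'));
console.log(earlyMorning.fuzzyDates.join(',')); // prints "2022-01-11,2022-01-12"
```

The second call shows why the comment in the sync expression mentions yesterday's date: a window that crosses midnight UTC spans two fuzzy dates, so matching only today's date would miss the records from before midnight.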

If you can try this, let us know if you experience better response time.

chrisbonifacio commented 2 years ago

Hi 👋 Closing this as we have not heard back from you. If you are still experiencing this issue and in need of assistance, please feel free to comment and provide us with any information previously requested by our team members so we can re-open this issue and be better able to assist you. Thank you!

Dilip-Solanki-Logistic-Infotech commented 2 years ago

@chrisbonifacio Thanks. I will try your suggestion.

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs.
