aws-amplify / amplify-category-api

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development. This plugin provides functionality for the API category, allowing for the creation and management of GraphQL and REST based backends for your amplify project.
https://docs.amplify.aws/
Apache License 2.0
88 stars 75 forks source link

DataStore with syncExpression against SGI still using scans, and too many full re-syncs #718

Open guy-a opened 2 years ago

guy-a commented 2 years ago

JavaScript Framework

React

Amplify APIs

DataStore

Amplify Categories

api

Environment information

System: OS: macOS 12.5 Shell: 5.8.1 - /bin/zsh Binaries: Node: 14.19.1 - /usr/local/bin/node npm: 6.14.16 - /usr/local/bin/npm Browsers: Chrome: 103.0.5060.134 Safari: 15.6 npmPackages: @aws-amplify/ui-react: ^3.3.0 => 3.3.0 @aws-amplify/ui-react-internal: undefined () @aws-amplify/ui-react-legacy: undefined () @testing-library/jest-dom: ^5.16.4 => 5.16.4 @testing-library/react: ^13.3.0 => 13.3.0 @testing-library/user-event: ^13.5.0 => 13.5.0 aws-amplify: ^4.3.30 => 4.3.30 react: ^18.2.0 => 18.2.0 react-dom: ^18.2.0 => 18.2.0 react-scripts: 5.0.1 => 5.0.1 web-vitals: ^2.1.4 => 2.1.4 npmGlobalPackages: @aws-amplify/cli: 8.4.0 corepack: 0.10.0 n: 8.2.0

Describe the bug

DataStore uses scans even though syncExpression is defined against the GSI. I have created a new amplify project with minimal model exactly as defined in the docs

This is the syncExpression being used, there are 9 User(s) with lastName of 'smith' in the database.

syncExpression(User, p => p.lastName('eq', 'smith').createdAt('gt', '2020-10-10')),

Too many full re-syncs

fullSyncInterval and DeltaSyncTTL are reloading all of the models with all of their data to the client, using scan&query and only scan respectively. Beside the problem of using scans, I do not need every client to reload all of its data with every fullSyncInterval (default to 24 hours), I intend to set fullSyncInterval to a full year.

Same as above, I do not want a full re-sync of all the data, every time the user refresh the browser and DeltaSyncTTL has passed (default to 30 minutes)

Expected behavior

Reproduction steps

Code Snippet

// Model from the amplify docs
input AMPLIFY { globalAuthRule: AuthRule = { allow: public } } # FOR TESTING ONLY!

type User @model {
  id: ID!
  firstName: String!
  lastName: String! @index(name: "byLastName", sortKeyFields: ["createdAt"])
  createdAt: AWSDateTime!
}
// syncExpression used
syncExpression(User, p => p.lastName('eq', 'smith').createdAt('gt', '2020-10-10')),

Log output

No response

aws-exports.js

const awsmobile = {
    "aws_project_region": "us-east-1",
    "aws_appsync_graphqlEndpoint": "https://XXXXXXXXXXX.appsync-api.us-east-1.amazonaws.com/graphql",
    "aws_appsync_region": "us-east-1",
    "aws_appsync_authenticationType": "API_KEY",
    "aws_appsync_apiKey": "da2-XXXXXXXXXXXXXXXXXXXX",
    "aws_cognito_identity_pool_id": "us-east-1:XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXX",
    "aws_cognito_region": "us-east-1",
    "aws_user_pools_id": "us-east-1_XXXXXXXX",
    "aws_user_pools_web_client_id": "XXXXXXXXXXXXXXXXXX",
    "oauth": {},
    "aws_cognito_username_attributes": [],
    "aws_cognito_social_providers": [],
    "aws_cognito_signup_attributes": [
        "EMAIL"
    ],
    "aws_cognito_mfa_configuration": "OFF",
    "aws_cognito_mfa_types": [
        "SMS"
    ],
    "aws_cognito_password_protection_settings": {
        "passwordPolicyMinLength": 8,
        "passwordPolicyCharacters": []
    },
    "aws_cognito_verification_mechanisms": [
        "EMAIL"
    ]
};
guy-a commented 2 years ago

Also, in the Amplify DataStore best practices, under "Sync configuration with base and delta queries" a "delta query" is being mentioned. I could not find any more info regarding that matter. What is the meaning of this "delta query"?

chrisbonifacio commented 2 years ago

Hi @guy-a 👋 thanks for raising this issue. I will reach out to the data team and see if this is expected behavior and follow up with some feedback.

By "delta query", we mean the graphql query DataStore performs to pull any changes that were missed from a Delta Table since the last sync.

From the docs:

The Sync Engine will run a GraphQL query on first start that hydrates the Storage Engine from the network using a Base Query. This defaults to a limit of 100 items at a time and will paginate through up to 1000 items. It will then store a Last Sync Time and each time the device goes from an offline to online state, it will use this as an argument in a Delta Query. When AppSync receives this Last Sync Time in its argument list it will only returned the changes that have been missed by pulling items in a Delta Table.

guy-a commented 2 years ago

Hi @chrisbonifacio, thanks for your reply. I'm looking forward to your feedback.

guy-a commented 2 years ago

Which part of my bug is a feature request? Is it not possible for DataStore to use a query? Because that's not what I've read else were i.e.

I've just did some more tests on different regions and the result is the same a query and a scan are being used together.

Btw, I did multiple tests, and read all the relevant issues before opening the bug.

guy-a commented 2 years ago

It’s possible that this is partially the result of a regression bug from when you switched to GraphQL Transformer V2. I have an older amplify app in production which is still using the old transformers and it’s doing a query for the base sync. All other issues are the same.

Assuming I want to continue to use DataStore, but without any scans on the DDB, are pipeline resolvers the way to go?

guy-a commented 2 years ago

Saw your update to the generated VTL resolvers in the new amplify cli v 10.0.0, but I don’t think it’s good enough for this specific issue. The lastSync in the VTL is indeed one of the problematic areas, but this fix won’t solve the problem IMHO. (Also, hardcoding deltaTTL to 30 minutes might be problematic) In all of my generated VTL resolvers, there are always 2 sync resolvers that are fetching from the DB, 2 for the same model: Query.syncUsers.preAuth.1.req.vtl - will sometimes do a “query” and sometimes a “scan” Query.syncUsers.req.vtl - will always do a “scan”, irrespectively of what happened before it in the pipeline

That’s why I am seeing query+scan or scan+scan.