aws-amplify / amplify-category-api

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development. This plugin provides functionality for the API category, allowing for the creation and management of GraphQL and REST based backends for your amplify project.
https://docs.amplify.aws/
Apache License 2.0
90 stars 77 forks

Duplicate GSIs Created when using relationships and Secondary indexes #2955

Open MattWlodarski opened 2 days ago

MattWlodarski commented 2 days ago

Environment information

System:
  OS: macOS 14.6.1
  CPU: (16) x64 Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  Memory: 8.19 GB / 64.00 GB
  Shell: /bin/zsh
Binaries:
  Node: 22.9.0 - /usr/local/bin/node
  Yarn: undefined - undefined
  npm: 10.8.3 - /usr/local/bin/npm
  pnpm: undefined - undefined
NPM Packages:
  @aws-amplify/auth-construct: 1.3.0
  @aws-amplify/backend: 1.2.0
  @aws-amplify/backend-auth: 1.1.3
  @aws-amplify/backend-cli: 1.2.5
  @aws-amplify/backend-data: 1.1.3
  @aws-amplify/backend-deployer: 1.1.0
  @aws-amplify/backend-function: 1.3.4
  @aws-amplify/backend-output-schemas: 1.2.0
  @aws-amplify/backend-output-storage: 1.1.1
  @aws-amplify/backend-secret: 1.1.0
  @aws-amplify/backend-storage: 1.1.2
  @aws-amplify/cli-core: 1.1.2
  @aws-amplify/client-config: 1.3.0
  @aws-amplify/deployed-backend-client: 1.4.0
  @aws-amplify/form-generator: 1.0.1
  @aws-amplify/model-generator: 1.0.5
  @aws-amplify/platform-core: 1.0.7
  @aws-amplify/plugin-types: 1.2.1
  @aws-amplify/sandbox: 1.2.0
  @aws-amplify/schema-generator: 1.2.1
  aws-amplify: 6.6.3
  aws-cdk: 2.155.0
  aws-cdk-lib: 2.155.0
  typescript: 5.5.4
AWS environment variables:
  AWS_STS_REGIONAL_ENDPOINTS = regional
  AWS_NODEJS_CONNECTION_REUSE_ENABLED = 1
  AWS_SDK_LOAD_CONFIG = 1
No CDK environment variables

Describe the feature

Feature Request: Optimize GSI Creation for Relationships in Amplify

Summary

In my current schema setup, I'm experiencing resource duplication in DynamoDB due to the creation of multiple Global Secondary Indexes (GSIs) with the same Partition Key. I believe there's an opportunity to enhance the way Amplify handles relationships and GSIs, specifically when querying by a related entity.

Current Schema

Here is the relevant part of my schema:

  export const OutcomesRecordSchema = a
    .model({
      createdAt: a.string(),
      // required
      createdBy: a.id().required(),
      caseId: a.id().required(),
      // relationships
      case: a.belongsTo('Case', 'caseId'),
    })
    .authorization((allow) => [allow.authenticated('userPools')])
    .secondaryIndexes((index) => [
      index('caseId').sortKeys(['createdAt']).name('byCase').queryField('outcomesRecordByCase'),
    ]);

Problem

As shown, I have a relationship with the Case table, and I want to query outcomes by caseId. However, this setup creates two GSIs in DynamoDB that share the same Partition Key. Each duplicate Partition Key leads to unnecessary resource consumption, which is inefficient.
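To make the duplication concrete, here is a minimal sketch that models the two indexes as plain data and reports partition keys shared by more than one GSI. The `GsiDefinition` type and the `findSharedPartitionKeys` helper are hypothetical and not part of Amplify, and the name given to the relationship-generated index is an assumption for illustration; only `byCase`, `caseId`, and `createdAt` come from the schema above.

```typescript
// Hypothetical shape for describing a GSI; not an Amplify or AWS SDK type.
type GsiDefinition = { name: string; partitionKey: string; sortKeys: string[] };

// Returns every partition key used by more than one index, mapped to the index names.
function findSharedPartitionKeys(indexes: GsiDefinition[]): Record<string, string[]> {
  const namesByKey: Record<string, string[]> = {};
  for (const index of indexes) {
    const names = namesByKey[index.partitionKey] ?? [];
    names.push(index.name);
    namesByKey[index.partitionKey] = names;
  }
  const shared: Record<string, string[]> = {};
  for (const key of Object.keys(namesByKey)) {
    const names = namesByKey[key];
    if (names && names.length > 1) shared[key] = names;
  }
  return shared;
}

const outcomesRecordIndexes: GsiDefinition[] = [
  // Assumed name for the index generated by the belongsTo relationship (no sort key).
  { name: 'gsi-Case.outcomesRecords', partitionKey: 'caseId', sortKeys: [] },
  // The explicit index declared through .secondaryIndexes() in the schema.
  { name: 'byCase', partitionKey: 'caseId', sortKeys: ['createdAt'] },
];

const sharedKeys = findSharedPartitionKeys(outcomesRecordIndexes);
console.log(sharedKeys);
```

With the two entries above, the only shared partition key reported is `caseId`, which is the duplication this issue describes.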

Proposed Improvement

I propose that Amplify automatically expose the GSI it generates for a relationship on the client (const client = generateClient();). This would eliminate the need to manually create a secondary index for querying, thereby reducing resource duplication.

Additionally, the auto-generated GSI currently lacks a Sort Key. It would be beneficial to provide the ability to define a Sort Key for this GSI directly within the schema, allowing for more flexible querying without creating redundant GSIs.
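One way to picture the requested behavior is a planning step that skips the relationship-generated index whenever an explicit index already uses the same partition key. The sketch below is illustrative only: `IndexSpec`, `planIndexes`, and the `source` field are hypothetical names, the relationship index name is assumed, and nothing here reflects how Amplify actually builds its indexes.

```typescript
// Hypothetical description of an index and where it came from; not an Amplify type.
type IndexSpec = {
  name: string;
  partitionKey: string;
  sortKeys: string[];
  source: 'relationship' | 'explicit';
};

// Keeps every explicit index, and keeps a relationship index only when
// no explicit index already covers the same partition key.
function planIndexes(indexes: IndexSpec[]): IndexSpec[] {
  const explicitKeys: Record<string, boolean> = {};
  for (const index of indexes) {
    if (index.source === 'explicit') explicitKeys[index.partitionKey] = true;
  }
  return indexes.filter(
    (index) => index.source === 'explicit' || !explicitKeys[index.partitionKey]
  );
}

const requested: IndexSpec[] = [
  // Assumed name for the index generated by the belongsTo relationship.
  { name: 'gsi-Case.outcomesRecords', partitionKey: 'caseId', sortKeys: [], source: 'relationship' },
  // The explicit index from the schema, with createdAt as its sort key.
  { name: 'byCase', partitionKey: 'caseId', sortKeys: ['createdAt'], source: 'explicit' },
];

const planned = planIndexes(requested);
console.log(planned.map((index) => index.name));
```

A caveat for any design along these lines: a DynamoDB GSI only contains items that have all of the index's key attributes, so an index keyed on caseId and createdAt can stand in for a caseId-only index only if every item has createdAt set.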

Use case

Resource Efficiency:
  Scenario: Having multiple GSIs with the same Partition Key leads to resource duplication in DynamoDB.
  Benefit: Utilizing existing GSIs would reduce operational costs and improve resource management.

Simplified Schema:
  Scenario: Defining additional secondary indexes for querying complicates the schema.
  Benefit: Leveraging auto-generated GSIs would streamline schema definitions and enhance maintainability.

Enhanced Query Capability:
  Scenario: Needing to query by related entity IDs (e.g., caseId) while sorting by attributes like createdAt.
  Benefit: Allowing Sort Keys on auto-generated GSIs would enable more complex queries without redundancy.

Improved Workflow:
  Scenario: Developers need to make quick schema changes without complicating GSI management.
  Benefit: Simplified GSI handling would enable faster development cycles and easier collaboration.

ykethan commented 1 day ago

Hey 👋, thanks for raising this! I'm going to transfer this over to our API repository for better assistance 🙂