Support for SQL based SDL in Gen 2.

kekami commented 3 months ago

Environment information

System:
  OS: macOS 14.5
  CPU: (10) arm64 Apple M1 Max
  Memory: 176.88 MB / 32.00 GB
  Shell: /bin/zsh
Binaries:
  Node: 20.14.0 - ~/.nvm/versions/node/v20.14.0/bin/node
  Yarn: 1.22.17 - /opt/homebrew/bin/yarn
  npm: 10.7.0 - ~/.nvm/versions/node/v20.14.0/bin/npm
  pnpm: 8.15.6 - /opt/homebrew/bin/pnpm
NPM Packages:
  @aws-amplify/backend: 1.0.4
  @aws-amplify/backend-cli: 1.1.0
  aws-amplify: 6.3.8
  aws-cdk: 2.147.2
  aws-cdk-lib: 2.147.2
  typescript: 5.5.2
AWS environment variables:
  AWS_SDK_LOAD_CONFIG = 1
  AWS_STS_REGIONAL_ENDPOINTS = regional
  AWS_NODEJS_CONNECTION_REUSE_ENABLED = 1
No CDK environment variables

Description

I appreciate the direction and development experience that Amplify Gen 2 offers. Features like personal sandboxes, TypeScript support, and CDK extensibility are excellent. However, the current Data Schema Builder in Amplify Gen 2 limits the options available to developers when defining their GraphQL APIs.

By introducing support for GraphQL SDL for SQL-based schemas, Amplify can offer a more familiar, flexible, and developer-friendly approach to schema design. I've detailed the challenges I encountered while working with data schemas coupled to a relational database in this Discord discussion.

Key Issues with the Current Data Schema Builder:

Verbose Configuration: Configuring authentication, relationships, and renaming fields is excessively verbose compared to using AppSync directives. The configurations are split into different callbacks, making it difficult to get a comprehensive overview of a model's setup.
Limited API Surface Control: The Model Schema automatically exposes all database tables and columns without offering control over the API surface. This results in an unnecessary 1-to-1 mapping of DB rows to the GraphQL API. I prefer a tailored API that meets the specific needs of the client, allowing for custom naming conventions (e.g., snake_case to camelCase). The lack of flexibility in controlling the exposed API surface is a significant limitation.
Lack of Custom Field Resolvers: One of GraphQL's strengths is its ability to map each field to a resolver, enabling the client to request specific data. The Model Schema's approach of merely returning DB rows turns GraphQL into a glorified REST API. Custom field resolvers, such as the example below, are not possible:
```
type Post {
 id: String!
 title: String!
 localizedTitle(locale: String): String
 content: String!
}
```
Limitations of Custom Types: Custom types cannot be used in arguments, preventing the creation of queries with the same structure as generated ones. Additionally, custom types that reference models are not supported, hindering the creation of custom list queries. For example, defining ModelPostConnection or ModelPostFilterInput is not possible:
```
type ModelPostConnection {
 items: [Post]!
 nextToken: String
}

type Query {
 listPosts(
   id: ID,
   filter: ModelPostFilterInput,
   limit: Int,
   nextToken: String,
   sortDirection: ModelSortDirection
 ): ModelPostConnection
}
```
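To make the API-surface and field-resolver points above concrete, here is a minimal TypeScript sketch of the kind of mapping an SDL-first setup could allow. Everything in it is hypothetical: the `PostRow` columns (including `created_at`, `title_translations`, and `internal_notes`), the extra `createdAt` field, and the `toPost` / `resolveLocalizedTitle` helpers are illustrative assumptions, not Amplify or AppSync APIs.

```typescript
// Hypothetical row from a SQL `posts` table (snake_case columns).
interface PostRow {
  id: string;
  title: string;
  content: string;
  created_at: string;
  // Assumed JSON column mapping locale codes to translated titles.
  title_translations: Record<string, string>;
  // A column that should never reach the API.
  internal_notes: string;
}

// API-facing shape (camelCase), loosely following the `Post` type above.
// `createdAt` is added only to illustrate renaming.
interface Post {
  id: string;
  title: string;
  content: string;
  createdAt: string;
}

// Map a row to the API shape: rename columns and expose only selected ones.
function toPost(row: PostRow): Post {
  return {
    id: row.id,
    title: row.title,
    content: row.content,
    createdAt: row.created_at,
  };
}

// Field-level resolver for `localizedTitle(locale: String): String`.
// Returns null when no locale is given or no translation exists.
function resolveLocalizedTitle(
  row: PostRow,
  args: { locale?: string }
): string | null {
  if (!args.locale) return null;
  return row.title_translations[args.locale] ?? null;
}
```

The point of the sketch is that the GraphQL type and the table layout can evolve independently: the resolver decides what is exposed and under which name, instead of every column being mirrored 1-to-1.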

Proposed Solution

Currently, defineData supports SDL for DynamoDB-backed schemas, as outlined in this GitHub pull request. However, attempting to pass in a SQL-based schema results in errors because defineData defaults to a DYNAMO_DATA_SOURCE_STRATEGY. The DataProps interface lacks the necessary API surface to override this default behavior.

Introducing support for SQL-based schemas and allowing developers to specify the data source strategy would greatly enhance the flexibility and usability of Amplify Gen 2 for a broader range of applications.

I'm aware that this will result in the loss of end-to-end typing. However, this is not much of an issue for customers who prefer not to use the data client and instead use GraphQL clients (TanStack, Apollo, urql, etc.) on the frontend and a bespoke SQL query builder on the backend.

ykethan commented 3 months ago

Hey,👋 thanks for raising this! I'm going to transfer this over to our API repository for better assistance 🙂

kekami commented 3 months ago

> Hey,👋 thanks for raising this! I'm going to transfer this over to our API repository for better assistance 🙂

@ykethan This pertains to defineData, which is in the backend repository. Associated PR: https://github.com/aws-amplify/amplify-backend/pull/1706

dpilch commented 3 months ago

I think this feature makes sense to add. We will need to discuss internally to make sure we align on the proposed API.

FWIW, we intend to reach feature parity between the data schema builder and GraphQL SDL at some point in the future.

kekami commented 3 months ago

Yes, I believe adding this feature would be very beneficial. Achieving feature parity with the data schema builder would be especially advantageous for customers using DynamoDB. Given DDB's simpler query capabilities, most database interactions can be effectively managed through AppSync.

However, for customers utilizing SQL databases, fetching data via GraphQL from within a Lambda function may not be as practical as using a SQL query builder that supports joins and transactions. The GraphQL spec is just not versatile enough.

I also see benefits in decoupling the two schemas as it allows for greater flexibility and more tailored API design. This of course comes at the cost of end-to-end typings, but that can be up to the customer to decide.

Thank you for considering this enhancement.

sundersc commented 2 months ago

@kekami - It is possible to use SDL with a SQL data source today. (The DX is not great, though: you need to explicitly provide the VPC config and a number of other attributes even if you are not using them.)

Create a schema file as shown below. Modify the SDL schema as required.

```
// data/sqlschema.ts
import { secret } from '@aws-amplify/backend';
import { DerivedModelSchema, DerivedApiDefinition } from '@aws-amplify/data-schema-types';

export const schema: DerivedModelSchema = {
  data: {
    types: {},
    configuration: {
      database: {
        engine: 'mysql',
        identifier: 'UNIQUE_ID',
        connectionUri: secret('CONN_STR'),
        // You can generate the VPC config using the
        // `ampx generate schema-from-database ...` command.
        // The generated schema file contains this information.
        vpcConfig: {
          vpcId: 'vpc-a868e4d5',
          securityGroupIds: ['sg1'],
          subnetAvailabilityZones: [
            { subnetId: 'sb1', availabilityZone: 'az1' },
            { subnetId: 'sb2', availabilityZone: 'az2' },
            { subnetId: 'sb3', availabilityZone: 'az3' },
          ],
        },
      },
    },
  },
  transform: (): DerivedApiDefinition => ({
    // SDL schema goes here.
    schema: `
      type Person @model {
        id: String! @primaryKey
        firstName: String
        lastName: String
      }
    `,
    functionSlots: [],
    jsFunctions: [],
    lambdaFunctions: {},
    functionSchemaAccess: [],
  }),
};
```

Then use the schema in the data/resource.ts file as shown below.

```
// data/resource.ts
import { defineData } from '@aws-amplify/backend';
import { schema } from './sqlschema';

export const data = defineData({
  schema,
  authorizationModes: {
    defaultAuthorizationMode: 'iam',
  },
});
```

This should allow you to define the schema in SDL for SQL databases. However, if you define the API this way, generating the typed client with `const client = generateClient<Schema>();` won't work, so you will need an alternative way to call the API from the client.
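One possible alternative, sketched here under assumptions rather than taken from the Amplify docs, is to send plain GraphQL-over-HTTP requests yourself. The `executeGraphQL` helper, the `FetchLike` type, and the `authHeaders` parameter are hypothetical names, and the `getPerson` query assumes that `@model` on `Person` generates a get query keyed by `id`. With the `iam` authorization mode, requests would also need to be SigV4-signed; that step is left to the caller here.

```typescript
// Standard GraphQL-over-HTTP request and response shapes.
interface GraphQLRequest {
  query: string;
  variables?: Record<string, unknown>;
}

interface GraphQLResponse<T> {
  data?: T;
  errors?: { message: string }[];
}

// Minimal fetch-like signature so the helper can be exercised without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ json(): Promise<unknown> }>;

// Hypothetical helper: POSTs a GraphQL document and returns `data`,
// throwing if the response carries GraphQL errors or no data.
async function executeGraphQL<T>(
  endpoint: string,
  request: GraphQLRequest,
  authHeaders: Record<string, string>, // e.g. signed headers, supplied by the caller
  fetchImpl: FetchLike
): Promise<T> {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...authHeaders },
    body: JSON.stringify(request),
  });
  const payload = (await res.json()) as GraphQLResponse<T>;
  if (payload.errors?.length) {
    throw new Error(payload.errors.map((e) => e.message).join("; "));
  }
  if (payload.data === undefined) {
    throw new Error("GraphQL response contained no data");
  }
  return payload.data;
}

// Assumed query against the `Person` model defined in the SDL above.
const GET_PERSON = `
  query GetPerson($id: String!) {
    getPerson(id: $id) { id firstName lastName }
  }
`;
```

In a browser or on Node 18+, the global `fetch` can be passed as `fetchImpl`. The response type is whatever you declare for `T`, so typing is manual rather than end-to-end.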
