aws-amplify / amplify-cli

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development.
Apache License 2.0

Ability to create more than 25 @models at once #2584

Closed trupa7 closed 2 years ago

trupa7 commented 4 years ago

As I added more tables to my schema I started getting this error, and sometimes I get an error related to the maximum number of tables that can be created simultaneously. It deleted all my previously created tables and the data in those tables.

Q1. Can we increase the DynamoDB limits for creating tables or indexes simultaneously?

Q2. Is there any way to prevent deletion of tables when we hit this error?

Q3. Is it possible to create a limited number of tables at a time so the limit is not exceeded?

trupa7 commented 4 years ago

I have 43 tables, which include m:n relationship (join) tables.

jkeys-ecg-nmsu commented 4 years ago

@trupa7

  1. Looks like a DynamoDB soft limit that can be raised upon request, try opening a ticket with Support as demonstrated here.
  2. Not yet. Agreed, it would be nice if the CLI defaulted to Retain for each table provisioned under the API and Storage categories.
  3. I think the only way to do this in CloudFormation is to force resources to create in a certain order, either by using DependsOn or nested stacks. I think (3) would require a significant refactor of the transformer and the architecture of the generated root API stack.
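The Retain default suggested in point 2 could, in principle, be approximated today by post-processing the generated template before deploy. A minimal sketch of the idea (the function name and template shape are illustrative assumptions, not part of the Amplify CLI):

```python
def set_retain_on_tables(template: dict) -> dict:
    """Add DeletionPolicy: Retain to every DynamoDB table resource in a
    CloudFormation template, so a rollback keeps the tables and their data."""
    for resource in template.get("Resources", {}).values():
        if resource.get("Type") == "AWS::DynamoDB::Table":
            resource["DeletionPolicy"] = "Retain"
    return template

# Hypothetical template fragment for demonstration only
template = {
    "Resources": {
        "PostTable": {"Type": "AWS::DynamoDB::Table", "Properties": {}},
        "Api": {"Type": "AWS::AppSync::GraphQLApi", "Properties": {}},
    }
}
patched = set_retain_on_tables(template)
print(patched["Resources"]["PostTable"]["DeletionPolicy"])  # Retain
```

Only the table resources are touched; other resource types (like the AppSync API above) are left alone.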
jkeys-ecg-nmsu commented 4 years ago

@trupa7 btw if you hit up support right now (within 24 hours I believe) they will be able to restore your tables.

@yuth @kaustavghosh06 I know you all don't release roadmaps, but it's hard to see how the API category will be useful for much more than prototyping if a bad push can drop all your production tables and then potentially drop your AppSync API.

kaustavghosh06 commented 4 years ago

@trupa7 I'm not clear on how all your previously created tables were deleted. Could you please explain the steps to reproduce this deletion of tables with data?

trupa7 commented 4 years ago

@kaustavghosh06 I added new schemas which have relationships with old tables. If creation of the new tables fails, it deletes all previously created tables that have relationships with the new tables. The reason for the failure is the same subscriber-limit-exceeded error:

You have exceeded the maximum number of indexed tables that can be created simultaneously

trupa7 commented 4 years ago

@kaustavghosh06 So, is that because of the relationships? As I add more m:n relationship tables with new schemas, it adds new secondary indexes to the related tables.

Could this be the cause?

API-Specific Limits CreateTable/UpdateTable/DeleteTable

The only exception is when you are creating a table with one or more secondary indexes. You can have up to 25 such requests running at a time.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#limits-api
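The 25-request ceiling quoted above suggests the mitigation from Q3: schedule table creation in sequential waves that never exceed the limit. A toy sketch of the wave-splitting logic (names are illustrative, not Amplify or CloudFormation internals):

```python
def batch_operations(ops, limit=25):
    """Split a list of control-plane operations (e.g. CreateTable requests
    with secondary indexes) into sequential waves no larger than the
    documented concurrent-request limit."""
    return [ops[i:i + limit] for i in range(0, len(ops), limit)]

# 43 tables, matching the schema size mentioned earlier in this thread
tables = [f"Table{i}" for i in range(43)]
waves = batch_operations(tables)
print([len(w) for w in waves])  # [25, 18]
```

Each wave would have to complete (all tables reach ACTIVE) before the next wave starts, which is roughly what a DependsOn chain between nested stacks enforces.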

kaustavghosh06 commented 4 years ago

@trupa7 Could you provide us with a before/after GraphQL annotated schema so that we can reproduce this behavior? Deleting the tables isn't the intended behavior, but we would like to reproduce and fix it.

jkeys-ecg-nmsu commented 4 years ago

@kaustavghosh06 If OP has a 43 model schema I doubt he'd be willing to share that publicly. Could you provide him an email to send the schemas?

@trupa7 maybe your application is extremely complex and needs that many models, but is there no way to denormalize your schema, remove nested models, etc., and overload models with indexes via the @key directive to support your access patterns?

The only reason I can think of to have that many Dynamo tables is that you're doing it because currently the GQL Transformer-generated indexes project all attributes (#2577), and this is a way around that without having to fiddle with the Amp-gen stacks to not project all attributes on all indexes.

trupa7 commented 4 years ago

@kaustavghosh06: I am unable to share it because of our company's policy. I can try to make dummy schemas to reproduce this error.

trupa7 commented 4 years ago

@kaustavghosh06 If OP has a 43 model schema I doubt he'd be willing to share that publicly. Could you provide him an email to send the schemas?

@trupa7 maybe your application is extremely complex and needs that many models, but is there no way to denormalize your schema, remove nested models, etc., and overload models with indexes via the @key directive to support your access patterns?

The only reason I can think of to have that many Dynamo tables is that you're doing it because currently the GQL Transformer-generated indexes project all attributes (#2577), and this is a way around that without having to fiddle with the Amp-gen stacks to not project all attributes on all indexes.

@jkeys-ecg-nmsu: I can try to denormalize some schemas, but as we add more models in the future I am afraid of hitting the same errors. It's not just models; we also have tables for m:n relationships, since Amplify doesn't support m:n relationships directly.

ahsansmir commented 4 years ago

Hi Everyone,

Ahsan here from Rapticore. Thank you for your support, it is really appreciated. If possible, let's get on a call and we will show you our schema, what we are doing, and what errors we are seeing. If you still need the actual schema after that, we can work something out. Again, we appreciate your support. We are looking forward to making Amplify and AppSync a core part of our application architecture.

Thanks,

Ahsan Mir

jkeys-ecg-nmsu commented 4 years ago

@ahsansmir I'm just trying to be a Good Samaritan by trying to help close issues, I am not an Amplify developer.

@kaustavghosh06 can you provide @ahsansmir @trupa7 with a private email to send the schema?

trupa7 commented 4 years ago

@kaustavghosh06: I created a dummy schema to reproduce this error. Please use it to reproduce the issue, and guide us if we are doing something wrong or if this is simply hitting the limit.

https://github.com/trupa7/amplify-data/tree/master/schema

thanks

jkeys-ecg-nmsu commented 4 years ago

@trupa7 @ahsansmir Those repeating sequences within distinct field names (e.g. search for "zqd", "RFUd", "FFd") might be more revealing than you had intended. It looks to me like you posted a fun exercise in cryptography to see how much information can be reverse engineered from your obfuscated schema. Something to consider...

Actually it was more like a bioinformatics + crypto puzzle before you removed it, where you would try to iteratively find higher k-mers until you can't find any more, then collapse that into an alphabet, then try to assign meaning. Could I get a security consulting fee btw? :P

jkeys-ecg-nmsu commented 4 years ago

@trupa7 @ahsansmir @kaustavghosh06 that being said please send that obfuscated (or an actual dummy schema that replicates the behavior) to the Amplify team. For what it's worth I think you can omit almost all the fields since the error is related to creating tables concurrently (then deleting them), so that should help with intellectual property concerns.

kaustavghosh06 commented 4 years ago

@trupa7 Did you get a chance to talk to AWS support and increase the soft limit for DDB table creations as mentioned by @jkeys-ecg-nmsu?

jkeys-ecg-nmsu commented 4 years ago

@trupa7 try submitting a ticket to AWS Support to raise the soft limit and see if your issue re-occurs with your sample schema, please.

rrrix commented 4 years ago

Hi @kaustavghosh06 - I work on the same team as @ahsansmir and @trupa7. We requested a limit increase to the maximum allowed concurrent DynamoDB GSI limit, which is 100.

The limit was increased, and AWS Support said:

I've been working closely with our DynamoDB team on getting your request fulfilled and they already stated that it will be fulfilled in the next 2-3 days as long as you acknowledge that this may not fix your bottle neck and you could hit the new limit again.

They've also stated that we would not be able to further increase the limit after 100 in-flight-tables, as that's the maximum it can be increased.

"We can increase your limit to a degree, but note that you might not always be able to achieve the full number of concurrent control plane operations, depending on which combination of operations you are doing at once. If the table or index specifications are complex, such as a CreateTable requests with several Secondary Indexes, especially when each index has a "Projection" with many “NonKeyAttributes”, DynamoDB might temporarily reduce the number of concurrent operations."

With our schema, and with the new limit increase, we are still receiving the same error.

I attempted a workaround, where I modified the generated CloudFormation stack (amplify/backend/api/<apiname>/build/cloudformation-template.json) to sequentially create each nested stack, by adding each nested stack resource reference to the next stack's DependsOn attribute.

Here's a snippet from the CloudFormation stack showing the DependsOn changes.

{
  "Organization": {
    "Type": "AWS::CloudFormation::Stack",
    "Properties": {},
    "DependsOn": ["GraphQLSchema", "NoneDataSource"]
  },
  "Application": {
    "Type": "AWS::CloudFormation::Stack",
    "Properties": {},
    "DependsOn": ["GraphQLSchema", "NoneDataSource", "Organization"]
  },
  "Group": {
    "Type": "AWS::CloudFormation::Stack",
    "Properties": {},
    "DependsOn": ["GraphQLSchema", "NoneDataSource", "Application"]
  }
}

I was able to successfully deploy the CloudFormation stack this way; however, the Amplify CLI stopped working at that point: amplify status reports No Changes, and any attempt at amplify env pull, amplify env pull --restore, or amplify push has no effect.
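The manual DependsOn chaining described here could be scripted instead of hand-edited. A rough sketch, assuming the template has already been loaded as a dict (the function name and shared-resource list are assumptions, not CLI internals):

```python
def chain_nested_stacks(template: dict,
                        shared=("GraphQLSchema", "NoneDataSource")) -> dict:
    """Force nested model stacks to create one at a time by making each
    AWS::CloudFormation::Stack depend on the previously declared one."""
    previous = None
    for name, resource in template["Resources"].items():
        if resource.get("Type") != "AWS::CloudFormation::Stack":
            continue
        depends = [d for d in shared if d in template["Resources"]]
        if previous:
            depends.append(previous)
        resource["DependsOn"] = depends
        previous = name
    return template

# Hypothetical template mirroring the snippet above
template = {
    "Resources": {
        "GraphQLSchema": {"Type": "AWS::AppSync::GraphQLSchema"},
        "NoneDataSource": {"Type": "AWS::AppSync::DataSource"},
        "Organization": {"Type": "AWS::CloudFormation::Stack", "Properties": {}},
        "Application": {"Type": "AWS::CloudFormation::Stack", "Properties": {}},
        "Group": {"Type": "AWS::CloudFormation::Stack", "Properties": {}},
    }
}
chained = chain_nested_stacks(template)
```

One caveat: files under the api build/ directory are regenerated by the CLI, so edits like this can be overwritten on the next push, which may be related to the CLI misbehavior reported afterwards.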

rrrix commented 4 years ago

Hi Amplify team,

I'm on the same team as @ahsansmir and @trupa7 (the OP of this issue)

Unfortunately, we have a new issue. Should I create a new Issue in GitHub for this?

The autogenerated JSON CloudFormation stack is too large (811KB) and exceeds the allowed size of 460KB.

CREATE_FAILED apiissue2584 AWS::CloudFormation::Stack Mon Nov 18 2019 09:42:09 GMT-0800 (Pacific Standard Time) Template may not exceed 460800 bytes in size.

I was attempting to add @auth directives to all of our types, so we can use a combination of Cognito, API Keys and IAM to access our AppSync API.

Just for fun, I tried cfn-flip to see what the YAML equivalent would be, and it is FAR smaller at 271KB, well under the 460KB limit.

-rw-r--r-- 1 user staff 811K Nov 18 09:37 cloudformation-template.json
-rw-r--r-- 1 user staff 271K Nov 18 09:44 cloudformation-template.yaml
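A pre-deploy size check against that limit is straightforward; a minimal sketch (the 460,800-byte figure comes from the CloudFormation error above, the function name is made up):

```python
import json

CFN_TEMPLATE_LIMIT = 460_800  # bytes, per the CREATE_FAILED error above

def template_fits(template: dict) -> bool:
    """Check whether the serialized JSON template is within the
    CloudFormation template size limit before attempting a deploy."""
    return len(json.dumps(template).encode("utf-8")) <= CFN_TEMPLATE_LIMIT

print(template_fits({"Resources": {}}))  # True
```

A check like this run before amplify push would turn a slow CloudFormation rollback into a fast local failure.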

I've re-obfuscated the schema that @trupa7 originally posted, and removed all the unnecessary String/Enum/AWSDate*/Boolean/etc. fields from all the types.

This version (with @auth directives) produces a CloudFormation template which is too large and cannot be deployed: https://gist.github.com/rrrix/e7874ba46f9911604aec1eb1554ba07d

This version (without @auth) causes CloudFormation to deploy too many DynamoDB tables concurrently which results in a LimitExceededException and the entire API stack fails: https://gist.github.com/rrrix/09ade36345c988b6be581063aa49f219

I just reproduced it again this morning, even with the limit increase.

Subscriber limit exceeded: You have exceeded the maximum number of indexed tables that can be created simultaneously (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: LimitExceededException;

jkeys-ecg-nmsu commented 4 years ago

@trupa7 @kaustavghosh06 I don't think this is a priority for the Amplify team because that many tables is considered somewhat of an abuse of NoSQL. You should try to denormalize your schema and drop as many models as possible. For instance, for a 1-to-1 connection field that is rarely updated or is read-only, you could make it a simple AWSJSON blob instead of a model connection without much disruption. But if you need to frequently update that field, leave it as its own model. At least, that's my understanding of the suggested usage of the key and connection directives.

rrrix commented 4 years ago

Hi @jkeys-ecg-nmsu - thank you for your reply!

We are working within the limitations that the Amplify CLI enforces. Creating this many tables is an explicit design decision made by the Amplify team - each @model generates its own table and there is no configurable or programmable option to do otherwise.

If the Amplify CLI provided the option or ability to use a single table for our entire project, or even re-use tables across types, we would!

If the Amplify CLI made many-to-many models easier to implement without a lookup table (such as adjacency lists), we would!

Also of note, using join tables in this way is currently the officially documented and supported pattern within Amplify:

From the docs: https://aws-amplify.github.io/docs/cli-toolchain/graphql?sdk=js#connection

The @connection directive enables you to specify relationships between @model types. Currently, this supports one-to-one, one-to-many, and many-to-one relationships. You may implement many-to-many relationships using two one-to-many connections and a joining @model type. See the usage section for details.

(Emphasis mine)

From an Amplify Developer: https://github.com/aws-amplify/amplify-cli/blob/master/packages/graphql-connection-transformer/src/ModelConnectionTransformer.ts#L280

if (leftConnectionIsList && rightConnectionIsList) {
  // 1. TODO.
  // Use an intermediary table or other strategy like embedded string sets for many to many.
  throw new InvalidDirectiveError(`Invalid Connection (${connectionName}): Many to Many connections are not yet supported.`);
}

That being said, the so-called "NoSQL abuse" of having one table per @model is not a choice we willingly made, nor one we can change without abandoning Amplify altogether.

"Denormalizing" my schema and my data model is not a solution; we really do need all of these @models. The Amplify CLI should support valid real-world use cases from real-world organizations trying to solve real, complex problems, rather than forcing us to design our data model around the limitations of the Amplify CLI.

Our schema is the codified representation of our application and business model, blindly assuming we can or should rip out any given data type just to avoid a concurrency bug (yes - this is a bug) is not sound advice.

jkeys-ecg-nmsu commented 4 years ago

@rrrix The main thing I take away from your comment is that Amplify's developers should focus on the GraphQL transformer as the product's standout feature. I fully agree with that, and I empathize with your frustration.

I'm not sure that a service limitation implies a bug; but have you tried going back to support and telling them that the limit increase didn't solve the issue? Seems like it'd be easier to get AWS to recognize the problem it has with another limit increase request than shovel blame onto the Amplify team. (I have yet to hear of Support refusing to raise a soft limit to meet a legitimate business use case.)

On the other hand, I think the community is owed a roadmap so that we can plan our businesses around the future of Amplify. We don't even necessarily need ETAs, just priorities so we know what we should focus on implementing ourselves. It's frustrating when Amplify releases a new feature that nukes some generic business logic a developer spent days or weeks on.

Our schema is the codified representation of our application and business model, blindly assuming we can or should rip out any given data type just to avoid a concurrency bug (yes - this is a bug) is not sound advice.

With AWSJSON blobs you can store lists and objects. If you wanted you could implement an adjacency list based graph yourself by writing your own resolvers. But you're here for the same reason we all are -- so we don't have to write our own resolvers. Therefore I think it's only fair to cut the devs some slack on current limitations, and assume they're not idiots.

You're also free to submit pull requests and add these features yourself. But, again, you're here for the same reason I am -- so you don't have to re-implement the same business logic 10^3 - 10^7 developers have implemented in parallel. Hurry up and wait seems like an appropriate course of action, and just deal with the limitations in the form of a slightly higher bill for the time being. You could also try explaining your business use case to Support and asking for credits to offset the limitations of the GraphQL transformer.

Last note: I imagine support for Aurora Serverless (the only RDS data source supported by AppSync AFAIK) is on the Amplify team's radar, but I don't know if there's an open issue on it. Maybe you should open an issue asking for AS support so that your relational data can be represented by systems that were designed for that decades ago.

rrrix commented 4 years ago

Hey @jkeys-ecg-nmsu - thanks for your response. It's good to know others feel the same pain :)

Seems like it'd be easier to get AWS to recognize the problem it has with another limit increase request

AWS Support put it very bluntly - they've raised the limit to the hard maximum limit already and cannot raise the limit any further:

They've also stated that we would not be able to further increase the limit after 100 in-flight-tables, as that's the maximum it can be increased.

On the other hand, I think the community is owed a roadmap so that we can plan our businesses around the future of Amplify. We don't even necessarily need ETAs, just priorities so we know what we should focus on implementing ourself. It's frustrating when Amplify releases a new feature that nukes some generic business logic a developer spent days or weeks on.

OMG dude... preaching to the choir!! Amplify is "open source" on paper, but is not built and managed the way a true open source project usually is. AWS has a strong incentive to keep new features basically secret until they release them (-cough- Amplify DataStore / Amplify iOS / Amplify Android). There is no public roadmap. We don't even know what the Amplify team's short, medium, or long term priorities are. Features? Performance? Bug fixes? Security? Scalability?

Yes, I know CONTRIBUTING.md says that pull requests are welcome, but the vast majority of MERGED pull requests that I can see are from what appear to be paid AWS employees. I would love to submit a few pull requests, but I honestly can't figure out how to reliably engage the owners of this project in a meaningful discussion. The guidance is to open an issue for discussion, but there are just too many open issues created every single day to reliably get any attention beyond basic troubleshooting. And they are not on gitter.im. I tried.

open an issue asking for AS support so that your relational data can be represented by systems that were designed for that decades ago.

Fun fact: DynamoDB is actually designed for relational data too, and has an excellent guide on designing your database to do exactly that: Best Practices for DynamoDB » Best Practices for Modeling Relational Data in DynamoDB

That said - the way Amplify does things is way, way out of line with the DynamoDB best practices documentation. One of the most important (in my opinion, at least):

You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table.

This is basically impossible to do when using the @model directive in Amplify.

⬆️ From Best Practices for DynamoDB » NoSQL Design for DynamoDB: Two Key Concepts for NoSQL Design
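The single-table pattern in that quoted best practice can be illustrated with a toy in-memory model: one table, composite (pk, sk) keys, and prefix queries standing in for DynamoDB's begins_with condition. This is a sketch of the pattern only, not Amplify or DynamoDB code; all names and key formats are made up:

```python
from collections import defaultdict

class SingleTable:
    """Toy illustration of the one-table pattern: every entity type shares
    one table, distinguished by a composite (pk, sk) key."""
    def __init__(self):
        self.partitions = defaultdict(list)

    def put(self, pk: str, sk: str, item: dict):
        self.partitions[pk].append((sk, item))

    def query(self, pk: str, sk_prefix: str = ""):
        # Mirrors Query with a begins_with(sk, prefix) sort-key condition.
        return [item for sk, item in self.partitions[pk] if sk.startswith(sk_prefix)]

db = SingleTable()
db.put("ORG#acme", "PROFILE", {"name": "Acme"})
db.put("ORG#acme", "APP#core", {"name": "Core App"})
db.put("ORG#acme", "APP#web", {"name": "Web App"})
apps = db.query("ORG#acme", "APP#")
print(len(apps))  # 2
```

One partition holds an organization's profile and all of its apps, so "org plus its apps" is a single query rather than a join across tables, which is exactly what @model-per-type schemas give up.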

Anyways, I've decided to take matters into my own hands. I've forked the CLI and fixed all of the bugs I've hit (and there have been MANY more), including this one, plus added some pretty cool features I've wanted for a while. I've gotten very familiar with the CLI codebase and have a bunch more things I want to build and fix, particularly relating to developer usability. Feel free to check it out - https://github.com/rapticore/amplify-cli

@jkeys-ecg-nmsu if there's anything you've wanted for a while, let me know, I'll see what I can do to help you since you've been very helpful to me and the rest of the community.

jkeys-ecg-nmsu commented 4 years ago

@rrrix You can represent relational data with NoSQL, but I wouldn't say NoSQL (DynamoDB) was designed for relational data. It's closer to a key-value store than it is to a relational database. From the next page in the docs you cited:

A common approach to DynamoDB schema design is to identify application layer entities and use denormalization and composite key aggregation to reduce query complexity.

In DynamoDB, this means using composite sort keys, overloaded global secondary indexes, partitioned tables/indexes, and other design patterns. You can use these elements to structure the data so that an application can retrieve whatever it needs for a given access pattern using a single query on a table or index.

"and other design patterns" :D Another way of putting that: building a flexible schema is left as an exercise to the reader.

That being said, overloading GSIs is currently possible with Amplify.

This is basically impossible to do when using the @model directive in Amplify.

I'm not sure I agree with this. You've said that every model needs to be a model, and it's hard to argue with that because only you know your access patterns. However, with the @key directive you can create a single table with as many primary key indexes partitioned by type as you want, as well as an arbitrary combination of sort key patterns to meet your access patterns. Take the first two models of your schema:

normalized.graphql

type Automotive @auth(rules: [
    {allow: public, provider: apiKey},
    {allow: public, provider: iam},
    {allow: private, provider: userPools}
]) @model {
    id: ID!
    name: String
    components: [Component] @connection(name: "AutomotiveComponents")
}

type Baby @auth(rules: [
    {allow: public, provider: apiKey},
    {allow: public, provider: iam},
    {allow: private, provider: userPools}
]) @model {
    id: ID!
    name: String
    component: Component @connection(name: "ComponentBabys")
    bookss: [BabyBookss] @connection(name: "BabyBookss")
    widgets: [Games] @connection(name: "BabyWidgets")
    gardens: [BabyGardens] @connection(name: "BabyGardens")
    grocerys: [BabyGrocerys] @connection(name: "BabyGrocerys")
    healths: [BabyHealths] @connection(name: "BabyHealths")
    homes: [BabyHomes] @connection(name: "BabyHomes")
    toolss: [BabyTools] @connection(name: "BabyToolss")
    shippings: [BabyShippings] @connection(name: "BabyShippings")
}

Okay, so you have two tables here. Luckily, they don't reference each other.* Therefore couldn't we combine these two models while retaining equivalent access patterns?

overloaded-1.graphql

enum EntityType {
 Automotive
 Baby
}

type Entity @auth(rules: [
    {allow: public, provider: apiKey},
    {allow: public, provider: iam},
    {allow: private, provider: userPools}
]) @model
  @key(fields: ["id", "type"], name: "EntityTypeIndex", queryField: "entityByType")
 {
    id: ID!
    type: EntityType
    name: String
    components: [Component] @connection(name: "AutomotiveComponents")
    component: Component @connection(name: "ComponentBabys")
    bookss: [BabyBookss] @connection(name: "BabyBookss")
    widgets: [Games] @connection(name: "BabyWidgets")
    gardens: [BabyGardens] @connection(name: "BabyGardens")
    grocerys: [BabyGrocerys] @connection(name: "BabyGrocerys")
    healths: [BabyHealths] @connection(name: "BabyHealths")
    homes: [BabyHomes] @connection(name: "BabyHomes")
    toolss: [BabyTools] @connection(name: "BabyToolss")
    shippings: [BabyShippings] @connection(name: "BabyShippings")
}

To me this looks promising. Let's extend it to a third entity type:

overloaded-2.graphql

enum EntityType {
 Automotive
 Baby
 Clothing
}

type Entity @auth(rules: [
    {allow: public, provider: apiKey},
    {allow: public, provider: iam},
    {allow: private, provider: userPools}
]) @model
  @key(fields: ["id", "type"], name: "EntityTypeIndex", queryField: "entityByType")
 {
    id: ID!
    type: EntityType!
    name: String
    autoComponents: [Component] @connection(name: "AutomotiveComponents")
    componentBabies: Component @connection(name: "ComponentBabys")
    babyBooks: [BabyBookss] @connection(name: "BabyBookss")
    babyWidgets: [Games] @connection(name: "BabyWidgets")
    babyGardens: [BabyGardens] @connection(name: "BabyGardens")
    babyGrocerys: [BabyGrocerys] @connection(name: "BabyGrocerys")
    babyHealths: [BabyHealths] @connection(name: "BabyHealths")
    babyHomes: [BabyHomes] @connection(name: "BabyHomes")
    babyToolss: [BabyTools] @connection(name: "BabyToolss")
    babyShippings: [BabyShippings] @connection(name: "BabyShippings")
    bookClothings: Books @connection(name: "BooksClothinges")
    clothingComputers: Computers @connection(name: "ClothingComputerss")
}

Since all the connection fields are nullable, you can overload this generic entity to your heart's content. You'd probably have to devise a dynamic programming algorithm to reduce your schema to the optimal set of generic models. (Sounds like a fun exercise actually; I might send this one to a former prof.)

*The GQL transformer also currently supports recursive models (or at least generates queries as if it does), so theoretically there's no reason you couldn't have 1-1 or 1-many Entity connections. Then you'd just need join tables for each of your many-many Entity connections.

Also more complex query patterns will probably force you into writing your own resolvers.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html

OMG dude... preaching to the choir!! Amplify is "open source" on paper, but is not built and managed the way a true open source project usually is. AWS has a strong incentive to keep new features basically secret until they release them (-cough- Amplify DataStore / Amplify iOS / Amplify Android). There is no public roadmap. We don't even know what the Amplify team's short, medium, or long term priorities are. Features? Performance? Bug fixes? Security? Scalability?

Hopefully this will change as Amplify becomes developed more by the community and less by AWS as an "internal" product.

@jkeys-ecg-nmsu if there's anything you've wanted for a while, let me know, I'll see what I can do to help you since you've been very helpful to me and the rest of the community.

I'll have to get back to you. I'm actually working within my company on developing an open source policy so that we can start contributing and submitting pull requests. I have more asks beyond the open issues I currently have but I don't think they can be shared publicly due to NDA.

Thank you very much, though. I've opened a lot of issues in the past so I try to pass it on by being helpful where I can. Hopefully the Amplify community will come to resemble those of more traditional FOSS projects. (I've toyed with the idea of porting closed Github issues here to StackOverflow community/wiki questions to minimize the number of new issues that crop up, but it's an enormous undertaking.)

Edit: Also, I hope you plan on merging your codebase into Amplify's if and when they adopt a more developer-centric approach to roadmapping and project focus. I'll check out your fork and let you know what I think.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

iShavgula-TacTill commented 4 years ago

I'm having the exact same issue with Amplify 4.19. Is there any update on this ?

SourceCode commented 3 years ago

We are refactoring to be closer in line with single-table approaches. This is a real problem if you are deploying new environments from a code base that has progressed and has many relations.

We chose a refactor of our DynamoDB storage after reviewing the following guidance from AWS.

Revisit https://docs.amplify.aws/cli/graphql-transformer/key and consider two things:

  1. Keys defined without a queryField create a primary key.
  2. Keys defined with one create a secondary index on your table.

Between these you can strategize complex storage scenarios and minimize table count.

Each table also increases build times, so that's something else to consider.

Instead of this improvement, what I would like to see is deeper documentation and support for DynamoDB single-table approaches with the GraphQL markup and types.

josefaidt commented 2 years ago

Hey @trupa7 and folks in the thread :wave: this issue has since been fixed, and we are now able to deploy more than 25 models at once! (reproduced with Amplify CLI 7.6.26, GraphQL Transformer v2, and ~35 models)

Closing issue 🙂 if you are still experiencing this issue please reply back to this thread and we can re-open to investigate further