aws-amplify / amplify-cli

The AWS Amplify CLI is a toolchain for simplifying serverless web and mobile development.
Apache License 2.0

Enhance Amplify Push Workflow #1406

Closed plaa closed 3 years ago

plaa commented 5 years ago

Which Category is your question related to? Amplify cli

What AWS Services are you utilizing? AppSync, Lambda, Auth

Provide additional details e.g. code snippets During a few weeks of development I've encountered many occasions when changes I've made fail to be updated using amplify push. I don't seem to be alone.

Thus far I've resolved these by deleting and recreating the API+DB, but in production that would not be an option. I have no idea how I could perform the updates if we already had a production system. We're now seriously considering whether we can proceed to production using Amplify or whether we need to switch to another solution.

Cases I've faced include for example:

  1. Amplify attempting to modify multiple GSI's in one operation (this was resolved by commenting out part of the changes and performing the push in two parts)
  2. When I created a custom resolver for an existing field, the push failed complaining that a resolver already exists for the field. It seemed that Amplify was creating the new custom resolver before removing the old one.
  3. Some kind of collision in CF export naming - this one baffled me because I had done nothing related to ApplicationArea in the latest changes: Export with name bjlounrbanderkj4wu4gnr63my:GetAtt:ApplicationAreaDataSource:Name is already exported by stack xyzzy-20190502163009-apixyzzy-OEWIC881QYZP-ApplicationArea-1T2QUAK2OVDYG

and several others.

One of the issues seems to be that the API+DB are coupled. It wouldn't be a problem for us to delete and recreate the API (with a short outage), but the DB is lost in the same operation.

Every time I've tried to make changes directly in CF (for example deleting some stack), the local Amplify state has become out-of-sync with the cloud and the only way I've been able to resolve it has been to completely delete Amplify and start amplify init from scratch.

Can you provide any general guidance how these kinds of situations could be resolved? For example is there some way to have Amplify delete + create changed stacks instead of updating them? Or is it safe to delete or modify some stack manually in order to resolve such conflicts?

I'm not looking for specific instructions for a particular error case, but more general guidelines of a) what kind of things can be manually altered in order to make an update go through, and b) what can be tried to resolve cases where the Amplify local and cloud states have become out-of-sync.

hew commented 5 years ago

It's a huge concern. We're invested in the Amplify stack, but I'm currently wondering whether there's a way we can extricate ourselves and just use AppSync directly, bypassing Amplify except for the codegen feature. There are certain errors that seem to be completely unrecoverable, which, as you state, is unacceptable in production.

One of the issues seems to be that the API+DB are coupled. It wouldn't be a problem for us to delete and recreate the API (with a short outage), but the DB is lost in the same operation.

This is key.

plaa commented 5 years ago

We've also started to look for alternatives. We're still hoping to use Amplify for auth, hosting, functions and REST/API-GW, but remove the GraphQL part and implement that ourselves using API-GW + Lambda + RDS. We're hoping that would remove the majority of error cases.

sacrampton commented 5 years ago

This is really critical. The amplify push process seems very fragile. A couple of the errors I'm running into follow the same pattern:

The schema.graphql is valid, but the errors seem to occur because everything is getting pushed to DynamoDB concurrently. It seems to me that if there were a way for Amplify to do this sequentially, a lot of these push failures could disappear.

MY QUESTION @mikeparisstuff - Is there a way in the amplify cli to instruct CloudFormation to perform the create/update/delete of DynamoDB resources sequentially? To create a table and wait for that to complete, then create a GSI and wait for that to complete, and so on? Most of the issues I'm seeing with the push happen when everything is thrown at DynamoDB at once and limits are exceeded.

davekiss commented 5 years ago

Echoing the sentiment above, this is definitely one of the most concerning parts of Amplify. The DX for provisioning resources is great when it works, but once one of the dreaded CFN error messages appears, good luck. Some of the messages make you think you might be able to resolve things in the AWS console, at which point your settings drift apart, and the least frustrating, least-work option is to blow it all up and provision from scratch. I'm also really not sure how else I would handle this if I were in a production environment at this time.

Amplify could offer the coolest transform directives and best codegen in the world, but if it can't be deployed without hitting these pain points, it's moot.

Even some way to revert the local schema to the format which CFN would accept during a deploy would be better than leaving things in an unusable middle state.

Related: https://github.com/aws-amplify/amplify-cli/issues/1030

sacrampton commented 5 years ago

A small step forward on this - my theory above was that CloudFormation pushes everything to DynamoDB at once, and that if you could control that, pushes might be more reliable.

I compiled my graphql (amplify api gql-compile) then edited the cloudformation-template.json in the build directory. Under the Resources section of this JSON you have GraphQLAPI, GraphQLAPIKey, and GraphQLSchema, then all your tables, then ConnectionStack, SearchableStack, and CustomResources. Each, except the first 3 GraphQL resources, has a DependsOn attribute set; for tables it is set to GraphQLSchema, and for the last 3 it's set to all GraphQL resources and all tables.

I've been experimenting (painful, since each test takes well over an hour), but I'm getting further by telling each table in cloudformation-template.json that it also DependsOn the previous table - then using amplify push --no-gql-override so that it doesn't compile again and overwrite the cloudformation-template.json I just modified.
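For anyone wanting to try the same experiment, a rough sketch of what that edit looks like - the TableA/TableB names are placeholders, the other properties of the nested-stack entry are omitted, and the exact shape of the generated template can differ between CLI versions. Each table's entry gets the previous table added to its DependsOn list so CloudFormation processes them one at a time:

    "TableB": {
        "Type": "AWS::CloudFormation::Stack",
        "DependsOn": [
            "GraphQLSchema",
            "TableA"
        ]
    }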

I think this will work for tables, but all connections, and therefore GSIs, are defined in the ConnectionStack, and there doesn't appear to be any way to do them one by one - plus there is a limitation that you can only update/delete 1 GSI per push.

Echoing Dave's sentiments above - the CloudFormation templates being auto-generated by amplify are undeployable. These sorts of things should be the defaults: being able to reliably deploy to dev, test, and prod is what it's all about, and if you can't reliably do that it kind of defeats the entire purpose.

hew commented 5 years ago

So far, using environments, we've been able to achieve a level of stability: we do the tricky stuff on a branch and, when everything is stable, "merge" it back into prod. I have still gotten the updateStackNotComplete error, but env pull --restore seems to work when this happens. You just have to back up and try the schema change again with less happening at once.

Another trick I'll throw out is to package up amplify in its own separate directory, and then publish each push as a private NPM package to be consumed by other team members. That way you have one source of truth, and other team members don't need to worry about pulls and sync and whatnot. They just install it and start making queries/mutations/etc. This works fine so long as they have pre-existing ~/.aws/credentials for Amplify to look up.
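For reference, a minimal sketch of that packaging idea - the package name and file list here are made up, and it assumes the package simply ships the generated aws-exports.js plus the codegen output under src/graphql/ so consumers can import the config and operations directly:

{
  "name": "@myteam/amplify-backend",
  "version": "0.3.0",
  "main": "aws-exports.js",
  "files": [
    "aws-exports.js",
    "src/graphql/"
  ]
}

Each successful amplify push then gets published as a new version, and teammates just npm install the package instead of pulling the Amplify project themselves.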

ajhool commented 5 years ago

There are currently 3 (!) RFCs for enhanced Amplify API features (the @auth directive #1043, custom indexes #1062, and local testing #1433).

However, the current state of amplify api push feels like an alpha or beta version because it breaks so frequently and the error messages are nonexistent (e.g. #922). No matter how many nice auth, index, or local-testing features Amplify adds, they will never be useful if the core functionality is unreliable. I was drawn into the Amplify framework by all of these impressive features, but have found the core to be deeply unreliable for a production use case.

Instead of adding new functionality, would the Amplify team please focus on making the core service reliable so that it can be used confidently in a production environment?

plaa commented 5 years ago

We have now replaced the Amplify AppSync GraphQL autogeneration with a Postgres RDS and Postgraphile running in a Lambda function. The VPC + RDS is set up using manually-installed cloudonaut.io CF stacks (waiting on #1426 to get them deployed using the Amplify cli), and the rest uses Amplify-controlled API Gateway + Lambda functions.

During a week of development I haven't had a single failure in Amplify pushing. It seems that the vast majority of problems were related to the AppSync/GraphQL autogeneration, and the rest of Amplify fulfills its promise. A major issue (though not the only one) was the DB and API being coupled in the same resource, meaning you couldn't remove and redeploy the API without losing all your data.

It's not to say that the setup is completely without issues. We have CF stacks exporting values which are used by other stacks, in which case the base stacks often cannot be updated without removing the other stacks first. But now this is manageable and in our control, and not just hoping the Amplify black magic works. I still wouldn't use Amplify in scenarios requiring zero-downtime deployments, but in non-critical areas it seems to work fine.

mikeparisstuff commented 5 years ago

As @hew mentioned, the amplify env pull --restore command is a great way to get out of tough situations with complex changes. We are looking at this with great care and can do a few more things to help.

As of writing, almost all of the recent deployment failures seem to fall into three categories:

  1. A change to @connection causes CFN updates to fail because of DynamoDB GSI update limits.
  2. Removing something causes a nested stack to fail with an error about a missing export. The most common case is removing @searchable.
  3. A failed update (likely caused by 1 or 2) puts CFN in a situation where a stack successfully removed a resolver but due to some CFN implementation issue did not re-create the resolver on rollback. This causes subsequent deployments to fail with a "Resolver not found" error because a resolver does not exist when CFN thinks it should.

The first point is being solved by the @key directive which will remove the black box around key structures and will give you more control. The @connection directive will then go through some sort of deprecation and re-introduction process because the CloudFormation limits on DynamoDB GSIs make the current implementation difficult to use in practice. A similar concept to @connection that leverages @key will be introduced that does not suffer the same issues as @connection.

The second point has been solved by a recent PR that always outputs relevant exports that are used by downstream stacks. Thanks @kstro21

The third point seems to be an issue with the AppSync CFN implementation and is being looked into, but removing failure cases such as the @connection issues will make this much less likely to occur. You can get around this issue today by adding back resolvers to the fields that are expected to have resolvers and then re-running amplify push.
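For anyone hitting the "Resolver not found" case, one way to put the missing resolver back (besides the AppSync console) is the AWS CLI's appsync create-resolver command; a rough sketch, where the API id, type/field names, data source, and mapping template files are all placeholders for your own values:

API_ID=xxxxxxxxxxxxxxxxxxxxxxxxxx   # placeholder: your AppSync API id
aws appsync create-resolver \
  --api-id "$API_ID" \
  --type-name Query \
  --field-name getPost \
  --data-source-name PostTable \
  --request-mapping-template file://Query.getPost.req.vtl \
  --response-mapping-template file://Query.getPost.res.vtl
# then re-run: amplify push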

In addition to the above changes we can do the following:

  1. We will update the docs with more detailed information about what situations you might see and how to fix them. The categories mentioned above are a good place to start.
  2. We will introduce a sanity check that is able to compare your current cloud backend with the new API that will prevent known breaking changes and surface helpful error messages to guide you through them. This will be able to prevent things like breaking @connection changes.
  3. We will document best-practices for evolving Amplify APIs. Data model design with DynamoDB is different than data model design with SQL and we will help guide this process better.

ajhool commented 5 years ago

That is all great news. Would there be an easy way to decouple the API from the Databases, as @plaa suggests? Destroying the API and rebuilding wouldn't be much of an issue if the DB tables weren't deleted, too. This would also, theoretically, make it easy to add a canary (blue / green) API deployment that points to the same backend data (although there are some challenges there, too)

You would not be able to destroy databases and keep the API, but I believe you could destroy the API while maintaining databases and use existing databases as datasources when redeploying the API. Of course, some @key changes would necessitate the creation of new tables

mikeparisstuff commented 5 years ago

@ajhool This is an interesting idea and something we can definitely look into. One obvious way to allow this is to enable referencing external tables from within the API category, similarly to how we do with the @function directive. We have discussed this many times internally and perhaps it is time.

If we went down this route, you would be able to use amplify add storage to create DynamoDB tables that live externally to the API stack. We could then introduce a new directive (or argument to an existing directive) that allows you to specify that a @model should target an external table. E.G.

type User @model(table: "MyTable-${env}") { ... }

# Or alternatively with a new directive that specifies the model should use an existing table.
type User @model @table(name: "MyTable-${env}") { ... }

The CLI can then offer options to automatically import an existing table (deployed through Amplify or not) into an API project. This would allow you to change the API at will without worrying about data integrity.

I'll also mention that the goal is to eventually empower the community to build their own reproducible patterns. We are working through PR #1396 that will allow you to write your own transformers to encapsulate those behaviors. The process is reasonably simple and hopefully will unblock all the good ideas you guys have. For example writing a transformer that tells a @model to use an existing table is as simple as removing the AWS::DynamoDB::Table record and replacing an ARN in the generated AppSync data source & IAM role. This will require docs but the goal is to allow you to write transformers like this (https://github.com/aws-amplify/amplify-cli/blob/master/packages/graphql-function-transformer/src/FunctionTransformer.ts) for your own custom workflows.

ajhool commented 5 years ago

Looks like a workable concept to me. Amplify's autocreation of resolvers is fantastic, so as long as Amplify can recognize that @table is a DynamoDB table and autocreate resolvers effectively, then it would be very nice to have that option.

Monder87 commented 5 years ago

Hi there! I am adding myself to the list of users hitting the amplify push error:

✖ An error occurred when pushing the resources to the cloud

Resource is not in the state stackUpdateComplete

I frankly have to say that unfortunately this is becoming a really nasty issue for my company. We are going to production very soon and no matter what I do at the moment I cannot fix it; I have to rebuild the API again, and that really scares me in a prod scenario. I'll probably opt to follow a workflow like @hew suggested.

We started nice and smooth, but as the schema became more complex, things became hard to solve. This is the portion of my schema that I wanted to expand.

This part works fine:

type Recipe @model {
  id: ID!
  user: User @connection(name: "UserRecipe")
  name: String!
  countries: [RecipeCountry] @connection(name: "RecipeCountry")
  leads: [Lead] @connection(name: "RecipeLeads")
  industries: [RecipeIndustry] @connection(name: "RecipeIndustry")
  customIndustries: [CustomIndustry] @connection(name: "RecipeCustomIndustry")
  createdAt: String
  updatedAt: String
}

type RecipeIndustry @model {
  id: ID!
  recipe: Recipe @connection(name: "RecipeIndustry")
  industry: Industry @connection(name: "IndustryRecipe")
  createdAt: String
  updatedAt: String
}

type Industry @model {
  id: ID!
  name: String
  code: String
  recipes: [RecipeIndustry] @connection(name: "IndustryRecipe")
}

type CustomIndustry @model {
  name: String
  recipe: Recipe @connection(name: "RecipeCustomIndustry")
  createdAt: String
  updatedAt: String
}

type Lead @model {
  id: ID!
  leadsType: String!
  maxQuota: Int
  generatedQuotas: [Quota] @connection(name: "LeadsQuota")
  recipe: Recipe @connection(name: "RecipeLeads")
  createdAt: String
  updatedAt: String
}

type RecipeCountry @model {
  id: ID!
  recipe: Recipe @connection(name: "RecipeCountry")
  country: Country @connection(name: "CountryRecipe")
  createdAt: String
  updatedAt: String
}

type Country @model {
  id: ID!
  name: String
  code: String
  user: [User] @connection(name: "UserCountry")
  company: [Company] @connection(name: "CompanyCountry")
  recipes: [RecipeCountry] @connection(name: "CountryRecipe")
}

type Company @model {
  name: String
  industry: String
  number_employee: Int
  website: String
  phoneNumber: Int
  email: String
  address: String
  user: [User] @connection(name: "UserCompany")
  country: Country @connection(name: "CompanyCountry")
}

When I tried to modify the Recipe model and add all of this at once:

...
departments: [RecipeDepartment] @connection(name: "RecipeDepartment")
seniorities: [RecipeSeniority] @connection(name: "RecipeSeniority")
customJob: [CustomJob] @connection(name: "RecipeCustomJob")
...

with the corresponding models:

type RecipeSeniority @model {
  id: ID!
  recipe: Recipe @connection(name: "RecipeSeniority")
  seniority: Seniority @connection(name: "SeniorityRecipe")
  createdAt: String
  updatedAt: String
}

type Seniority @model {
  id: ID!
  name: String
  code: String
  recipes: [RecipeSeniority] @connection(name: "SeniorityRecipe")
}

type RecipeDepartment @model {
  id: ID!
  recipe: Recipe @connection(name: "RecipeDepartment")
  department: Department @connection(name: "DepartmentRecipe")
  createdAt: String
  updatedAt: String
}

type Department @model {
  id: ID!
  name: String
  code: String
  recipes: [RecipeDepartment] @connection(name: "DepartmentRecipe")
}

type CustomJob @model {
  id: ID!
  name: String
  recipe: Recipe @connection(name: "RecipeCustomJob")
  createdAt: String
  updatedAt: String
}

I got a nice

Resource is not in the state stackUpdateComplete

So I tried to follow @mikeparisstuff's suggestions: delete all the @connection directives, push, and afterwards add the new @connection directives back one by one... but it failed again with the same error at the first @connection.

I even deleted all the new models and the @connection directives related to the old ones, but I always got the same error. For instance, if I now add just one single one-to-many relationship to the Recipe model:

...
  customJob: [CustomJob] @connection(name: "RecipeCustomJob")
...

and the corresponding model:

type CustomJob @model {
  id: ID!
  name: String
  recipe: Recipe @connection(name: "RecipeCustomJob")
  createdAt: String
  updatedAt: String
}

I get the error. I even tried changing the names of the models and connections (so instead of "customJob" I used "customProfessions"), but no luck. Maybe I've hit what @mikeparisstuff describes in point 3:

A failed update (likely caused by 1 or 2) puts CFN in a situation where a stack successfully removed a resolver but due to some CFN implementation issue did not re-create the resolver on rollback. This causes subsequent deployments to fail with a "Resolver not found" error because a resolver does not exist when CFN thinks it should.

I am stuck: I cannot add and connect a single new model to an old model without the push failing. The only solution now is to erase the database and start over again; I will lose some data, which makes all this even more annoying. Please try to fix this issue ASAP, because people like me are quite stuck and we have projects to deliver on short timelines. Thanks.

hew commented 5 years ago

Ok say you get one of these errors you cannot recover from, and you are on prod env:

amplify env pull --restore
amplify env add tempenv
amplify env checkout tempenv
amplify push
amplify env remove prod (remove cloud resources)
amplify env add prod
amplify env checkout prod
amplify push
amplify env remove tempenv (remove cloud resources)

NOTE: if there is a way to rename an env, that would eliminate some of the steps above.

It's not pretty, but it's probably faster than trying to debug, make a push, wait, debug, etc. It will sort of depend how many other services and permissions you need to set up on the new env.

Monder87 commented 5 years ago

Thanks @hew, I think that's a very useful workaround for those who, like me, are stuck and need to go to prod. Since this is a very sensitive issue, I hope the team will try to fix it ASAP.

sacrampton commented 5 years ago

My team has been working on the challenges we've been facing around getting this working - in particular being able to deploy multiple GSI's. I'll put this out there and would welcome feedback on the validity of this approach.

Essentially they are saying to abandon Amplify for Serverless (https://serverless.com/). They claim Serverless is able to do all of this, and as proof they deployed 180 GSIs across 30 tables (something we were flat out unable to get CloudFormation to do by any method).

There are a couple of references they provide for Serverless GraphQL: https://github.com/serverless/serverless-graphql and https://hackernoon.com/running-a-scalable-reliable-graphql-endpoint-with-serverless-24c3bb5acb43

My team are very keen to move forward with Serverless and put amplify behind us for now. Any thoughts on the validity of this approach?

hew commented 5 years ago

@Monder87

Just keep in mind that you will lose all the settings you have for permissions, lambda configs, etc. If you have a project of any decent size, it's still going to take a while to get everything working again. I entertained that flow yesterday and I'm still fixing different things.

I think this weekend I'm going to explore Serverless, or literally anything else, as @sacrampton suggested. I honestly cannot take the pain of this anymore.

hew commented 5 years ago

As an aside, has anyone ever had AppSync simply stop updating, but Amplify pushes succeed?

sacrampton commented 5 years ago

Still testing the best approach here, but my team has successfully used amplify to generate the compiled AppSync schema and resolvers (i.e. amplify api gql-compile) and then used Serverless to deploy that stack. As I said, it's a work in progress, but it seems this might be a valid workflow - essentially using Serverless to replace the amplify push / CloudFormation step.

For what it's worth, separate from amplify push, we were unable to get standalone CloudFormation to deploy a stack that had multiple GSIs on a table. The DependsOn attribute will pause for resources, but GSIs are contained within resources, so there is no way to pause between them. We invested weeks trying to get this to work, but in the end Serverless got it working straight away. And given that amplify push sends all of this to CloudFormation, there is always going to be a problem unless CloudFormation can deploy multiple GSIs successfully.

At this time it's looking like a hybrid: amplify transform to generate the schema, then Serverless to replace the push. This may of course change as we get further into it.

undefobj commented 5 years ago

Hello everyone - We're currently working on this issue. For some clarity while you may be seeing this issue in the Amplify CLI, we believe the majority of the issues in this thread are related to a CloudFormation issue in AppSync and not how Amplify is doing a deployment. It occurs when there is a race condition between removing types and adding/modifying others.

We're currently working with the AppSync team on resolving this and, for clarity, you could still see this issue if you were using an alternative method of deployment such as Serverless or a hand-rolled CloudFormation deployment. In the meantime, if you see an error saying "No resolver found", you can work around the problem by attaching an empty resolver to that type in the AppSync console and then attempting a push again.

sacrampton commented 5 years ago

Hi Richard (@undefobj) - thanks for taking the time to respond here, but I'd like to politely point out that a comment like "we're currently working on the issue" doesn't really help anyone. Everyone here is trying to solve issues related to reliably deploying their solutions to dev/test/prod. We have timelines to meet - in my case, deploying our AppSync app into test/prod so I can start getting paid for it. This is not something that can wait for whenever a solution appears - it has to be done, and if I can't do it this way then we have to find another way.

At the moment, if you are using GSIs then Amplify/CloudFormation is unusable, and there is no interim workaround that I'm aware of (Serverless is looking promising, however). There are also valid concerns raised by others about the wisdom of not separating the app from the underlying databases (i.e. stories of how easy it is to have CloudFormation blow away entire databases).

I fully realize that you can't promise specific dates, but you can provide non-committal guidance (i.e. we are working on this issue and anticipate having a fix in place next week / next month / next year). That guidance would help us decide whether to hold tight and see what might be coming, or abandon this route and go another way.

Giving approximate timeline guidance would be helpful - and given this is a "dead in the water" issue, interim workarounds are really critical.

undefobj commented 5 years ago

@sacrampton we needed a bit more time to investigate before giving any timelines. My response was to give you clarity on the situation from a technical standpoint so that you understood you can have this problem with any CloudFormation technology. It's independent of the Amplify CLI.

That being said we were able to dive deeper into the issue today and identify the root cause with the CloudFormation update process and are working on an AppSync deployment to resolve this. ETA is currently end of this week but if we can get it sooner I will reply back.

undefobj commented 5 years ago

All - The fix to AppSync CloudFormation for the "No resolver found" error has now been deployed to all regions. If this was the root cause of your error, then you should be able to run amplify env pull --restore followed by amplify push to resolve the conflict.

hew commented 5 years ago

@undefobj Big thanks to the team(s) for this fix!

sacrampton commented 5 years ago

The @key directive does not work for multiple GSI update/creations. Is there any way anyone can think of to get this to work?

sacrampton commented 5 years ago

We've had a little success with Amplify that I thought I'd share in case it's helpful. I have a database with 32 tables/models and 85 global secondary indexes (GSIs) spread across those tables, with a maximum of 9 GSIs on a single table.

First thing is as @hew suggested - do all of it on a separate environment/branch first to make sure it works before you try to push it into a production environment.

So let's consider the 3 models shown below, which have @connection directives as shown. This arrangement will try to create TWO (2) GSIs on the Asset table in DynamoDB, allowing me to query all assets in a particular Plant or all assets in a particular Package - but if I try to push it all at once it will fail, even though it compiles successfully. So the trick is to comment out the second GSI (i.e. package), do the push, then uncomment and push again - one GSI at a time - and that works fine.

type Plant @model {
  id: ID
  name: String
  asset: [Asset] @connection(name: "PlantAsset", sortField: "name")
}

type Package @model {
  id: ID
  name: String
  asset: [Asset] @connection(name: "PackageAsset", sortField: "name")
}

type Asset @model {
  id: ID
  name: String
  plant: Plant @connection(name: "PlantAsset", sortField: "name")
  package: Package @connection(name: "PackageAsset", sortField: "name")
}
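To make the two-step push concrete, here is a sketch of what the held-back connection looks like for the first push (both halves of the PackageAsset connection are commented out together so the schema still compiles; the # lines are just GraphQL comments):

# Push 1: only the PlantAsset connection (one new GSI on Asset)
type Package @model {
  id: ID
  name: String
  # asset: [Asset] @connection(name: "PackageAsset", sortField: "name")
}

type Asset @model {
  id: ID
  name: String
  plant: Plant @connection(name: "PlantAsset", sortField: "name")
  # package: Package @connection(name: "PackageAsset", sortField: "name")
}

For the second push, uncomment both PackageAsset fields and run amplify push again, which adds the second GSI on its own.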

Now, with 85 GSIs I was worried about how many pushes I'd have to do - but the largest number of GSIs I have on a single table is 9, so it takes 9 pushes to get all 85 GSIs created. The way I manage this is to also save my schema.graphql files as a full audit trail (i.e. schema-1.graphql, schema-2.graphql, schema-3.graphql, etc.) - this lets me push the build to any other environment (i.e. dev, test, etc.) by repeating the sequence.

Not quite there yet - there are a couple of other things I learnt. If I try to do the first push as-is, it will fail because it has too much going on and overloads. So I need to edit the auto-generated "cloudformation-template.json" file. It has a "DependsOn" attribute - instead of the default, where each table just waits for GraphQLSchema to complete, I have each table wait for all preceding models to complete. So on the Asset model it might look like this, where I tell it to wait for Plant & Package:

        "DependsOn": [
            "GraphQLSchema",
            "Plant",
            "Package"
        ]

I generally haven't had to do this after the first push, when it's creating all the tables and the first GSIs - but if you run into a random failure, doing this has fixed it in my experience.

On every push I use "amplify api gql-compile" to compile my schema.graphql - then, if necessary, I edit the cloudformation-template.json as described above. There are also 2 other things I had to check every time:

Once you have checked all that, use "amplify push --no-gql-override --yes" to deploy (the override is important so that amplify doesn't re-compile and overwrite all your edits).
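Put together, the per-push routine above looks roughly like this (the build path shown assumes the usual amplify/backend/api/<api-name>/build/ layout; adjust for your project):

amplify api gql-compile
# hand-edit amplify/backend/api/<api-name>/build/cloudformation-template.json (DependsOn entries, etc.)
amplify push --no-gql-override --yes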

So for me - this has been working reliably

We did get the deployment working on Serverless, but it was too much work to re-write all the scripts. Amplify Transform has a lot of really good stuff - it just falls down in the deploy. But we have found the above to be a workable solution.

Hope someone finds this helpful and that it spares them the grief we've been through getting here.

artista7 commented 5 years ago

Thanks, @sacrampton, for the advice. @undefobj - I am also stuck with deployments due to amplify not allowing multiple GSI modifications on one table. Apart from that, the 460 KB CFN template size limit and the 200-resource limit are further hindrances in the deployment process. I can't do multiple updates in prod as suggested as a workaround; I just want one single push to happen.

Regarding the GSI issue, is there some solution coming out any time soon?

kaustavghosh06 commented 5 years ago

@artista7 I'm curious what your annotated schema looks like and how many models and connections you have in your schema.

Stefano1990 commented 5 years ago

I don't understand how people say that amplify is production ready when something as basic as schema "migration" appears to be such a problem.

kaustavghosh06 commented 5 years ago

We've been actively looking into this issue and will have a short-term solution out for it in 1-2 weeks. As discussed above, as a short-term solution we'll be adding a sanity check that determines whether any changes to your schema (basically a diff between whatever is deployed to the cloud and your local schema) would cause a CloudFormation push error, and fails fast - before the CloudFormation push. This check will clearly state why the sanity check failed and what your path forward should be. Here are the cases we've identified for the sanity check so far:

  1. You cannot change an existing GSI for a DynamoDB Table
  2. You cannot add and remove a GSI on a DynamoDB table at the same time
  3. Protect against the duplicate resolvers
  4. You cannot add more than one GSI at a time

Please let us know if we've missed any other use-cases/scenarios.

sacrampton commented 5 years ago

Hi @kaustavghosh06 - the other ones that need to be considered are:

  1. Stack JSON size greater than 460KB (typically ConnectionStack.json for me - minifying it fixes this)
  2. Number of resources in a stack greater than 200 (and instructions on how to fix it)

mikeparisstuff commented 5 years ago

Update: We are adding tooling that will fail prior to the push when you try to push a migration that is known to fail, e.g. changing a GSI, adding and removing a GSI at the same time, changing a key schema, adding LSIs after table creation, etc. We are also adding a suite of tests that cover these migration scenarios to prevent these issues going forward. You may track the progress in #1815 if interested.

@sacrampton The 200 resource limit has been added. The CLI is considering minifying all JSON by default prior to the push so the build directory can remain easily readable which will require implementing the 460KB check in a different place.

sacrampton commented 5 years ago

That sounds promising @mikeparisstuff.

If you are collecting additional issues, one I forgot to mention: if I push a new stack which is creating, say, 30+ tables, it will overwhelm DynamoDB and fail. So I have to edit cloudformation-template.json to add "DependsOn" entries so that each table waits for all previous ones to complete before being attempted - then it runs fine. For updates after that it does not seem to matter, but the initial creation will fail every time unless "DependsOn" is set.

sacrampton commented 5 years ago

This might not be the right place to ask this question, @mikeparisstuff - but if I want to have (for example) 800 resources in ConnectionStack.json, how would I split this up into, say, 4 stacks? Everything I've tried fails. Are you saying your new work will automatically split it up, or just tell you it's going to fail? If it's just going to tell you it will fail, is there a workaround to get it to work, and is that documented anywhere?

mikeparisstuff commented 5 years ago

@sacrampton We can add this as a feature request, as it is possible to adapt the mapping automatically when a stack like the connection stack gets too big, although this is difficult to do in the general case. In the meantime, you can unblock yourself using a yet-to-be-documented flag in transform.conf.json. This configuration was added very recently to help the CLI fix an unrelated issue in a backwards-compatible way, but it is also useful in situations like this. I will be adding documentation on this soon.

The transform uses an internal mechanism called the "StackMapping" that allows it to convert the global context manipulated by the transformer directives into a set of N nested stacks. This allows directive authors to avoid dealing with Import/Export and other oddities of nested stacks, as all the Refs/GetAtts are automatically converted into Import/Export/Parameters by the library.

You can customize the "StackMapping" by adding a key to transform.conf.json in your project, at the same level as your schema.graphql, stacks/ directory, etc. Each key in the "StackMapping" is the id of some resource, and the value is the name of the stack that resource should end up in. If you look at your build/stacks/ConnectionStack.json, you can manually assign some resources to a new connection stack and any refs should automatically be updated.

{
    "StackMapping": {
        "IdOfOverflowingResource": "ConnectionStack2",
        // ... more of these
    }
}

One warning is to be careful when moving resolvers that have already been deployed by one stack into another stack. The safest route is to move undeployed resolvers to the new stack or to first remove the resolvers then move them to the new stack.

kaustavghosh06 commented 5 years ago

Hi everyone, wanted to update on this issue. We got the sanity check/validations working as mentioned out here - https://github.com/aws-amplify/amplify-cli/issues/1406#issuecomment-509418958 and we merged the PR for it (#1815). We'll be publishing a new version of the CLI (1.8.6) early next week with this change.

CodySwannGT commented 5 years ago

Coming in a bit late, and don't want to pile on, but the multiple GSI thing just absolutely kills us.

Based on this thread, I think I read that @connection would be deprecated in favor of @key which would then add support for deploying multiple GSI changes to a single table.

Is this the appropriate thread to be watching for that update or is there a better issue to track?

sacrampton commented 5 years ago

Hey @mikeparisstuff - your suggested workflow above for stack overflows was working great (https://github.com/aws-amplify/amplify-cli/issues/1406#issuecomment-511070208). However, things have stopped working. I'm on the latest amplify (amplify -v = 3.2.0).

The transform.conf.json file is overwritten with a previous version (last successful push) and a new line is added to the end of the JSON as shown (i.e. "Version": 4):

{
    "StackMapping": {
        "IdOfOverflowingResource": "ConnectionStack2",
        // ... more of these
    },
    "Version": 4
}

Not sure what has changed to have this continuously reverting back to a previous version.

aprilmintacpineda commented 4 years ago

I've been trying to find a way to guard amplify push. If I have DEV, STG, UAT, and PROD environments, PROD should not be accessible to everyone. That means not everyone should be able to do amplify env remove or even amplify push/publish. Has anyone else done this?

sacrampton commented 4 years ago

I can't remember where I saw this, but one recommendation was that a single database person/team control all pushes to all environments. If developers want to change something in the database, they make the request to the single change manager controlling pushes. That is what we are doing and it's working fine for us.

plaa commented 4 years ago

I recall that when switching/initing envs, Amplify asks which AWS profile you want to use. Just use different AWS profiles for different environment segments: devs can have a profile with access to the dev envs, while only a limited set of people have access to the prod profile. Ideally, the prod environment should be in a completely separate AWS account, and each DevOps person should have their own account + permissions.

I haven't used this setup (in our case everyone on the team was DevOps) but I can't see why this wouldn't work.
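A rough sketch of that split using standard AWS CLI named profiles (the profile names are made up): developers only ever configure the dev profile, while the prod profile - ideally pointing at a separate AWS account - is held only by whoever is allowed to deploy prod.

# each developer configures only this profile locally
aws configure --profile myapp-dev

# only the people allowed to deploy prod hold these credentials
aws configure --profile myapp-prod

# when running amplify init / amplify env add, pick the profile that matches the environment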

CodySwannGT commented 4 years ago

Any more work on supporting multiple GSI deployments? My team works across 8 different amplify projects and this one issue kills us every single iteration.

tgjorgoski commented 4 years ago

@mikeparisstuff, @kaustavghosh06, to avoid getting issues with multiple GSIs because of a big refactoring, I created a new env and planned to push there (I'm not worried about losing the data). However, even after I did amplify env add, when I try to do amplify push I get the following error: Attempting to add and remove a global secondary index at the same time on the ProjectTable table in the Project stack. An error occured during the push operation: Attempting to add and remove a global secondary index at the same time on the ProjectTable table in the Project stack.

Might it be that the sanity check is still somehow running even though I'm trying to push to a new env? Is there any way I can solve this? (amplify status shows me that I'm on the new env, btw)

UPDATE: Solved it by deleting the #current-cloud-backend folder

nagey commented 4 years ago

@tgjorgoski you can get around this by doing your GSI changes one at a time with an amplify push in between. Also, if you set up a completely fresh env, that should also get you around this issue.

tgjorgoski commented 4 years ago

@nagey, yes, I used to do GSI changes one by one, but because I have a lot of modifications now, I thought I would create a completely fresh env (amplify env add). What is confusing, though, is that it fails with that message on this new env.

CodySwannGT commented 4 years ago

I wish people would stop selling this as a workaround.

It’s not.

In a given iteration, we may add 4 to 7 indexes. Pushing them one at a time requires figuring out which need to be commented out and what connections need to be commented out.

Additionally, because the code is written assuming these indexes exist, the app is broken until all indexes are pushed.

Plain and simple, this is broken and I am stunned the AWS team hasn’t fixed it.

tgjorgoski commented 4 years ago

OK, for my problem, where I was getting the "Attempting to add and remove a global secondary index at the same time" error even for a newly created env: I resolved it by deleting the #current-cloud-backend folder.
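For anyone hitting the same thing, a sketch of that reset (this assumes the default Amplify project layout, and note it throws away the CLI's local snapshot of the cloud state, so only do it on an environment you're prepared to re-sync):

rm -rf 'amplify/#current-cloud-backend'
amplify push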

amuresia commented 4 years ago

Has anybody overcome the challenge of adding secondary indexes? As the functionality of an application grows, so do the @models, and adding secondary indexes is an absolute must. amplify push fails fast though, stating Local secondary indexes must be created when the table is created. The thing is that 6 months ago, when the table was created, some fields did not exist and thus didn't need a secondary index.

What are people's suggestions for a production environment? Is the only way really to remove and redeploy (with a data export prior to the update and an import after it)?

sacrampton commented 4 years ago

Hi @amuresia - I'll have a go at answering this from my experience. I struggled with the problem you are referring to (GSIs). The core of the problem is that the CloudFormation template does not support "DependsOn" for GSIs: you can define a DependsOn for table operations, but not for GSIs. I'm not hopeful that this will get fixed any time soon, as it falls between multiple groups at AWS, and those sorts of issues seem to take the longest to solve (just my observation).

Having said that, we have been working within these limitations quite successfully by treating schema.graphql not as a one-time view of the system, but as part of an audit trail of the deployment. So we'll have schema.graphql_001, schema.graphql_002, schema.graphql_003, etc. - the complete history of the deployment. If you want to recreate from scratch, you can have a script that pushes them all in order to recreate the system. As soon as we got our heads around schema.graphql needing to be an audit trail, it all worked pretty smoothly and efficiently.
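A minimal sketch of the replay idea - the file naming and API path follow this example's own convention and are placeholders, and each numbered schema must itself respect the one-GSI-change-per-push rule:

#!/bin/bash
# Replay the schema audit trail into a fresh environment, one push per step.
API_NAME=myapi   # placeholder for your Amplify API resource name
for f in schema.graphql_*; do
  cp "$f" "amplify/backend/api/$API_NAME/schema.graphql"
  amplify push --yes
done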

Also, this is all maintained centrally with a single authority making these changes and pushing to the cloud. If developers want anything added/deleted they request the single authority to make those changes.

We really struggled with the whole GSI issue. We have something like 40+ tables and >100 GSIs, and it is changing constantly. As soon as we saw the schema as an audit trail, things fell into place. Of course, you only need a new version (i.e. schema.graphql_007) when your change affects a GSI.

Hope that helps - it's all working smoothly for us now.

lukeramsden commented 4 years ago

@sacrampton very interesting, as that basically sounds like a recreation of Laravel's migration system (although I'm sure other frameworks do it too), whereby a list of SQL files is run in order to achieve the final database structure. It's a good system, so maybe it would be prudent of the Amplify team to make this a first-class feature (along with proper versioning, like Laravel). Thank you for this answer, it is very valuable, and I really think the docs should say something along these lines.