jamesls commented 2 years ago

(NOTE: This is part of a set of proposals for Chalice 2.0. This proposes a backwards incompatible change that will require a major version bump.)

This is a proposal to remove the built-in (SDK based) deployer and to use either CloudFormation or Terraform (user configurable) when users run "chalice deploy".

Motivation

One of Chalice's key features is decoupling Chalice applications from how they're deployed. This allows you to choose the deployment tooling you prefer, or in some cases, the deployment tooling that your team/org has already adopted. To that end, Chalice supports these deployment backends:

Built-in deployer. This is the deployer used when you run “chalice deploy”. It uses boto3 to make the appropriate calls to Lambda, IAM, etc. to handle the CRUD operations needed for your AWS resources.
SAM packager. Running "chalice package" generates a SAM template that you can then deploy using CloudFormation. Chalice doesn't actually handle the deployment; instead, you can use any existing CloudFormation tool, including the AWS CLI, the SAM CLI, the web console, or CodePipeline.
Terraform packager. This works the same way as the SAM packager, but it generates Terraform templates instead. Again, Chalice doesn't do the actual deployment; you use the Terraform CLI for that.
CDK deployment. Chalice provides a CDK construct that internally reuses the SAM packager and CDK’s CfnInclude to reverse map a SAM template back into CDK constructs. This allows you to deploy your app using the CDK via "cdk deploy". Again, Chalice doesn’t actually handle the deployment for you.

So while there are four ways of deploying your app, internally there are really three, because the CDK integration relies on the SAM packager.

Issues

The downside to this approach is that implementing a new feature means updating three deployment backends. That requires knowledge of CloudFormation templates, Terraform templates, and Chalice's built-in deployer, which has its own undocumented intermediate language and op codes. This makes contributions challenging and adds unnecessary friction to adding new features to Chalice. We often see PRs stall because of changes needed to one of the backends. The built-in deployer also requires the most effort by far, because it has to handle every create, update, and delete step in a resource's lifecycle itself, whereas CloudFormation and Terraform handle that logic internally.

Additionally, the built-in SDK based deployer was written years ago, and the original motivating forces for the design are not as relevant today. Consider that when Chalice was first released:

SAM didn't exist.
The AWS CDK didn’t exist.
CloudFormation didn't support several resources we needed, and tended to lag behind new feature launches.

We didn't really have any choice but to write our own deployment system because there was nothing else available at the time. Fast forward to today and that doesn't make sense anymore. CFN/CDK/Terraform can handle everything we need. A deployment system is a full-time, separate project in its own right and takes away dev time from Chalice framework features.

The last issue we've run into is that the deployer has prevented us from adding certain features. Not only does the built-in deployer have to manage the lifecycle of every resource we support, but we'd also need to do the same for any resource a user wants to create beyond those used directly by Chalice. See https://github.com/aws/chalice/issues/516 for an example of this. A feature like that becomes more tenable if we only need to provide the appropriate CFN/Terraform snippet to create resources, or potentially just let users provide their own templates to merge in.
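
As a rough illustration of the "merge in" idea, here is a minimal sketch of combining a user-supplied CloudFormation fragment with a generated template. The function name `merge_resources`, the sample template contents, and the rule of rejecting duplicate logical IDs are all assumptions made for this example; they are not part of Chalice.

```python
import copy


def merge_resources(generated: dict, extra: dict) -> dict:
    """Return a new template containing the resources from both inputs.

    Hypothetical helper: refuses to overwrite a generated resource that
    shares a logical ID with a user-supplied one.
    """
    merged = copy.deepcopy(generated)
    resources = merged.setdefault("Resources", {})
    for logical_id, resource in extra.get("Resources", {}).items():
        if logical_id in resources:
            raise ValueError(f"Duplicate logical ID: {logical_id}")
        resources[logical_id] = copy.deepcopy(resource)
    return merged


# Illustrative stand-in for a generated SAM template.
generated = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        "APIHandler": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Runtime": "python3.9",
                "Handler": "app.app",
                "CodeUri": "./deployment.zip",
            },
        },
    },
}

# A user-provided fragment declaring a resource Chalice does not manage.
user_extras = {
    "Resources": {
        "AppTable": {
            "Type": "AWS::DynamoDB::Table",
            "Properties": {
                "BillingMode": "PAY_PER_REQUEST",
                "AttributeDefinitions": [
                    {"AttributeName": "pk", "AttributeType": "S"},
                ],
                "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
            },
        },
    },
}

merged = merge_resources(generated, user_extras)
```

With this approach the framework only has to emit or accept template snippets; creating, updating, and deleting the table is left to CloudFormation.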

To summarize, with this proposal we gain these benefits:

Minimize the work needed to add new features. You only need to provide the relevant CFN/TF snippets, which increases development speed and makes contributions and reviews easier.
Enable features that were previously not sustainable (e.g. resources).
Resolve issues with the existing deployer (managing deployment state in CI environments, resource lifecycle bugs, etc.).

Specification

I should mention up front that this has to be done in a major version, so I’m proposing a Chalice 2.0.

1. First, the built-in deployer is removed from the Chalice codebase. This means that any apps deployed with this deployer in Chalice 1.x would need to be redeployed using one of the new deployers in order to migrate to 2.x.
2. Next, Chalice's internals will be restructured to add a Deployer API. This API is pluggable so that customers can write and integrate their own deployer if needed. The specific API is still TBD but will be added here once that's figured out. Chalice will add a CloudFormation deployer and a Terraform deployer that implement this new Deployer API. The default will be CloudFormation. In terms of implementation, we should investigate whether we can reuse the SAM CLI's CFN deployer.
3. Users can configure which deployer to use in their .chalice/config.json file with the "deployer" option. This is also how users can plug in their own custom deployers. Note that the "chalice package" commands will remain unchanged, so users with that existing workflow will be unaffected.
4. Using CloudFormation for deployment will require an S3 source bucket where we can upload both the Lambda deployment packages and large CloudFormation templates. Chalice will automatically handle this for you by creating a CFN stack with an S3 bucket if needed. This bucket is reused across all Chalice apps.
5. Users can configure a specific S3 source bucket to use if they prefer to manage this themselves. If they provide a deployment_bucket config option, Chalice will use this instead of creating a bucket in (4).
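
To make the configuration rules above concrete, here is a minimal sketch of how the two options might be resolved. Only the key names "deployer" and "deployment_bucket" come from the proposal; the value strings "cloudformation" and "terraform", the function names, and the sample bucket name are assumptions for illustration, since the actual config schema isn't finalized.

```python
import json
from typing import Optional, TypedDict

# Assumed identifier for the default backend; the proposal only says
# that CloudFormation is the default.
DEFAULT_DEPLOYER = "cloudformation"


class DeploySettings(TypedDict):
    deployer: str
    deployment_bucket: Optional[str]


def resolve_deploy_settings(config_text: str) -> DeploySettings:
    """Read the deploy-related options from .chalice/config.json contents."""
    config = json.loads(config_text)
    return {
        "deployer": config.get("deployer", DEFAULT_DEPLOYER),
        "deployment_bucket": config.get("deployment_bucket"),
    }


def needs_managed_bucket(settings: DeploySettings) -> bool:
    """True when the shared S3 bucket would have to be created for the user.

    That is only the case for CloudFormation deployments where no
    deployment_bucket was configured.
    """
    return (
        settings["deployer"] == "cloudformation"
        and settings["deployment_bucket"] is None
    )


# Hypothetical config file contents.
example_config = """
{
  "version": "2.0",
  "app_name": "myapp",
  "deployer": "cloudformation",
  "deployment_bucket": "my-team-artifacts"
}
"""

settings = resolve_deploy_settings(example_config)
defaults = resolve_deploy_settings('{"version": "2.0", "app_name": "myapp"}')
```

Here `settings` uses the user's own bucket, while `defaults` falls back to CloudFormation and would need the automatically created bucket.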

The end result is that "chalice deploy" will now internally generate a template and then kick off the deployment with either CFN or Terraform.
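
Since the Deployer API is still TBD, the following is only a sketch of what a pluggable interface could look like, not the planned design. Every name here (`Deployer`, `generate_template`, `deploy`, `run_deploy`, the registry, and the fake backend) is hypothetical, and the fake backend records calls instead of talking to AWS.

```python
from typing import Dict, List, Protocol, Tuple


class Deployer(Protocol):
    """Hypothetical plugin interface: build a template, then deploy it."""

    def generate_template(self, app_name: str) -> dict: ...

    def deploy(self, template: dict, stage: str) -> str: ...


class FakeCloudFormationDeployer:
    """Stand-in backend that records deployments instead of calling AWS."""

    def __init__(self) -> None:
        self.deployed: List[Tuple[dict, str]] = []

    def generate_template(self, app_name: str) -> dict:
        return {
            "Transform": "AWS::Serverless-2016-10-31",
            "Description": app_name,
            "Resources": {},
        }

    def deploy(self, template: dict, stage: str) -> str:
        self.deployed.append((template, stage))
        return f"cloudformation:{template['Description']}-{stage}"


def run_deploy(
    deployer_name: str,
    app_name: str,
    stage: str,
    registry: Dict[str, Deployer],
) -> str:
    """Look up the configured deployer, generate a template, and deploy it."""
    try:
        deployer = registry[deployer_name]
    except KeyError:
        raise ValueError(f"Unknown deployer: {deployer_name!r}") from None
    template = deployer.generate_template(app_name)
    return deployer.deploy(template, stage)


backend = FakeCloudFormationDeployer()
registry: Dict[str, Deployer] = {"cloudformation": backend}
result = run_deploy("cloudformation", "myapp", "dev", registry)
```

A custom deployer would be one more entry in the registry, selected by the "deployer" option in the config file.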

Feel free to share any thoughts/feedback you have on this. This is still in the early stages so nothing's finalized yet.

kapilt commented 2 years ago

This sounds great.

There's a lot of code in the built-in deployer that feels like dead weight when adding new capabilities. Additionally, the experience with the built-in deployer is subpar when handling updates (it can't do differential/partial updates). I'd hazard a guess that this allows dropping 20-30% of the extant code in Chalice.

The cdk deployment here also feels more like lip service, to say it integrates rather than providing utility. I’d suggest that facility be moved to a separate extensions/example repo as a demonstration or perhaps even reduced to documentation.

kapilt commented 2 years ago

a couple other thoughts re v2

source structure - it would be a good time to split up the modules in chalice. currently doing prs to chalice is somewhat tedious, as the use of large multi-thousand-line modules means inevitable conflicts with other prs even for relatively trivial functionality, which increases the burden of contribution and maintainership.

remove / separate experiments - there are a number of capabilities in chalice that feel demo driven and not really intended or safe for real world usage by themselves: cdk integration, pipelines, iam gen to name a few. imo, it would be good to move those to separate tools within the repo and document them as such. as part of doing that, having core chalice adopt a more eventing/plugin pattern might be helpful so that those features and others can be utilized.

governance - it's unclear how active chalice dev is. significant functionality like #1505 has been languishing for over a year on relatively trivial pr comments (use a different http client) despite significant improvements to the development experience. most prs can expect to wait months before getting feedback, if any. most of the prs from the past year are simple fixes rather than features. in talking to the lambda team at re:Invent, the comments on chalice vs sam were also fairly negative, effectively implying this project is dead. given the minimal maintenance (a few hrs a month) from aws employees, and lack of external committers, it's unclear if it's safe/worth recommending chalice to new users. i think it's worth considering moving the project to a github org where external maintainers can be utilized.

anyways, at the moment i find myself debating what to do about existing apps that use chalice, whether to migrate them to sam or fork chalice.

jamesls commented 2 years ago

Thanks for the feedback @kapilt. In general I agree with the ideas you're proposing. Chalice tried out a lot of ideas, and we now have a better sense of which ones make sense and which don't.

I'm hoping to plan out more v2 ideas this month and get a better sense of the overall scope.

devangmehta123 commented 2 years ago

@jamesls, I am waiting a bit on the future direction of Chalice. I have invested in it but would like to see some action. Thanks.

aws / chalice

[v2] Replace the built-in deployer #1833
