Cloud Assemblies and the App Model

eladb commented 6 years ago

As we progress in our design for an end-to-end developer experience for cloud applications, and start to venture into more complex use cases, there's a common pattern that keeps emerging, where more than a single artifact is needed in order to deploy cloud applications. Moreover, these artifacts interact with each other.

Some examples:

Runtime code: a zip file with a Lambda handler, a Docker image for a container-based app, an AMI for an EC2 fleet, etc.
Nested stacks: nested stacks are CloudFormation resources that point to other templates and deploy them in a single transaction.
Apps with multiple, related stacks: real-world cloud apps normally consist of more than a single stack, and there are cross references between these stacks.
Phased deployments: we know that there are use cases where deployments of a stack must happen in phases (for example, data migration might require deploying a stack that performs the migration and only after it's done, deploy a stack that removes the old resources). Other examples are limitations in CloudFormation such as #201.

In all of these cases, we still want the developer and CI/CD workflows to operate at the app level. What does it mean? It means that if a construct defines a Lambda function that uses some runtime code, when a developer invokes cdk deploy, the runtime code should be uploaded to an S3 bucket and the CloudFormation template should reference the correct bucket/key (and also make sure Lambda has permissions to read from the bucket). This should also seamlessly work within a CI/CD pipeline.

Since the CDK's unit of reuse is a construct library, any solution must not look at the problem from the app's perspective, but from a construct perspective. It should be possible for a construct library to encapsulate runtime code, nested stacks, etc. Then, when this library is consumed, the workflow above should continue to work exactly in the same way.

Design approach

At the moment, synthesis is actually performed at the App level and is tailored to produce a single artifact (CloudFormation template). The proposed design will allow any construct in the tree to participate in the synthesis process and emit arbitrary artifacts.

But it is not sufficient to just emit multiple artifacts. We need to model their interaction somehow (dependencies, data flow, how these artifacts interact with cloud resources, etc.).

Generalizing this, we effectively need some way to describe the model for our app. If a CloudFormation template defines the model for a single stack, we need a way to describe an entire cloud application.

Naturally, we should prefer a desired state configuration approach where the app model doesn't describe steps but rather the desired state, and then tools (like the CDK toolkit or CI/CD systems) can help achieve this desired state.

Let's say that the toolkit only knows how to work with app model files, which describe the desired state of an app in a format similar to CloudFormation templates:

{
  "Resources": {
    "MyLambdaCodePackage": {
      "Type": "AWS::App::Asset",
      "Properties": {
        "File": "./my-handler.zip"
      }
    },
    "MyTemplate": {
      "Type": "AWS::App::Asset",
      "Properties": {
        "File": "./my-template.json"
      }
    },
    "MyStack": {
      "Type": "AWS::App::CloudFormationStack",
      "Properties": {
        "TemplateURL": { "Fn::GetAtt": [ "MyTemplate", "URL" ] },
        "Parameters": {
          "MyLambdaCodeS3Bucket": { "Fn::GetAtt": [ "MyLambdaCodePackage", "Bucket" ] },
          "MyLambdaCodeS3Key": { "Fn::GetAtt": [ "MyLambdaCodePackage", "Key" ] }
        }
      }
    }
  }
}

This is not a CloudFormation template! It's an App Model file. It uses the same structure to define the desired state of an entire application. This file, together with all the artifacts synthesized from the app, forms a self-contained cloud app package ("cloud executable"?).

When tools read this file, they can produce a deployment plan for this app:

Upload the files ./my-handler.zip and ./my-template.json to an S3 bucket.
Deploy a CloudFormation stack. When executing the CreateStack API, use the S3 URL to specify the template URL and pass in parameters that resolve to the location of the S3 bucket and key of the Lambda runtime archive.
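
A minimal sketch of how a tool might derive such a plan from the app model, assuming references are only ever expressed as Fn::GetAtt. The AppModel and AppResource shapes and the function names are hypothetical and exist only for illustration; they are not part of any published schema.

```typescript
// Hypothetical shapes for an app model file (illustrative only).
interface AppResource {
  Type: string;
  Properties: { [key: string]: unknown };
}
interface AppModel {
  Resources: { [logicalId: string]: AppResource };
}

// Recursively collect the logical IDs referenced via { "Fn::GetAtt": [id, attr] }.
function findReferences(value: unknown, found: string[] = []): string[] {
  if (Array.isArray(value)) {
    value.forEach(item => findReferences(item, found));
  } else if (typeof value === 'object' && value !== null) {
    const obj = value as { [key: string]: unknown };
    const getAtt = obj['Fn::GetAtt'];
    if (Array.isArray(getAtt) && typeof getAtt[0] === 'string') {
      if (found.indexOf(getAtt[0]) === -1) { found.push(getAtt[0]); }
    } else {
      Object.keys(obj).forEach(key => findReferences(obj[key], found));
    }
  }
  return found;
}

// Depth-first topological sort: each resource is listed after everything it references.
function deploymentPlan(appModel: AppModel): string[] {
  const order: string[] = [];
  const state: { [id: string]: 'visiting' | 'done' } = {};
  const visit = (id: string): void => {
    if (state[id] === 'done') { return; }
    if (state[id] === 'visiting') { throw new Error(`Circular reference involving ${id}`); }
    const resource = appModel.Resources[id];
    if (!resource) { throw new Error(`Unknown resource: ${id}`); }
    state[id] = 'visiting';
    findReferences(resource.Properties).forEach(dep => visit(dep));
    state[id] = 'done';
    order.push(id);
  };
  Object.keys(appModel.Resources).forEach(id => visit(id));
  return order;
}

// The example from above, with the stack listed first to show that ordering is derived.
const exampleModel: AppModel = {
  Resources: {
    MyStack: {
      Type: 'AWS::App::CloudFormationStack',
      Properties: {
        TemplateURL: { 'Fn::GetAtt': ['MyTemplate', 'URL'] },
        Parameters: {
          MyLambdaCodeS3Bucket: { 'Fn::GetAtt': ['MyLambdaCodePackage', 'Bucket'] },
          MyLambdaCodeS3Key: { 'Fn::GetAtt': ['MyLambdaCodePackage', 'Key'] },
        },
      },
    },
    MyLambdaCodePackage: { Type: 'AWS::App::Asset', Properties: { File: './my-handler.zip' } },
    MyTemplate: { Type: 'AWS::App::Asset', Properties: { File: './my-template.json' } },
  },
};
const examplePlan = deploymentPlan(exampleModel);
```

In this sketch both assets end up ahead of MyStack in the plan regardless of the order in which they appear in the file.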

The power of this approach is that it is highly extensible. Anyone can implement App Resources which will participate in this process. The desired-state approach requires that each resource can be CREATED, UPDATED or DELETED, and also DIFFed against the desired state.

Implementation

Synthesis

Each construct in the tree may implement a method synthesize(workdir) which will be called during synthesis. Constructs can emit files (or symlinks) into a working directory at this time.

App Model Resources

Similarly to CloudFormation Resources, we can define constructs that represent app model resources (AppStack, AppAsset). Similarly to how CloudFormation resources are implemented, these constructs will implement a method toAppModel() which will return an app model JSON chunk to be merged into the complete app model.

The App construct is now a simple construct: in its synthesize() method, it will collect all app resources from the tree, merge them together and emit an app-model.json (or index.json) into the working directory.
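
A rough sketch of that flow, under a few assumptions: SketchConstruct, SketchApp and WorkDir are hypothetical stand-ins (not the actual CDK classes), and the working directory is modelled as an in-memory map of file name to contents rather than a real directory.

```typescript
type AppModelChunk = { [logicalId: string]: { Type: string; Properties: { [key: string]: unknown } } };
// Assumption: the working directory is an in-memory map, not the real filesystem.
type WorkDir = { [fileName: string]: string };

// Hypothetical base class standing in for a construct in the tree.
class SketchConstruct {
  public readonly children: SketchConstruct[] = [];
  constructor(parent: SketchConstruct | undefined, public readonly id: string) {
    if (parent) { parent.children.push(this); }
  }
  // App model resources override this; ordinary constructs contribute nothing.
  public toAppModel(): AppModelChunk { return {}; }
  // Constructs may emit files here; the default emits nothing.
  public synthesize(_workdir: WorkDir): void { /* no artifacts by default */ }
}

class AppAsset extends SketchConstruct {
  constructor(parent: SketchConstruct, id: string, private readonly file: string) {
    super(parent, id);
  }
  public toAppModel(): AppModelChunk {
    return { [this.id]: { Type: 'AWS::App::Asset', Properties: { File: this.file } } };
  }
}

class AppStack extends SketchConstruct {
  constructor(parent: SketchConstruct, id: string, private readonly template: AppAsset) {
    super(parent, id);
  }
  public toAppModel(): AppModelChunk {
    const templateUrl = { 'Fn::GetAtt': [this.template.id, 'URL'] };
    return { [this.id]: { Type: 'AWS::App::Stack', Properties: { TemplateURL: templateUrl } } };
  }
}

class SketchApp extends SketchConstruct {
  constructor() { super(undefined, 'App'); }
  public synthesize(workdir: WorkDir): void {
    const resources: AppModelChunk = {};
    const visit = (node: SketchConstruct): void => {
      const chunk = node.toAppModel();
      Object.keys(chunk).forEach(id => {
        if (id in resources) { throw new Error(`Duplicate logical ID: ${id}`); }
        resources[id] = chunk[id];
      });
      // Let every descendant emit its own artifacts, then collect its app model chunk.
      node.children.forEach(child => { child.synthesize(workdir); visit(child); });
    };
    visit(this);
    workdir['index.json'] = JSON.stringify({ Resources: resources }, null, 2);
  }
}

const sketchApp = new SketchApp();
const templateAsset = new AppAsset(sketchApp, 'MyTemplate', './my-template.json');
new AppStack(sketchApp, 'MyStack', templateAsset);
const sketchWorkdir: WorkDir = {};
sketchApp.synthesize(sketchWorkdir);
```

After synthesize() runs, sketchWorkdir['index.json'] holds the merged Resources section with both MyTemplate and MyStack.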

The toolkit will expect to always find an index.json file in the working directory. It will read the file and form a deployment plan (calculate dependency graph based on references). Then, it will execute the deployment plan.

The toolkit can either deploy the entire app or only a subset of the resources, in which case it can also deploy any dependencies of this set.
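
Selecting a subset plus its dependencies amounts to a transitive closure over the dependency graph. A small sketch, assuming the graph has already been computed from references and is acyclic; the DependencyGraph type, withDependencies and the OtherStack resource are hypothetical names used only for this example.

```typescript
// Maps each logical ID to the logical IDs it depends on (assumed acyclic).
type DependencyGraph = { [id: string]: string[] };

// Returns the targets together with everything they transitively depend on,
// ordered so that dependencies come before their dependents.
function withDependencies(graph: DependencyGraph, targets: string[]): string[] {
  const selected: string[] = [];
  const visited: { [id: string]: boolean } = {};
  const visit = (id: string): void => {
    if (visited[id]) { return; }
    if (!(id in graph)) { throw new Error(`Unknown resource: ${id}`); }
    visited[id] = true;
    graph[id].forEach(dep => visit(dep));
    selected.push(id);
  };
  targets.forEach(target => visit(target));
  return selected;
}

const exampleGraph: DependencyGraph = {
  MyLambdaCodePackage: [],
  MyTemplate: [],
  MyStack: ['MyTemplate', 'MyLambdaCodePackage'],
  OtherStack: [], // hypothetical unrelated resource that should not be selected
};
const subset = withDependencies(exampleGraph, ['MyStack']);
```

Asking for MyStack alone pulls in both assets and leaves OtherStack untouched.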

Each app resource will have a "provider" implemented in the toolkit via an extensible framework. Providers will implement the following operations:

Diff(desired-state): diff the desired state against the currently deployed state
Create/Update(desired-state): bring the resource to the desired state (idempotent)
Delete: delete the resource
GetAtt(name): return the runtime value from the deployed resource (e.g. the actual bucket name)
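
To make the provider contract concrete, here is one way it could be expressed, as a sketch only. The ResourceProvider interface, the InMemoryAssetProvider class and the way Key is derived from the file name are all assumptions made for illustration; nothing is uploaded and no toolkit API is implied.

```typescript
type DesiredState = { [key: string]: unknown };

// Hypothetical provider contract mirroring the four operations listed above.
interface ResourceProvider {
  diff(desired: DesiredState): string[];       // names of properties that differ
  createOrUpdate(desired: DesiredState): void; // idempotent
  delete(): void;
  getAtt(name: string): string;
}

// Toy asset provider that only records state in memory.
class InMemoryAssetProvider implements ResourceProvider {
  private deployed: DesiredState | undefined = undefined;

  constructor(private readonly bucket: string) {}

  public diff(desired: DesiredState): string[] {
    const current: DesiredState = this.deployed || {};
    const removed = Object.keys(current).filter(key => !(key in desired));
    return Object.keys(desired).concat(removed)
      .filter(key => JSON.stringify(desired[key]) !== JSON.stringify(current[key]));
  }

  public createOrUpdate(desired: DesiredState): void {
    // Applying the same desired state twice leaves the same result.
    this.deployed = { ...desired };
  }

  public delete(): void {
    this.deployed = undefined;
  }

  public getAtt(name: string): string {
    if (!this.deployed) { throw new Error('Resource is not deployed'); }
    switch (name) {
      case 'Bucket': return this.bucket;
      // Assumption: the object key is simply the file name without a leading "./".
      case 'Key': return String(this.deployed['File']).replace(/^\.\//, '');
      default: throw new Error(`Unknown attribute: ${name}`);
    }
  }
}

const assetProvider = new InMemoryAssetProvider('assets-bucket');
const desiredAsset: DesiredState = { File: './my-handler.zip' };
const diffBefore = assetProvider.diff(desiredAsset);
assetProvider.createOrUpdate(desiredAsset);
assetProvider.createOrUpdate(desiredAsset); // second call is a no-op in effect
const diffAfter = assetProvider.diff(desiredAsset);
const assetKey = assetProvider.getAtt('Key');
```

Keeping diff separate from createOrUpdate lets a tool show what would change without touching anything.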

In the normal flow, the toolkit will simply invoke the create/update operation in topological order (and concurrently if possible). It will supply the contents of the Properties object which represents the desired state. If a property includes a reference to another resource (via Fn::GetAtt), it will replace the token with the value from the other resource.
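
The token replacement step could look roughly like the following. This is a sketch under the assumption that Fn::GetAtt is the only intrinsic to resolve; resolveTokens, AttributeLookup and the sample attribute values (including the example.invalid URL) are made up for illustration.

```typescript
// Callback that returns the runtime value of an attribute of a deployed resource.
type AttributeLookup = (logicalId: string, attribute: string) => string;

// Returns a copy of `value` in which every { "Fn::GetAtt": [id, attr] } is replaced
// by the value obtained from the lookup.
function resolveTokens(value: unknown, lookup: AttributeLookup): unknown {
  if (Array.isArray(value)) {
    return value.map(item => resolveTokens(item, lookup));
  }
  if (typeof value === 'object' && value !== null) {
    const obj = value as { [key: string]: unknown };
    const getAtt = obj['Fn::GetAtt'];
    if (Array.isArray(getAtt) && getAtt.length === 2) {
      return lookup(String(getAtt[0]), String(getAtt[1]));
    }
    const result: { [key: string]: unknown } = {};
    Object.keys(obj).forEach(key => { result[key] = resolveTokens(obj[key], lookup); });
    return result;
  }
  return value;
}

// Made-up runtime values, as if the two assets had already been deployed.
const deployedAttributes: { [id: string]: { [attr: string]: string } } = {
  MyTemplate: { URL: 'https://example.invalid/my-template.json' },
  MyLambdaCodePackage: { Bucket: 'assets-bucket', Key: 'my-handler.zip' },
};

const resolvedProperties = resolveTokens(
  {
    TemplateURL: { 'Fn::GetAtt': ['MyTemplate', 'URL'] },
    Parameters: {
      MyLambdaCodeS3Bucket: { 'Fn::GetAtt': ['MyLambdaCodePackage', 'Bucket'] },
      MyLambdaCodeS3Key: { 'Fn::GetAtt': ['MyLambdaCodePackage', 'Key'] },
    },
  },
  (id, attr) => deployedAttributes[id][attr],
);
```

Because resources are processed in topological order, every lookup refers to a resource whose provider has already run.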

TODO

List of use cases we should tackle for this new design:

[ ] Transactionality: what happens if a deployment task fails? Should/can we roll it back? I guess it depends on the boundaries of a transaction. Perhaps we should make that part of the solution and for certain tasks (like S3), just "no-op". For tasks that support it, we can trigger an actual rollback if the entire transaction fails.
[ ] Environments: how do environments play here? A stack (and also an S3 object) should be bound to an environment.
[ ] Cross stack refs: how do we implement cross stack/cross env references in this model?
[ ] Docker images
[ ] Runtime values (allowing runtime code to use resolved attributes of infrastructure resources)
[ ] Environmental context
[ ] Construct metadata - where does it go now?
[ ] Vending runtime client libraries for constructs: it should be possible to supply reusable runtime code for a construct. This could be as simple as an API client generated from Swagger definitions or more complex as a Lambda handler base class which requires you to implement a bunch of methods and does magic for you. At any rate, this is something that needs to be somehow cross-language.
[ ] Packaging (see @kiiadi comment below)

rix0rrr commented 6 years ago

I think I also want Constructs to be able to spread over multiple stacks. And I don't necessarily mean nested stacks.

eladb commented 6 years ago

Okay, this is starting to fall into place. Seems like we can generalize this even more and finally define the app model.

No need to store a revision in comments, GitHub has revisions on the issue description...

eladb commented 6 years ago

Introduced the concept of an app model which is essentially a desired state description of a full app.

eladb commented 6 years ago

What about our "toolkit stack" (the stack that contains the bucket which we upload artifacts to)? Can we extend the app model to include it as well?

Let's try:

Using yaml and ${} substitutions for brevity

Resources:
  AssetsStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      # inline version of the "toolkit stack" template, now it's not "special" anymore
      TemplateFile: "./assets-template.json"
  MyLambdaCodePackage:
    Type: AWS::S3::Object
    Properties:
      File: "./my-handler.zip"
      Bucket: ${AssetsStack.AssetsBucket}
  MyTemplate:
    Type: AWS::S3::Object
    Properties:
      File: "./my-template.json"
      Bucket: ${AssetsStack.AssetsBucket}
  MyStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: ${MyTemplate.URL}
      Parameters:
        MyLambdaCodeS3Bucket: ${MyLambdaCodePackage.Bucket}
        MyLambdaCodeS3Key: ${MyLambdaCodePackage.Key}

This requires that AWS::CloudFormation::Stack can also accept a template from a local file (instead of a URL). This limits the template to 50KB, which is perfectly adequate for this purpose.
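
For reference, the ${} shorthand used above is just a compact way of writing Fn::GetAtt. A small sketch of the expansion, assuming a string is only treated as a reference when it consists entirely of one ${LogicalId.Attribute} token; expandShorthand is a hypothetical helper, not an existing tool.

```typescript
// Expands "${LogicalId.Attribute}" strings into { "Fn::GetAtt": [LogicalId, Attribute] },
// recursing through arrays and objects and leaving every other value unchanged.
function expandShorthand(value: unknown): unknown {
  if (typeof value === 'string') {
    const match = /^\$\{([A-Za-z0-9]+)\.([A-Za-z0-9]+)\}$/.exec(value);
    return match ? { 'Fn::GetAtt': [match[1], match[2]] } : value;
  }
  if (Array.isArray(value)) {
    return value.map(item => expandShorthand(item));
  }
  if (typeof value === 'object' && value !== null) {
    const source = value as { [key: string]: unknown };
    const result: { [key: string]: unknown } = {};
    Object.keys(source).forEach(key => { result[key] = expandShorthand(source[key]); });
    return result;
  }
  return value;
}

const expandedProperties = expandShorthand({
  File: './my-handler.zip',
  Bucket: '${AssetsStack.AssetsBucket}',
});
```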

buffovich commented 6 years ago

A few humble comments (James Joyce's Stream-Of-Conciseness style):

Keep an eye on re-usability with different combinations of clouds/artifacts. I do understand that the primary focus is CFN now, but it wouldn't hurt. "Toolkit stack" is very CFN-specific.
I noticed that there is no clear line in the sand between deployment orchestration and the templating engine. Should there be one? Or is it already there and I'm missing something?
It would be great to have high-level logical dependencies decoupled from what they actually mean at the compiled-artifact level. For instance, it makes perfect sense to have a dependency of a CloudWatch monitor on the metric generation section in your metric agent config and have it trackable in the template. Should it be reflected somehow at the build-artifact level? That's not for the model level to decide.
In our own home-grown templates https://github.com/awslabs/cloud-templates-ruby we are coming to the conclusion that we need some graph framework to make really complex definitions. For instance: "Do this if there is a dependency of this type on me", "Find all children which ...". We do kinda sorta have a rudimentary graph framework implementation in the library, but we are already hitting the limitations of the approach.

From Ireland, with craic

eladb commented 6 years ago

Keep an eye on re-usability with different combinations of clouds/artifacts. I do understand that the primary focus is CFN now, but it wouldn't hurt. "Toolkit stack" is very CFN-specific.

That's a good point. I think we will need to use higher level abstractions for the app model also to allow tools to reach the desired state in different ways. For example, the assets bucket (e.g. where runtime code or CloudFormation templates or deployment artifacts are consumed) can either be a bucket created by the toolkit when used for development or the pipeline artifacts bucket in CI/CD. Either way, the desired state of an "asset store" with a bunch of specific artifacts is reached.

So let's see what types of "app resources" we need for the app model:

Stack - infrastructure deployment unit, limited to a single environment (account/region)
Asset - a cloud-stored file that is consumed by stacks, also exists within a single environment.

Would it be useful to model the "Asset Store"? Probably not really needed. An asset by definition is stored somewhere. The details of how/where/who defines the asset store can be left undefined in the model, and implemented differently by different tools.

So now, the stack + runtime code example will look like this:

Resources:
  MyLambdaCodePackage:
    Type: AWS::App::Asset
    Properties:
      File: "my-handler.zip"
  MyTemplate:
    Type: AWS::App::Asset
    Properties:
      File: "my-template.json"
  MyStack:
    Type: AWS::App::Stack
    Properties:
      TemplateURL: ${MyTemplate.URL}
      Parameters:
        MyLambdaCodeS3Bucket: ${MyLambdaCodePackage.Bucket}
        MyLambdaCodeS3Key: ${MyLambdaCodePackage.Key}

rix0rrr commented 6 years ago

AWS::S3::Object

In order to not confuse people (and ourselves) too much, how about we change the namespace of our app-level resources? CDK::S3::Object?

The toolkit will expect to always find an index.json file in the working directory.

Don't understand why we have to go through the filesystem all of a sudden. Can we not do the same thing as before with stdout?

It will read the file and form a deployment plan (calculate dependency graph based on references)

We will still need to specify a target in some way. The whole "app model" is going to contain the entire application in all stages.

I think I would like to introduce the concept of an App in there (distinct from our current App class which doesn't really model anything but is there as an implementation detail), which is a thing that has a name that makes sense to deploy in one go. It might consist of one or more Stacks which each depend on one or more resources, etc.

That is not to say you couldn't just address the S3::Object operation as a target if you wanted to, but that's not going to be the common use case. Nor is it going to be the common use case to just deploy everything in a CDK app.

Cross stack refs: how do we implement cross stack/cross env references in this model?

They will have to be Parameters, just like file uploads.

There's a limit of 50 or 60 parameters. We might run into that for complicated apps.

Docker images

Local docker build or whatever the command is again, push the image to ECR, return the ARN.

Apropos of nothing: I still wonder whether it makes sense for customers to define Stacks on their own. Shouldn't they be defining Apps instead (new-style App mentioned above, not our current one), which is then sliced into Stacks by our runtime system in a way that makes most sense?

The rules are not that complicated:

As many resources as possible with the same (account, region) per Stack.
If this introduces a cycle, move the object that breaks the cycle to a second Stack (hand-waving on how we detect which object that is, but probably doable).
If this introduces cross-region/cross-account references, make the publishing stack publish x-references to SSM parameters, introduce a Custom Resource in the consuming stack that reads the SSM parameters.
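
A sketch of the first rule only, grouping resources by (account, region). Cycle breaking and cross-environment references are deliberately left out, and PlacedResource, sliceIntoStacks and the sample account IDs are hypothetical.

```typescript
// A resource together with the environment it must live in (illustrative shape).
interface PlacedResource {
  id: string;
  account: string;
  region: string;
}

// Rule 1: put as many resources as possible with the same (account, region) into one stack.
// The result maps an "account/region" key to the IDs of the resources in that stack.
function sliceIntoStacks(resources: PlacedResource[]): { [environment: string]: string[] } {
  const stacks: { [environment: string]: string[] } = {};
  resources.forEach(resource => {
    const environment = `${resource.account}/${resource.region}`;
    (stacks[environment] = stacks[environment] || []).push(resource.id);
  });
  return stacks;
}

const slicedStacks = sliceIntoStacks([
  { id: 'Bucket', account: '111111111111', region: 'us-east-1' },
  { id: 'Function', account: '111111111111', region: 'us-east-1' },
  { id: 'ReplicaBucket', account: '111111111111', region: 'eu-west-1' },
]);
```

Here Bucket and Function share a stack, while ReplicaBucket lands in its own stack because it targets a different region.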

rix0rrr commented 6 years ago

I like it! It generalizes nicely!

We'll have to build a copy of CloudFormation's evaluation engine, but I guess it won't be too complicated?

Critical questions:

Does it make sense to model this in CFN templates? Is that just because we're familiar with them, or are they really the best choice? No strong opinion yet, just asking.
If we do this, can we at least simplify? For example, no { Ref }, only { Fn::GetAtt } please! :)
Are we going to implement all { Fn::Whatever } functions as well? Or only the ones we need?

rix0rrr commented 6 years ago

The more I think about Stacks the more I think they're an implementation detail we shouldn't care about: the system should just take care of them.

The biggest downside to this is that people migrating over from CloudFormation, who want to recreate their existing templates in CDK, are going to want strict control over them.

I think we should give them that control without putting the concept of a Stack front and center.

kiiadi commented 6 years ago

Nice!

@rix0rrr

CDK::S3::Object

+1 - this makes it much clearer that these aren't specific AWS resources!

@eladb et al:

How does the packaging story play in here? As part of my normal development workflow I usually want to 'package' my app and deploy it in quick succession. Then again as part of my CI/CD pipeline I need to be able to build my application before I deploy it. Is the idea that packaging (e.g. compilation, docker build etc) is still done as a separate step or can I define that behaviour in the app-model also?

Could I define a construct that does the packaging for my application (with common implementations)? Then the synthesize method is simply the invocation of that packaging system (if that's the case synthesize feels a little wrong as the verb). In this world the CDK toolkit is the entry-point to the process and owns the orchestration.

The other option is to reverse this. I suspect that many customers are going to want to be able to invoke CDK as part of their 'regular' build tool. As a JVM developer I just want to be able to do gradle deploy, which could package and deploy my application. In this model the build tool owns the orchestration and handles the packaging before CDK/app-model gets involved.

Thoughts?

eladb commented 6 years ago

@rix0rrr wrote: The more I think about Stacks the more I think they're an implementation detail we shouldn't care about: the system should just take care of them.

You are right. Stacks as an abstraction are not really important, and perhaps at some point we can get rid of them when you define apps at a high level. Still, they are a mechanism that apps can use to isolate regions/accounts and updates, and therefore they make sense at the app model layer, I believe.

eladb commented 6 years ago

@kiiadi, agreed about AWS::S3::Object. We'll use AWS::App::Asset and AWS::App::Stack.

Regarding packaging - that's a good point. At the moment, the model is that packaging is done by idiomatic tools and consumed as assets by the app model, but I am not sure that's good enough. We need to think about it further. I think the idea of a construct performing a build is very interesting (and I am not sure that the "synthesis" terminology breaks, i.e. "this construct represents your Lambda's handler, and in order to synthesize it you need to build the code"). But I agree about IDEs and native tools. I am not sure it's a good idea for us to get into the software build business if we can avoid it and let people use their normal tools and IDEs.

eladb commented 6 years ago

@rix0rrr, instead of AWS::S3::Object we will use AWS::App::Asset. The distinction between an asset and an S3 object is that an asset implies an "asset store", which can have different implementations when running locally or via CI/CD.

Don't understand why we have to go through the filesystem all of a sudden. Can we not do the same thing as before with stdout?

Since we are going to want to emit multiple artifacts, which can potentially include GiBs of runtime code, I believe we should emit those directly into the filesystem. By the way, in many cases (e.g. those GiBs of runtime code), the construct can just emit a symlink to the actual code instead. If we go down the STDOUT path design, we will eventually define a filesystem protocol, which is not our business.

We will still need to specify a target in some way. The whole "app model" is going to contain the entire application in all stages.

You mean in the case where you only want to deploy a single resource from the app model (e.g. a dev stack). Yes, that should be possible. In this case, the toolkit should be able to deploy all its dependencies as well. Added to the design.

I think I would like to introduce the concept of an App in there (distinct from our current App class which doesn't really model anything but is there as an implementation detail), which is a thing that has a name that makes sense to deploy in one go. It might consist of one or more Stacks which each depend on one or more resources, etc.

That's a good idea. Maybe just AWS::App::ResourceGroup?

Cross stack refs: how do we implement cross stack/cross env references in this model? They will have to be Parameters, just like file uploads. There's a limit of 50 or 60 parameters. We might run into that for complicated apps.

I will add to the doc. It's in the list. It will be awesome (🤞)

Docker images: Local docker build or whatever the command is again, push the image to ECR, return the ARN.

Also in the TODO list.

Apropos of nothing: I still wonder whether it makes sense for customers to define Stacks on their own.

As mentioned above, I think that's a good idea, but not at the app model level. These are abstractions we should easily implement in the CDK itself. Makes sense?

buffovich commented 6 years ago

Hey guys,

Can we try to tackle not-just-AWS case for the format? I just want to avoid ECS-EKS situation. And we can be pioneers for this initiative in the industry.

buffovich commented 6 years ago

Addendum: Do you guys think it would make sense to factor out the deployment system from the templating one? Going to the extreme, the two could even be written in different languages. For instance, the cloud assembly deployment toolkit can be written in Go/Rust (because it's cool and new and compiled and bundled and self-sufficient ...) and the templating system can be CDK, Cloud-templates, Troposphere, GoFormation, or something internal to a company that wants to use the format.

That would be really cool.

tvb commented 6 years ago

The more I think about Stacks the more I think they're an implementation detail we shouldn't care about: the system should just take care of them.

The biggest downside to this is that people migrating over from CloudFormation, who want to recreate their existing templates in CDK, are going to want strict control over them.

I think we should give them that control without putting the concept of a Stack front and center.

This is partly true. I believe structuring the code per "stacks" and naming them accordingly is actually a good practice.

Something like the following:

stacks/
  s3_stack/
    bucket.ts
  iam_stack/
    user.ts
    policy.ts
