aws / chalice

Python Serverless Microframework for AWS
Apache License 2.0

[proposal] AWS Managed Resources for Chalice #516

Open kyleknap opened 7 years ago

kyleknap commented 7 years ago

Abstract

This issue proposes a mechanism for managing additional AWS resources that a serverless Chalice application may rely on. Chalice will handle the creation, update, and deletion of these resources when the application is deployed, and will allow users to easily interact with them from within their application. The only AWS resource added in this proposal is a DynamoDB table, but the mechanism should be able to support any future AWS resource.

Motivation

Currently, Chalice does not manage any AWS resources that are part of the core business logic of a Chalice application but are not part of the Chalice decorator interface (i.e. app.lambda_function(), app.route(), etc.). An application may rely on many AWS resources, such as an S3 bucket or a DynamoDB table. However, these resources must be created out of band of the actual Chalice deployment process, which is inconvenient because:

Therefore, it is a much more friendly user story if Chalice handles the deployment of these resources for the user. Also since Chalice did the deployment, it can easily provide references to those deployed resources from within the Chalice application.

Specification

This section goes into detail about the interfaces for adding these managed resources, code samples showing how users will interact with the interfaces, and the logic for deploying these managed resources. Since a DynamoDB table is the only resource this proposal suggests adding, this section is specific to DynamoDB tables.

To have Chalice manage AWS resources, users first declare their resources in code via a resources.py file and may then configure these resources using the Chalice config file.

Code Interface

The top-level interface into these managed resources is a resources.py file. This file is used for declaring all additional AWS resources to be managed by Chalice in an application. The resources.py file will live alongside the app.py file in a Chalice application:

myapp$ tree .
.
|-- app.py
|-- requirements.txt
|-- resources.py

Inside of the resources.py is where the various managed AWS resources are declared and registered to the application. To better explain how the resources.py file works, here is an example of the contents of the file:

from chalice.resources.dynamodb import Table

def register_resources(app):
    app.resource(MyTable)

class MyTable(Table):
    name = 'mytable'
    key_schema = [
        {
            'AttributeName': 'username',
            'KeyType': 'HASH'
        },
        {
            'AttributeName': 'rank',
            'KeyType': 'RANGE'
        }
    ]
    attribute_definitions = [
        {
            'AttributeName': 'username',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'rank',
            'AttributeType': 'N'
        }
    ]
    provisioned_throughput = {
        'ReadCapacityUnits': 20,
        'WriteCapacityUnits': 10
    }

The resources.py file requires a module-level register_resources() function that includes any additional resources for Chalice to manage for the application. The register_resources() function accepts only an app object representing the Chalice application. Within the register_resources() function, users must use the app.resource() method to include a resource in the application. Currently, app.resource() accepts only one argument: the resource class to be registered. Furthermore, all registered resources must have a unique logical name. The logical name of a Chalice resource is either the class name of the resource or the value of the name property of the resource class.
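The naming and uniqueness rules above can be sketched as follows. This is only an illustration under stated assumptions: ResourceRegistry and logical_name() are hypothetical names, the Table class is a local stand-in for the proposed base class, and giving the name property precedence over the class name is an assumption, since the proposal does not say how app.resource() would enforce these rules internally.

```python
class Table(object):
    """Local stand-in for the proposed chalice.resources.dynamodb.Table."""
    name = None


def logical_name(resource_cls):
    # Assumed rule: use the `name` property when it is set,
    # otherwise fall back to the class name.
    return getattr(resource_cls, 'name', None) or resource_cls.__name__


class ResourceRegistry(object):
    """Hypothetical registry that app.resource() could delegate to."""

    def __init__(self):
        self._resources = {}

    def register(self, resource_cls):
        name = logical_name(resource_cls)
        if name in self._resources:
            # Logical names must be unique within an application.
            raise ValueError('Duplicate logical resource name: %s' % name)
        self._resources[name] = resource_cls

    def names(self):
        return sorted(self._resources)


class MyTable(Table):
    name = 'mytable'      # logical name comes from the `name` property


class AuditTable(Table):
    pass                  # logical name falls back to 'AuditTable'


registry = ResourceRegistry()
registry.register(MyTable)
registry.register(AuditTable)
```

Registering MyTable a second time would raise a ValueError in this sketch, which is one way the uniqueness requirement could surface to the user.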

To actually declare a managed resource, users must first import the appropriate resource class from the chalice.resources.<service-name> module. Then they must subclass from the desired resource class and provide the appropriate class properties to configure the resource.

In the original example, the user first imports the Table class in order to declare a DynamoDB table for their Chalice application. The user then creates a new class, MyTable, from the Table class to flesh out the properties of the DynamoDB table they want. The configurable class properties of a DynamoDB table are as follows:

With the resources.py file fully fleshed out, Chalice will then deploy all of the resources registered to the application in the register_resources() function.

The resources then can be accessed from within the Chalice application. With the addition of the resources.py file, the chalice.Chalice app object will be updated to include a resources property.

class Chalice(object):
    def __init__(self, app_name):
        ...
        self.resources = Resources()

The resources property serves as a way of referencing values for deployed resources.

The Resources() class interface will be the following:

class Resources(object):
    def get_service(self, resource_name):
        # type: (str) -> str
        pass

    def get_resource_type(self, resource_name):
        # type: (str) -> str
        pass

    def get_deployed_values(self, resource_name):
        # type: (str) -> Dict[str, Any]
        pass

For the Resources class, its methods are the following:

To interact with the deployed resources in the application, refer to the previous resources.py and the following app.py:

from chalice import Chalice
import boto3

app = Chalice(app_name='myapp')
dynamodb = boto3.resource('dynamodb')

@app.route('/users/{username}')
def get_user(username):
    deployed_table_name = app.resources.get_deployed_values(
        'mytable')['TableName']
    table = dynamodb.Table(deployed_table_name)
    response = table.get_item(Key={'username': username})
    return response['Item']

In the above example, the application was able to retrieve the name of the deployed DynamoDB table by calling the get_deployed_values() method.

Furthermore, if a user wants to programmatically create a client or resource object for a particular deployed resource, they could write the following helper functions, respectively:

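def get_boto3_client(resource_name):
    # get_service() is the method name declared in the Resources interface.
    return boto3.client(app.resources.get_service(resource_name))

def get_boto3_resource(resource_name, *resource_identifiers):
    # Look up the resource class (e.g. Table) on the service resource and
    # instantiate it with the given identifiers.
    return getattr(
        boto3.resource(app.resources.get_service(resource_name)),
        app.resources.get_resource_type(resource_name))(*resource_identifiers)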

Config Interface

In the case a user wants the configuration options to vary by stage, users can specify configuration of the managed resources through the Chalice config file. To configure a resource, the user would need to specify the following general configuration:

"resources": {
  "<logical-resource-name>": {
    "<option-name>": "<option-value>"
  }
}

As it relates to DynamoDB tables, the only options available will be configuring the provisioned capacity. Below is a sample configuration for the previously provided DynamoDB table in the resources.py:

"resources": {
  "mytable": {
    "provisioned_throughput": {
      "ReadCapacityUnits": 50,
      "WriteCapacityUnits": 10
    }
  }
}

For DynamoDB tables, only read capacity and write capacity can be specified.

The "resources" configuration can be specified as a top-level key and on a per-stage basis, where the values in the stage completely replace any that exist in the top-level configuration. In addition, any resource-specific configuration provided in the config file will replace the values that were specified in code.

For example, take the following table defined in the resources.py file:

from chalice.resources.dynamodb import Table

def register_resources(app):
    app.resource(MyTable)

class MyTable(Table):
    name = 'mytable'
    key_schema = [
        {
            'AttributeName': 'username',
            'KeyType': 'HASH'
        },
    ]
    attribute_definitions = [
        {
            'AttributeName': 'username',
            'AttributeType': 'S'
        },
    ]
    provisioned_throughput = {
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 5
    }

With the following Chalice config file:

{
  "version": "2.0",
  "app_name": "myapp",
  "resources": {
    "mytable": {
      "provisioned_throughput": {
        "ReadCapacityUnits": 10,
        "WriteCapacityUnits": 10
      }
    }
  },
  "stages": {
    "dev": {},
    "prod": {
      "resources": {
        "mytable": {
          "provisioned_throughput": {
            "ReadCapacityUnits": 100,
            "WriteCapacityUnits": 20
          }
        }
      }
    }
  }
}

With this Chalice config file, the mytable table for each stage will have the following configuration values:

However, if there were no top-level "resources" key in the config file, the dev stage would use the values specified in the resources.py file, which were a read capacity of 5 and a write capacity of 5.
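The precedence rules described in this section can be sketched as a small resolution function. This is a minimal illustration, not the proposed implementation: resolve_resource_config() is a hypothetical helper, and it assumes that a stage-level "resources" block replaces the top-level one wholesale and that each option from the config file replaces the corresponding class property as a whole.

```python
def resolve_resource_config(code_defaults, config, stage):
    """Return the effective options for each logical resource in a stage."""
    stage_config = config.get('stages', {}).get(stage, {})
    if 'resources' in stage_config:
        # A stage-level block completely replaces the top-level block.
        overrides = stage_config['resources']
    else:
        overrides = config.get('resources', {})
    resolved = {}
    for name, defaults in code_defaults.items():
        options = dict(defaults)
        # Config file values replace the values declared in resources.py.
        options.update(overrides.get(name, {}))
        resolved[name] = options
    return resolved


# Values taken from the resources.py and config file examples above.
CODE_DEFAULTS = {
    'mytable': {
        'provisioned_throughput': {
            'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    },
}
CONFIG = {
    'version': '2.0',
    'app_name': 'myapp',
    'resources': {
        'mytable': {'provisioned_throughput': {
            'ReadCapacityUnits': 10, 'WriteCapacityUnits': 10}},
    },
    'stages': {
        'dev': {},
        'prod': {'resources': {
            'mytable': {'provisioned_throughput': {
                'ReadCapacityUnits': 100, 'WriteCapacityUnits': 20}},
        }},
    },
}

dev = resolve_resource_config(CODE_DEFAULTS, CONFIG, 'dev')
prod = resolve_resource_config(CODE_DEFAULTS, CONFIG, 'prod')
```

Under these assumptions, dev resolves to the top-level values (10 read, 10 write) and prod resolves to its stage-level values (100 read, 20 write).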

Deployment Logic

In terms of deployment logic, both chalice deploy and chalice package will be supported.

When it comes to the chalice deploy command, Chalice will look at all of the resources created under the Chalice.resources property and individually deploy and make any updates to the resource using the service's API directly. It is important to note that if the user changes the logical Chalice name of the resource, it will be deleted on Chalice redeploys.
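One way to picture the redeploy behavior is as a comparison between the logical names recorded by the previous deployment and the names currently declared. The sketch below is an assumption about how such a plan could be computed; plan_deployment() is a hypothetical helper, and the proposal does not specify the actual algorithm.

```python
def plan_deployment(previously_deployed, declared):
    """Split logical resource names into create, update, and delete lists."""
    old = set(previously_deployed)
    new = set(declared)
    return {
        'create': sorted(new - old),   # declared now, not deployed before
        'update': sorted(new & old),   # present in both, check for changes
        'delete': sorted(old - new),   # deployed before, no longer declared
    }


# Renaming the logical resource 'mytable' to 'users' between deploys.
plan = plan_deployment(['mytable'], ['users'])
```

In this sketch the rename shows up as a deletion of 'mytable' and a creation of 'users', which matches the warning above about changing a resource's logical name.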

Once deployed, Chalice will save all of the deployed resources under the "resources" key in deployed.json. Its value will be a dictionary that contains values related to the various managed resources. The format of the dictionary will be as follows:

{
    "<logical-resource-name>": {
       "service": "<service-name>",
       "resource_type": "<resource-type>",
       "properties": {
           ... various identifiers and properties of the resource...
       }
    }
}

As it relates to the specific keys:

Taking the previous Table example, the value of the "resources" key will look like the following in the deployed.json:

{
    "mytable": {
        "service": "dynamodb",
        "resource_type": "Table",
        "properties": {
            "TableName": "myapp-dev-mytable"
        }
    }
}

Making the entire deployed.json look like the following:

{
  "dev": {
    "api_handler_name": "chalice-trivia-dev",
    "api_handler_arn": "arn:aws:lambda:us-west-2:934212987125:function:myapp-dev",
    "resources": {
      "mytable": {
        "service": "dynamodb",
        "resource_type": "Table",
        "properties": {
          "TableName": "myapp-dev-mytable"
        }
      }
    },
    "lambda_functions": {},
    "backend": "api",
    "chalice_version": "1.0.0b1",
    "rest_api_id": "448qxrx2vj",
    "api_gateway_stage": "dev",
    "region": "us-west-2"
  }
}
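To connect the deployed.json format to the Resources interface, here is a minimal sketch of a Resources class backed by the "resources" dictionary of a single stage. How the real class would load its data is not specified in the proposal; the constructor argument and the mapping from the "service", "resource_type", and "properties" keys to the three methods are assumptions.

```python
class Resources(object):
    """Sketch of the proposed interface, backed by deployed.json data."""

    def __init__(self, deployed_resources):
        # deployed_resources: the value of the "resources" key for one stage.
        self._deployed = deployed_resources

    def get_service(self, resource_name):
        return self._deployed[resource_name]['service']

    def get_resource_type(self, resource_name):
        return self._deployed[resource_name]['resource_type']

    def get_deployed_values(self, resource_name):
        # Return a copy so callers cannot mutate the stored values.
        return dict(self._deployed[resource_name]['properties'])


resources = Resources({
    'mytable': {
        'service': 'dynamodb',
        'resource_type': 'Table',
        'properties': {'TableName': 'myapp-dev-mytable'},
    },
})
table_name = resources.get_deployed_values('mytable')['TableName']
```

With this data, table_name is the deployed table name that the earlier app.py example passes to dynamodb.Table().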

The chalice package command will take the resources in the application and add them to the CloudFormation template. The generated CloudFormation template will use the AWS::DynamoDB::Table resource type to create the DynamoDB table.
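As a rough illustration of the packaging step, the class properties of a table map naturally onto the properties of an AWS::DynamoDB::Table resource. The helper below is a hypothetical sketch, not the code chalice package would use, and the CloudFormation logical ID "MyTable" is an assumption.

```python
import json


def table_to_cloudformation(logical_id, table_name, key_schema,
                            attribute_definitions, provisioned_throughput):
    """Build a CloudFormation resource entry for one DynamoDB table."""
    return {
        logical_id: {
            'Type': 'AWS::DynamoDB::Table',
            'Properties': {
                'TableName': table_name,
                'KeySchema': key_schema,
                'AttributeDefinitions': attribute_definitions,
                'ProvisionedThroughput': provisioned_throughput,
            },
        },
    }


template_resources = table_to_cloudformation(
    'MyTable',  # hypothetical CloudFormation logical ID
    'myapp-dev-mytable',
    [{'AttributeName': 'username', 'KeyType': 'HASH'}],
    [{'AttributeName': 'username', 'AttributeType': 'S'}],
    {'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)
template_json = json.dumps({'Resources': template_resources}, indent=2)
```

The resulting entry would be merged into the "Resources" section of the template that chalice package generates.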

It is also important to note that the actual name of the DynamoDB table created by both deployment methods will be "<app-name>-<stage-name>-<logical-table-name>". So if the user adds a table called "mytable" to the application "myapp", the deployed table will be called "myapp-dev-mytable" when deployed to the "dev" stage.
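The naming pattern can be expressed as a one-line helper. The function name deployed_table_name() is hypothetical. The result follows the "<app-name>-<stage-name>-<logical-table-name>" pattern and matches the TableName shown in the deployed.json example.

```python
def deployed_table_name(app_name, stage_name, logical_table_name):
    """Build the physical table name from the pattern described above."""
    return '%s-%s-%s' % (app_name, stage_name, logical_table_name)


name = deployed_table_name('myapp', 'dev', 'mytable')
```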

Rationale/FAQ

Q: How do you imagine the interface will grow for future resources?

A lot of the future managed resources will be able to follow the same pattern of the DynamoDB table resource. In general to add support for a new resource, the following changes will be needed:

To get a better understanding of potential future interfaces, here are some rough sketches for future AWS resources.

S3 Bucket

Here is a sample application showing how a user may rely on Chalice to manage and interact with their S3 buckets:

The resources.py file would be the following:

# Example of making thumbnails from a source bucket:
# http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
from chalice.resources.s3 import Bucket

def register_resources(app):
    app.resource(SourceBucket)
    app.resource(TargetBucket)

class SourceBucket(Bucket):
    pass

class TargetBucket(Bucket):
    pass

Then the app.py would be the following:

# Example of making thumbnails from a source bucket:
# http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
import io

from chalice import Chalice
import boto3

app = Chalice(app_name='myapp')

s3 = boto3.resource('s3')
source_bucket = s3.Bucket(
    app.resources.get_deployed_values('SourceBucket')['Bucket'])
target_bucket = s3.Bucket(
    app.resources.get_deployed_values('TargetBucket')['Bucket'])

# This is just an example of how an S3 event may look in the future.
# There are no guarantees on this interface.
@app.s3_event(source_bucket.name, event_type='ObjectCreated')
def save_thumbnail(event, context):
    image_stream = io.BytesIO()
    key = event['Records'][0]['s3']['object']['key']
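    source_bucket.download_fileobj(key, image_stream)
    # Rewind so that resize_image() reads the downloaded bytes from the start.
    image_stream.seek(0)
    resized_image_stream = resize_image(image_stream)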
    target_bucket.upload_fileobj(resized_image_stream, key)

SNS Topic

Here is a sample use of managing and interacting with an SNS topic:

The resources.py would be:

from chalice.resources.sns import Topic

def register_resources(app):
    app.resource(MyTopic)

class MyTopic(Topic):
    pass

And then in the app.py, users could publish messages to this SNS topic:

import json

from chalice import Chalice
import boto3

app = Chalice(app_name='myapp')
sns = boto3.client('sns')

@app.lambda_function()
def publish(event, context):
    arn = app.resources.get_deployed_values('MyTopic')['TopicArn']
    # Publish the event passed to the Lambda function.
    sns.publish(TopicArn=arn, Message=json.dumps(event))

Q: Why have users specify the resources in code (instead of a config file)?

It is a much more intuitive and user friendly interface. The other option would be the user specifies it in some configuration file and the resource would be automatically created and can start being used in the lambda function. So something like:

from chalice import Chalice

import boto3

app = Chalice(app_name='myapp')
dynamodb = boto3.resource('dynamodb')

@app.route('/users/{username}')
def get_user(username):
    # Note: There is no code that actually adds the resource
    table = app.resources.get_deployed_values('mytable')
    response = table.get_item(Key={'username': username})
    return response['Item']

The problems with this approach are the following:

Q: Why separate the resources into a resources.py file?

The main reason is that, from a user's perspective, it adds a nice layer of separation between the core logic in the app.py and the additional AWS resources they may require. Putting all of the resource declarations in the app.py file makes it bloated, especially if the user has a lot of resources. Furthermore, the resource declaration is only really needed for the deployment of the application, so it does not make sense to have these classes in the runtime when they are not going to be used directly by the application's core logic.

Q: Why have specific classes for each resource type instead of a general Resource class?

Neither the resources.py file nor the chalice.resources package will be included in the deployment package. Since the resources are not included in the deployment package, the number of resources is not constrained by deployment performance or Lambda package size. Having a specific class for each resource allows for:

Q: Why have users subclass from a resource class and then define class properties instead of having them instantiate the class directly?

This was chosen for a couple of reasons:

Q: Why can't resources managed by a Chalice application share the same logical name in a Chalice application?

It is a combination of making it easier to interact with the resources declared in the resources.py file and there not being a strong reason for wanting the ability to share the same logical name in Chalice. Specifically:

Q: What if users require further configuration (i.e. secondary indexes)?

That is not currently supported. We would need to expose deployment hooks or add the class property to the base class. However, it may be possible in the future for users to define their own resource classes and register their custom resources to their application.

Future Work

This section talks about ideas that could be potentially pursued in the future but will not be addressed in this initial implementation.

Custom Resource Classes

This idea would enable users to define their own resources that can be managed by Chalice. This would be useful if a user wants Chalice to manage a resource that currently does not have first-class support, or wants to add additional logic to an existing resource type. In order to support this, the general resource interface will need to be solidified, and we will need to figure out how users would plumb in deployment logic for that resource.

Simplified Resource Classes

This idea would allow users to specify resources that have a simplified configuration. The purpose of adding these is to help users who are new to AWS or who do not necessarily need all of the different resource parameters. This is ultimately done by reducing and simplifying the configuration parameters a user would have to specify in a class. Potential resource classes include the serverless SimpleTable and an S3 bucket (if an S3 bucket resource gets exposed) that exposes configuration parameters solely for the purpose of hosting content for a website.

aalvrz commented 6 years ago

This looks awesome. Any idea when this might be supported?

jamesls commented 6 years ago

This is pending the work on https://github.com/aws/chalice/issues/604, which essentially makes the deployer code less API Gateway/Lambda specific. This is going to make it easier to support new resource types. I'm actively working on #604 but don't have a concrete ETA.

aalvrz commented 6 years ago

Now that the new deployer has been finished and merged into master, what would be the starting point to begin adding functionalities described in this issue? I would love to start contributing on being able to add necessary resources for a Chalice app.

kadrach commented 6 years ago

SQS triggers are showing up in what I assume is a soft-release in the console and in botocore, hopefully we can use these in Chalice soon! :)

vbloise3 commented 5 years ago

When do you think the functionality described in this issue will be completed? Would love to be able to manage my s3 buckets and dynamodb tables within my chalice code.

kyleknap commented 5 years ago

We still don't have an ETA on when the implementation will be complete. I have done some work to get a rough POC together, but I also made changes to the design that I originally wrote. So I would also like to get a draft together of those proposed changes before getting a formal PR. However because we have support for experimental features in chalice, it should be a lot easier to add as this would likely be an experimental feature.

chkothe commented 5 years ago

Could one have some sort of simpler intermediate solution to address the points raised in the Motivation (one that still makes sense / provides value after this feature is done)? E.g., allowing the user to specify a cloudformation include file that chalice package would automatically pull in? Or does it make sense to allow chalice to generate parametric stacks that could be nested in some bigger stack that has extra resources like the DB etc?

I haven't used either of those CF features, but maybe there's some way to get low-level, but full, coverage across other AWS resources types that way -- if nothing else to get everything deployed/updated together with 1 or 2 CLI commands, but perhaps even with some uni- or bidirectional ability to reference by name between chalice and non-chalice resources.