kyleknap commented 7 years ago

Abstract

This issue proposes a mechanism for managing additional AWS resources that may be relied on in a serverless Chalice application. In terms of management, it will handle creates, updates, and deletion of the resources upon deployment of the application and allow users to easily interact with these resources within their application. The only AWS resouce that will be added in this proposal is a DynamoDB table, but the mechanism should be able to support any future AWS resource.

Motivation

Currently, Chalice does no management of any AWS resources that are part of the core buisness logic of a Chalice application but are not part of the Chalice decorator interface (i.e. app.lambda_function(), app.route(), etc.). There are a lot of AWS resources an application may rely on such as an S3 bucket or a DynamoDB Table. However, these resources must be created out of band of the actual Chalice deployment process which is inconvenient because:

There is no way to hook into the chalice deploy command. So users have to deploy the resources themself outside of chalice deploy by manually deploying the resources, relying on out of band deployment scripts, etc.
If users are using CloudFormation and chalice package, they will need to modify the CloudFormation template to add the resources you need.
Users will need to manage the deployment of resources per stage if they are looking for resources in each of these stages to be purely isolated from the other stages.
Within the core logic of the Chalice application, there is nothing guaranteeing that the AWS resource you are trying to access exists. The user needs to be extra careful to make sure the out of band resources they deploy match up with what is used in the Chalice application.

Therefore, it is a much more friendly user story if Chalice handles the deployment of these resources for the user. Also since Chalice did the deployment, it can easily provide references to those deployed resources from within the Chalice application.

Specification

This section will go into detail about the interfaces for adding these managed resources, code samples of how users will interact with the interface, and the deployment logic in deploying these managed resources. Since DynamoDB tables is the only resource this proposal is suggesting to add, this section will be specific to DynamoDB tables.

To have Chalice manage AWS resources, users will first have to declare their resources in code via resources.py file and then may configure these resources using the Chalice config file.

Code Interface

The top level interface into these managed resources is a resources.py file. This is used for declaring all additional AWS resources to be managed by managed by Chalice in an application. The resources.py file will live alongside the app.py file in a Chalice application:

myapp$ tree .
.
|-- app.py
|-- requirements.txt
|-- resources.py

Inside of the resources.py is where the various managed AWS resources are declared and registered to the application. To better explain how the resources.py file works, here is an example of the contents of the file:

from chalice.resources.dynamodb import Table

def register_resources(app):
    app.resource(MyTable)

class MyTable(Table):
    name = 'mytable'
    key_schema = [
        {
            'AttributeName': 'username',
            'KeyType': 'HASH'
        },
        {
            'AttributeName': 'rank',
            'KeyType': 'RANGE'
        }
    ]
    attribute_definitions = [
        {
            'AttributeName': 'username',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'rank',
            'AttributeType': 'N'
        }
    ]
    provisioned_throughput = {
        'ReadCapacityUnits': 20,
        'WriteCapacityUnits': 10
    }

The resources.py file requires a module level register_resource() function to include any additional resources for Chalice to manage for the application. The register_resources() function only accepts an app object representing the Chalice application. Within the register_reources() function, users must use the app.resource() method to include the resource in the application. Currently, the app.resource() will only allow one argument being the resource class to be registered. Furthermore, all resources registered must have a unique logical name. The logical name for a Chalice resource is the either the class name of the resource or the value of the name property of a Chalice resource class.

To actually declare a managed resource, users must first import the appropriate resource class from the chalice.resources.<service-name> module. Then they must subclass from the desired resource class and provide the appropriate class properties to configure the resource.

In the original example, the user first imports the Table class to use to declare a DynamoDB table for their Chalice application. The user then creates a new class MyTable from the Table class to flush out the properties of the DynamoDB table they want. As it relates to the configurable class properties of a DynamoDB table, they are as follows:

name: Is the name of the logical name of the table in their application. It is important to note that the name of the DynamoDB table will not actually match this value in order to support stages. If this value is not provided, the name of the class will be used as the logical name for the table in the application. In general, all resource classes must allow users to set the name property.
key_schema: The KeySchema parameter to DynamoDB CreateTable API. This defines what the hash key and potential a range key is for the table. This value is required.
attribute_definitions: The AttributeDefinitions paramter to DynamoDB CreateTable API. This defines the types of the specified keys. This value is required.
provisioned_throughput: The ProvisionedThroughput parameter to DynamoDB CreateTable API. This defines the read and write capacity for a DynamoDB table. This value is required.

With the resources.py fully flushed out Chalice will then deploy all of the resources registered to the application in the register_resources() function.

The resources then can be accessed from within the Chalice application. With the addition of the resources.py file, the chalice.Chalice app object will be updated to include a resources property.

class Chalice(object):
    ...
    self.resources = Resources()

The resources property serves as a way of referencing values for deployed resources.

The Resources() class interface will be the following:

class Resources(object):
    def get_service(self, resource_name):
        # type: (str) -> str

    def get_resource_type(self, resource_name):
        # type: (str) -> str

    def get_deployed_values(self, resource_name):
        # type: (str) -> Dict[str, Any]

For the Resources class, its methods are the following:

get_service() - Returns the name of the service the resource falls under. The service name should match the service name used in boto3. For example, a DynamoDB table will return a value of dynamodb.
get_resource_type() - Returns the type of the resource. This should match the name of resource class under chalice.resource.<service> module, which should match the name of the boto3 resource (assuming there is a boto3 resource available for this AWS resource type). For example, a DynamoDB table will return the a value of Table.
get_deployed_values() - Returns a dictionary of the deployed values of a resource. The deployed values are typically identifiers for the resource. The key names in this dictionary should match the parameters a user would typically use in a boto3 client call for that service's API. For a DynamoDB table, the deployed values dictionary will be the following: {"TableName": "<name-of-deployed-table>"}

To interact the with the deployed resources in the application, refer to the previous resources.py and the following app.py:

from chalice import Chalice
import boto3

app = Chalice(app_name='myapp')
dynamodb = boto3.resource('dynamodb')

@app.route('/users/{username}')
def get_user(username):
    deployed_table_name = app.resources.get_deployed_values(
        'mytable')['TableName']
    table = dynamodb.Table(deployed_table_name)
    response = table.get_item(Key={'username': username})
    return response['Item']

In the above example, the application was able to retrieve the name of the deployed DynamoDB table by calling the get_deployed_values() method.

Furthermore, if a user wants to programatically create a client or resource object for a particular deployed resource. Respectively, users could write the following helper functions:

def get_boto3_client(resource_name):
    return boto3.client(app.resources.get_service_name(resource_name))

def get_boto3_resource(resource_name, *resource_identifiers):
    return getattr(
        boto3.resource(app.resources.get_service_name(resource_name)),
        app.resources.get_resource_type(resource_name))(*resource_identifiers)

Config Interface

In the case a user wants the configuration options to vary by stage, users can specify configuration of the managed resources through the Chalice config file. To configure a resource, the user would need to specify the following general configuration:

"resources": {
  "<logical-resource-name>": {
    "<option-name>": "<option-value>"
  }
}

As it relates to DynamoDB tables, the only options available will be configuring the provisioned capacity. Below is a sample configuration for the previously provided DynamoDB table in the resources.py:

"resources": {
  "mytable": {
    "provisioned_throughput": {
       "ReadCapacityUnits": 50,
       "WriteCapacityUnits": 10
    }
  }
}

For DynamoDB tables, only read capacity and write capacity can be specified.

The "resources" configuration can be specified at a top level key and per stage basis where the values in the stage completely replace any that exist in the top level key configuration. In addition, any resource specific configuration provided in the config file will replace whatever values that were specified in code.

For example, take the following defined table in the resource.py file:

from chalice.resources.dynamodb import Table

def register_resources(app):
    app.resource(MyTable)

class MyTable(Table):
    name = 'mytable'
    key_schema = [
        {
            'AttributeName': 'username',
            'KeyType': 'HASH'
        },
    ]
    attribute_definitions = [
        {
            'AttributeName': 'username',
            'AttributeType': 'S'
        },
    ]
    provisioned_throughput = {
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 5
    }

With the following Chalice config file:

{
  "version": "2.0",
  "app_name": "myapp",
  "resources": {
    "mytable": {
      "provisioned_throughput": {
        "ReadCapacityUnits": 10,
        "WriteCapacityUnits": 10
      }
    }
  },
  "stages": {
    "dev": {},
    "prod": {
      "resources": {
        "mytable": {
          "provisioned_throughput": {
            "ReadCapacityUnits": 100,
            "WriteCapacityUnits": 20
          }
        }
      }
    }
  }
}

With this Chalice config file, the mytable table for each stage will have the following configuration values:

"dev": Read capacity of 10 and write capacity of 10. Both values are sourced from the top level configuration.
"prod": Read capacity of 100 and write capacity of 20. Both values are sourced from the "prod" stage configuration.

However if there is was no top-level "resources" key in the config file, the dev stage will use the values specified in the resources.py file, which were read capacity of 5 and write capacity of 5.

Deployment Logic

In terms of deployment logic, both chalice deploy and chalice package will be supported.

When it comes to the chalice deploy command, Chalice will look at all of the resources created under the Chalice.resources property and individually deploy and make any updates to the resource using the service's API directly. It is important to note that if the user changes the logical Chalice name of the resource, it will be deleted on Chalice redeploys.

Once deployed, it will save all of the deployed resources under the "resources" key in the deployed.json whose value will be a dictionary that contains values related to the various managed resources. The format of the dictionary will be as follows:

{
    "<logical-resource-name>": {
       "service": "<service-name>",
       "resource_type: "<resource-type>",
       "properties": {
           ... various idenitfiers and properties of the resource...
       }
    }
}

As it relates to the specific keys:

The top level key is the name of the resource that got registered to the application.
The "service" key is the name of the service module the resource falls under. In general the service module, should match the name used to instantiate a botocore client.
The "resource_type" key is the name of the resource class. In general, the name of the resource class should match the name of the class used by the boto3 resource.
The "properties" key contains identifying values related to the resource. The keys and values should match up with the values used in the botocore client calls.

Taking the previous Table example, the value of the "resources" key will look like the following in the deployed.json:

{
    "mytable": {
        "service": "dynamodb",
        "resource_type": "Table",
        "properties": {
            "TableName": "myapp-dev-mytable"
        }
}

Making the entire deployed.json look like the following:

{
  "dev": {
    "api_handler_name": "chalice-trivia-dev",
    "api_handler_arn": "arn:aws:lambda:us-west-2:934212987125:function:myapp-dev",
    "resources": {
      "mytable": {
        "service": "dynamodb",
        "resource_type": "Table",
        "properties": {
            "TableName": "myapp-dev-mytable"
         }
      }
    },
    "lambda_functions": {},
    "backend": "api",
    "chalice_version": "1.0.0b1",
    "rest_api_id": "448qxrx2vj",
    "api_gateway_stage": "dev",
    "region": "us-west-2"
  }
}

For the chalice package command, it will take the resources in the application and add it to the CloudFromation template. The generated CloudFormation template will use the AWS::DynamoDB::Table resource type to create the DynamoDB resource.

It is also important to note the actual name of the DynamoDB table that will be created for both deployment methods will be "<app-name>-<stage-name>-<logical-table-name>". So if in the application, the user adds a table to the application "myapp" called "mytable", the deployed table will be called "myapp-dev-mytable-dev" when deployed to the "dev" stage.

Rationale/FAQ

Q: How do you imagine the interface will grow for future resources?

A lot of the future managed resources will be able to follow the same pattern of the DynamoDB table resource. In general to add support for a new resource, the following changes will be needed:

Add a new service module under chalice.resources if needed and add a base class for that resource for users to subclass from.
Add the necessary chalice deploy and chalice package logic for the resource
Allow for any necessary configurations in Chalice config

To get a better understanding of potential future interfaces, here are some rough sketches for future AWS resouces.

S3 Bucket

Here is some sample applications on how a user may rely on Chalice to manage and interact with their S3 bucket:

The resources.py file would be the following:

# Example of making thumbnails from a source bucket:
# http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
from chalice.resources.s3 import Bucket

def register_resources(app):
    app.resource(SourceBucket)
    app.resource(TargetBucket)

class SourceBucket(Bucket)
    pass

class TargetBucket(Bucket)
    pass

Then the app.py would be the following:

# Example of making thumbnails from a source bucket:
# http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
import io

from chalice import Chalice
import boto3

app = Chalice(app_name='myapp')

s3 = boto3.resource('s3')
source_bucket = s3.Bucket(
    app.resources.get_deployed_values('SourceBucket')['Bucket'])
target_bucket = s3.Bucket(
    app.resources.get_deployed_values('TargetBucket')['Bucket'])

# This is just an example of how an S3 event may look in the future.
# There is no guarantees on this interface.
@app.s3_event(source_bucket.name, event_type='ObjectCreated')
def save_thumbnail(event, context):
    image_stream = io.BytesIO()
    key = event['Records'][0]['s3']['object']['key']
    source_bucket.download_fileobj(key, image_stream)
    resized_image_stream = resize_image(stream)
    target_bucket.upload_fileobj(resized_image_stream, key)

SNS Topic

Here is a sample use of managing and interacting with an SNS topic:

The resources.py would be:

from chalice.resources.sns import Topic

def register_resources(app):
    app.resource(MyTopic)

class MyTopic(Topic)
    pass

And then in the app.py, users could publish messages to this SNS topic:

import json

from chalice import Chalice
import boto3

app = Chalice(app_name='myapp')
sns = boto3.client('sns')

@app.lambda_function()
def publish(event, context):
    arn = app.resources.get_deployed_values('MyTopic')['TopicArn']
    # Publish the message provided to the route.
    sns.publish(TopicArn=arn, Message=json.dumps(event))

Q: Why have users to specify the resources in code (instead of a config file)?

It is a much more intuitive and user friendly interface. The other option would be the user specifies it in some configuration file and the resource would be automatically created and can start being used in the lambda function. So something like:

from chalice import Chalice

import boto3

app = Chalice(app_name='myapp')
dynamodb = boto3.resource('dynamodb')

@app.route('/users/{username}')
def get_user(username):
    # Note: There is no code that actually adds the resource
    table = app.resources.get_deployed_values('mytable')
    response = table.get_item(Key={'username': username})
    return response['Item']

The problems with this approach are the following:

Python developers typically prefer to be working in Python code as opposed to JSON or YAML. Also, Python is easier to write, validate, and extend.
It just seems too implicit that the dynamodb table will be automatically available with no explicit actions in the code that created the table, especially if the resource is a core part of their application logic.

Q: Why separate the resources into a resources.py file?

The main reason is that from a user's perspective it adds a nice layer of separation from the core logic in the app.py and the additional AWS resources they may require. Putting all of the declaration of resources in the app.py file makes the app.py file bloated especially if the user has a lot of resources. Furthermore, the resource decalartion is only really needed for the deployment of the application, thus it does not makes sense to have these classes in the runtime when the classes are not going to be used directly by the application's core logic.

Q: Why have specific classes for each resource type instead of a general Resource class?

Both the resources.py and the chalice.resources package will not be included in the deployment package. Then since the resources are not included in the deployment package, the number of resources is not constrained by deployment performance or Lambda package size. By having a specific class for each resource, it allows for:

Better readability and code completion for users as there is a specific base class for each resource that they can import.
Better validation. Instead of relying on validation of the declared classes in the deployer, we can validate declared classes through specific metaclasses for that resource class.
Better extensibility. In general, not being coupled to a single Resource class allows us to add specific functionality for a resource if needed.

Q: Why have users subclass from a resource class and then define class properties instead of having them instantiate the class directly?

This was chosen for a couple of reasons:

It follows the declarative style of writing a Chalice application. Instantiating an instance of a resource class is a more imperative style.
It allows users to leverage inheritance when declaring resources. For example, they would be able to create a Table resource class with a default provisioned throughput that can be subclassed by any other table class and inherit the provisioned throughput configurations.

Q: Why can't resources managed by a Chalice application share the same logical name in a Chalice application?

It is a combination of making it easier to interact with the resources declared in the resources.py and there being a strong reason for wanting the ability to share the same logical name in Chalice. Specifically:

If resources could share the same name, the app.resources methods would require the service name and resource type to be specified along with the resource name to get the deployed values. The Chalice config file would also require another level of nesting.
Given Chalice resources are declared by defining classes, resource classes in general cannot share the same name as it may clobber a previously declared class in the resources.py file. The only way to make the logical name the same would be by setting the name property of two declared resource classes to be the same.
Currently, users are unable to explicitly set the name of the the resources that Chalice deploys on their behalf. This is because with the existance of stages, deployed resources already need to have different names so that resources can be partitioned by stage and there are no shared resources between stages.
AWS resources of the same type generally cannot share the same name. So sharing the same name would only be for sharing the same name across resource types. However to reiteratte, the exact name of the deployed resource cannot be explicitly set by the user.

Q: What if users require further configuration (i.e. secondary indexes)?

That would not be currently supported. We would need to expose deployment hooks or add the class property to the base class. However it may be possible in the future for users to define their own resource classes and register their custom resources to their application.

Future Work

This section talks about ideas that could be potentially pursued in the future but will not be addressed in this initial implementation.

Custom Resource Classes

This idea would enable users to define their own resources that can be managed by Chalice. The purpose of allowing this would be if a user wants Chalice to manage a resource that currently does not have first class support or maybe there is additional logic they want to add on an existing resource type. In order to support this, the general resource interface will need to be solidified and figure out how users would be able to plumb in deployment logic for that resource.

Simplified Resource Classes

This idea would allow users to specify resources that have a simplified configuration. The purpose of adding these is to help users that are either new to AWS or users that do not necessarily need all of the different resource parameters. This is ultimately done by reducing and simplifying the configuration parameters a user would have to specify in a class. Potential resource classes include: the serverless SimpleTable and an S3 bucket (if an S3 bucket resource gets exposed) that exposes configuration parameter solely for the purpose of hosting content for a website.

aalvrz commented 6 years ago

This looks awesome. Any idea when this might be supported?

jamesls commented 6 years ago

This is pending the work on https://github.com/aws/chalice/issues/604, which essentially makes the deployer code less API Gateway/Lambda specific. This is going to make it easier to support new resource types. I'm actively working on #604 but don't have a concrete ETA.

aalvrz commented 6 years ago

Now that the new deployer has been finished and merged into master, what would be the starting point to begin adding functionalities described in this issue? I would love to start contributing on being able to add necessary resources for a Chalice app.

kadrach commented 6 years ago

SQS triggers are showing up in what I assume is a soft-release in the console and in botocore, hopefully we can use these in Chalice soon! :)

vbloise3 commented 5 years ago

When do you think the functionality described in this issue will be completed? Would love to be able to manage my s3 buckets and dynamodb tables within my chalice code.

kyleknap commented 5 years ago

We still don't have an ETA on when the implementation will be complete. I have done some work to get a rough POC together, but I also made changes to the design that I originally wrote. So I would also like to get a draft together of those proposed changes before getting a formal PR. However because we have support for experimental features in chalice, it should be a lot easier to add as this would likely be an experimental feature.

chkothe commented 5 years ago

Could one have some sort of simpler intermediate solution to address the points raised in the Motivation (one that still makes sense / provides value after this feature is done)? E.g., allowing the user to specify a cloudformation include file that chalice package would automatically pull in? Or does it make sense to allow chalice to generate parametric stacks that could be nested in some bigger stack that has extra resources like the DB etc?

I haven't used either of those CF features, but maybe there's some way to get low-level, but full, coverage across other AWS resources types that way -- if nothing else to get everything deployed/updated together with 1 or 2 CLI commands, but perhaps even with some uni- or bidirectional ability to reference by name between chalice and non-chalice resources.

aws / chalice

[proposal] AWS Managed Resouces for Chalice #516