aws / chalice

Python Serverless Microframework for AWS
Apache License 2.0
10.53k stars 1.01k forks source link

[Proposal] Modeling #1224

Open stealthycoin opened 4 years ago

stealthycoin commented 4 years ago

Proof of concept: #1223

Proposed interface

Two additional keys on the route decorator. input_model and output_model (maybe request/response instead?) The value of these is either None (default) meaning no models are applied to this route. Or an instance of ModelConfig. Which is defined as such

class ModelConfig(object):
    def __init__(self, model, validate=False, cls=None):
        self.model = model
        self.validate = validate
        self.cls = cls
    ...

Where model is a marshmallow schema (currently class, though I plan to change it to instance), validate is whether or not the model should be enforced, and cls is a concrete data type that can be deserialized from the input body, or serialized to a response body.

The request object has a new property model_body which will be an instance of cls created from deserializing the json_body. If the input_model cls is not set then it will raise an error.

An instance of the cls on the output_model can be returned from a view function and will be automatically serialized using the schema.

Validate works slightly differently for in put and output. For input it will create a body validator in API gateway and enforce validation at the API Gateway layer. Invalid requests will not trigger a lambda and will return a 400. For output API gateway does not provide any validation (understandably), so we provide it ourselves using the marshmallow schema. Validate will be manually called, and if any errors are discovered an error will be raised and the validation errors will be logged. A 500 will be returned to the user.

Sample Project

$ tree .
.
├── app.py
├── chalicelib
│   ├── __init__.py
│   └── models.py
└── requirements.txt

1 directory, 4 files

app.py

from chalice import Chalice
from chalice import ModelConfig

from chalicelib import models

app = Chalice(app_name='model-test')

@app.route('/hi', methods=['POST'],
           input_model=ModelConfig(
               models.UserSchema, validate=True, cls=models.User))
def say_hi():
    user = app.current_request.model_body
    return user.make_greeting()

@app.route('/get-user', methods=['POST'],
           output_model=ModelConfig(
               models.UserSchema, cls=models.User))
def get_user():
    totally_a_user_i_got_from_a_db = models.User(
        first_name='William',
        middle_name='Henry',
        last_name='Harrison',
    )
    return totally_a_user_i_got_from_a_db

chalicelib/models.py

import random

from marshmallow import Schema, fields

class UserSchema(Schema):
    first_name = fields.String(required=True)
    middle_name = fields.String()
    last_name = fields.String(required=True)

class User(object):
    def __init__(self, first_name, last_name, middle_name=None):
        self.first_name = first_name
        self.middle_name = middle_name
        self.last_name = last_name

    def make_greeting(self):
        greeting = random.choice(['Hi', 'Hello', 'Sup'])
        if self.middle_name is None:
            return '%s %s %s' % (greeting, self.first_name, self.last_name)
        return '%s %s %s %s' % (
            greeting, self.first_name, self.middle_name, self.last_name)

Sample Calls

first middle and last

$ echo '{"first_name": "george", "middle_name": "wilbur", "last_name": "francis"}' | http POST $(chalice url)hi
HTTP/1.1 200 OK

Hello george wilbur francis

first and last

$ echo '{"first_name": "george", "last_name": "francis"}' | http POST $(chalice url)hi
HTTP/1.1 200 OK

Hello george francis

just first

$ echo '{"first_name": "george"}' | http POST $(chalice url)hi
HTTP/1.1 400 Bad Request

{
    "message": "Invalid request body"
}

example showing output serialization of User object

$ http POST $(chalice url)get-user
HTTP/1.1 200 OK

{
    "first_name": "William",
    "last_name": "Harrison",
    "middle_name": "Henry"
}
stealthycoin commented 4 years ago

Responding to @jamesls comments in the PoC PR here.

The main sticking point for me is going to be whether or not to pull in a third party library for this. On the one hand, marshmallow is going to provide more capabilities than what we'd be able to implement for the immediate future, but on the other hand I'm not sure how much of that we really need.

I'd really prefer to keep the marshmallow dependency. I think we can widen the version range. I'd like to at least get the option to have all the validation errors returned. Possibly just get rid of the API Gateway integration in favor of the marshmallow validate methods. I don't really want us to be maintaining all of that. Like I mentioned in my PoC I want to change the argument from just the Schmea class to an instance of the schema as the first argument of ModelConfig. That would give us the ability to write routes like this.

@route('/user', methods=['GET'], ModelConfig(UserSchema(only='id'))
def get_user():
    return User(storage.get_user_by_id(app.current_request.body['id']))

What do you think about the decorator syntax in #1125? I feel like that version reads a bit easier, but it also doesn't have the flexibility proposed here.

Personally I don't want to deal with jsonschema at all, so I don't like that interface much. I also don't like stacking decorators but that's my own personal preference. If say we want to stop using jsonschema and just have the models validated by the lambda runtime/marshmallow (or something else) then there are no changes to the interface. We just let the function trigger and call validate on the input.

We can also use metadata on the Schema objects if we want to pull data from headers/uri as well as the body. That can be all done transparently more or less with this interface, it would just be interpreting metadata on the Schema that otherwise wasn't used.

I suppose if other people like more decorators more than parameters then we could have an

@app.route.input_model(InputSchema, validate=True)
@app.route.output_model(OutputSchema)
@app.route('/foo')
def route():
    pass

Not a fan of the model_body. Not sure if this is possible, but it seems like this would make it hard to work with IDE auto completion or type checking in general without an explicit cast of model_body. I think it's worth considering separating out the schema/modeling API from the actual serialization API.

While I get that you may want to decouple your serialization from your model (and that should always be possible), the simple case seems verbose in that you have to define your model as well as your schema. In your examples, the same set of fields are duplicated. I think we should have a way that consolidates the two things for the simple case where you don't need them decoupled.

Agreed. I don't like having both a User and UserSchema object also. I tried to think of some ways of consolidating, but as long as I'm using marshmallow I don't want your actual data objects to be instances of the schema. Too many properties might collide, and private methods. The other methods I could think off were to complicated for a PoC. Either code generation, or having a chalice defined model class from which you can derive a marshmallow schema, and can be used as a data object. Essentially just your data class with some fields defined that can be pulled off into a schema subclass at runtime and plugged into the corresponding route decorator arguments. Seemed a bit complicated.

My other thought was to throw away model_body and serialization entirely and deal with it later/separately. That may result in a worse story when we want to have an easy unified way of validation/serialization like you point out should be possible. I also think it will be tedious to type out validate=True all over. Lets say you want it globally enabled, that should be something you can do as well without making some fancy partially applied class constructor.

Another thing we have to consider is how to accommodate people that are already using marshmallow in their apps, especially since we don't provide any support for request validation/marshaling. Looks like library is on v3, but we have to use v2 (perhaps this can change). If someone's currently using the latest major version v3 in their app (a reasonable thing to do) those users wouldn't be able to upgrade chalice.

Two ideas for this.

1) One ditch the api gateway validator entirely. Use marshmallow to validate on input then we don't need json schema at all and can loosen our requirement on marshmallow. The fact that we can't control the spec version that API Gateway uses might be an argument for this. Downside is you are now paying for lambda time. My initial version had ModelConfig(validate=bool, api_gate_validate=bool) where you could do one or both methods. This was overly complex and also not symmetrical since there is out output validation for api gateway. I would have needed two objects InputModelConfig and OutputModelConfig and I thought that was too complex for an initial draft.

2) Implement the JSON schema generator ourself. Thats not the hard part. I'm fine taking a dependency on marshmallow itself just because it has a lot of stuff built in already, that we may want to take advantage of. But the json schema thing I used as a crutch to get the PoC done more quickly. That part should be pretty easy to implement in a v2/v3 agnostic way.

stealthycoin commented 4 years ago

Another nice thing about buying into the marshmallow ecosystem is we can just depend on existing libs to solve serialization/deserialization. For example if you want to provide automatic serialization and deserialization you can rely on marshmallow-dataclass which lets us write code like this without any further changes to Chalice

from chalice import Chalice, ModelConfig

from dataclasses import field
from marshmallow_dataclass import dataclass

app = Chalice(app_name='modelboy')
app.debug = True

@dataclass
class CreateUserInput(object):
    username: str = field(metadata={'required': True})
    password: str = field(metadata={'required': True})
    description: str = field(metadata={'required': True})

@dataclass
class CreateUserOutput(object):
    message: str = field(metadata={'required': True})

    @classmethod
    def from_username(cls, username):
        return cls(f"Successfully created username '{username}'")

@app.route('/', methods=['POST'],
          input_model=ModelConfig(CreateUserInput.Schema(), validate=True),
          output_model=ModelConfig(CreateUserOutput.Schema(), validate=True))
def index():
    data = app.current_request.json_body
    # data is an instance of CreateUserInput
    # write to database
    return CreateUserOutput.from_username(data.username)

By default if we were just using marshmallow we would get a validated dictionary back, by adding dataclasses we can get automatic object serialization/deserialization. I think this helps out enough with the default case of just providing a schema and getting validated JSON, that both should be included in a tutorial section.

The remaining issue is that the validation overlaps with the JSON schema validation that API gateway provides so it might be useful to separate it out into 2 flags in the ModelConfig initializer. This means there is a useless flag on the config if its used as an output validator since api gateway does not have response validation. It also clips off useful messages and returns a generic one.

ModelConfig(CreateUserInput.Schema(), validate=True, api_gateway_validate=True)

http POST $(chalice url) username=stealthycoin password=pass
HTTP/1.1 400 Bad Request
{
    "message": "Invalid request body"
}

ModelConfig(CreateUserInput.Schema(), validate=True, api_gateway_validate=False)

HTTP/1.1 400 Bad Request
{
    "Code": "BadRequestError",
    "Message": "BadRequestError: description: Missing data for required field."
}

The output validation can give useful information when debugging such as in this example where I return an empty string instead of the response object:

TypeError: {
  "message": [
    "Missing data for required field."
  ]
}
dmulter commented 4 years ago

Good stuff. Also might make sense to consider Swagger/OpenAPI model generation for input/output API models. Could implement this in a phase 2, but might be a good idea to consider in the design of a phase 1.

ricky-sb commented 4 years ago

I agree with you, we should just get rid of API Gateway validation and not deal with that dependency. Also, I would be more in favor of stacking decorators. Stuffing arguments into a single decorator can get a bit messy.

What's the status on this @stealthycoin?

jsfehler commented 4 years ago

@stealthycoin Having had an immediate need for exposing documentation, I wrote a library that monkey patches chalice to add models. https://pypi.org/project/leangle/

It's a bit different from your implementation. It lacks validation and stacks decorators, but uses marshmallow in a similar way. Hope you find it interesting as a view on another way of getting similar results.

My biggest takeaway was that I'd rather not be monkey patching chalice if possible. If SwaggerGenerator.generate_swagger() exposed a way to hook in, I would be more protected from changes.

saivarunk commented 3 years ago

Any update on this proposal?

skwashd commented 3 years ago

Yes, please. This would simplify some of our existing logic. Marshmallow is great, but have you considered pydantic? For python 3.7+ it has no external dependencies, earlier versions depend on the dataclasses package.

jakemwood commented 2 years ago

Really looking forward to seeing progress on this issue, especially with Goblet supporting modeling natively. If there has been an API agreed upon, I'd be happy to help contribute a solution.

hbock-42 commented 7 months ago

Hello, any plan on working on this issue? Being able to auto générate documentation natively would be very useful for nearly everyone.

sagardevl commented 5 months ago

Hi, Can we hope this will be added soon.