Open stealthycoin opened 5 years ago
Responding to @jamesls comments in the PoC PR here.
The main sticking point for me is going to be whether or not to pull in a third party library for this. On the one hand, marshmallow is going to provide more capabilities than what we'd be able to implement for the immediate future, but on the other hand I'm not sure how much of that we really need.
I'd really prefer to keep the marshmallow dependency. I think we can widen the version range. I'd like to at least get the option to have all the validation errors returned. Possibly just get rid of the API Gateway integration in favor of the marshmallow validate
methods. I don't really want us to be maintaining all of that. Like I mentioned in my PoC I want to change the argument from just the Schmea class to an instance of the schema as the first argument of ModelConfig
. That would give us the ability to write routes like this.
@route('/user', methods=['GET'], ModelConfig(UserSchema(only='id'))
def get_user():
return User(storage.get_user_by_id(app.current_request.body['id']))
What do you think about the decorator syntax in #1125? I feel like that version reads a bit easier, but it also doesn't have the flexibility proposed here.
Personally I don't want to deal with jsonschema at all, so I don't like that interface much. I also don't like stacking decorators but that's my own personal preference. If say we want to stop using jsonschema and just have the models validated by the lambda runtime/marshmallow (or something else) then there are no changes to the interface. We just let the function trigger and call validate on the input.
We can also use metadata on the Schema objects if we want to pull data from headers/uri as well as the body. That can be all done transparently more or less with this interface, it would just be interpreting metadata on the Schema that otherwise wasn't used.
I suppose if other people like more decorators more than parameters then we could have an
@app.route.input_model(InputSchema, validate=True)
@app.route.output_model(OutputSchema)
@app.route('/foo')
def route():
pass
Not a fan of the model_body. Not sure if this is possible, but it seems like this would make it hard to work with IDE auto completion or type checking in general without an explicit cast of model_body. I think it's worth considering separating out the schema/modeling API from the actual serialization API.
While I get that you may want to decouple your serialization from your model (and that should always be possible), the simple case seems verbose in that you have to define your model as well as your schema. In your examples, the same set of fields are duplicated. I think we should have a way that consolidates the two things for the simple case where you don't need them decoupled.
Agreed. I don't like having both a User and UserSchema object also. I tried to think of some ways of consolidating, but as long as I'm using marshmallow I don't want your actual data objects to be instances of the schema. Too many properties might collide, and private methods. The other methods I could think off were to complicated for a PoC. Either code generation, or having a chalice defined model class from which you can derive a marshmallow schema, and can be used as a data object. Essentially just your data class with some fields defined that can be pulled off into a schema subclass at runtime and plugged into the corresponding route decorator arguments. Seemed a bit complicated.
My other thought was to throw away model_body
and serialization entirely and deal with it later/separately. That may result in a worse story when we want to have an easy unified way of validation/serialization like you point out should be possible. I also think it will be tedious to type out validate=True
all over. Lets say you want it globally enabled, that should be something you can do as well without making some fancy partially applied class constructor.
Another thing we have to consider is how to accommodate people that are already using marshmallow in their apps, especially since we don't provide any support for request validation/marshaling. Looks like library is on v3, but we have to use v2 (perhaps this can change). If someone's currently using the latest major version v3 in their app (a reasonable thing to do) those users wouldn't be able to upgrade chalice.
Two ideas for this.
1) One ditch the api gateway validator entirely. Use marshmallow to validate on input then we don't need json schema at all and can loosen our requirement on marshmallow. The fact that we can't control the spec version that API Gateway uses might be an argument for this. Downside is you are now paying for lambda time. My initial version had ModelConfig(validate=bool, api_gate_validate=bool)
where you could do one or both methods. This was overly complex and also not symmetrical since there is out output validation for api gateway. I would have needed two objects InputModelConfig
and OutputModelConfig
and I thought that was too complex for an initial draft.
2) Implement the JSON schema generator ourself. Thats not the hard part. I'm fine taking a dependency on marshmallow itself just because it has a lot of stuff built in already, that we may want to take advantage of. But the json schema thing I used as a crutch to get the PoC done more quickly. That part should be pretty easy to implement in a v2/v3 agnostic way.
Another nice thing about buying into the marshmallow ecosystem is we can just depend on existing libs to solve serialization/deserialization. For example if you want to provide automatic serialization and deserialization you can rely on marshmallow-dataclass
which lets us write code like this without any further changes to Chalice
from chalice import Chalice, ModelConfig
from dataclasses import field
from marshmallow_dataclass import dataclass
app = Chalice(app_name='modelboy')
app.debug = True
@dataclass
class CreateUserInput(object):
username: str = field(metadata={'required': True})
password: str = field(metadata={'required': True})
description: str = field(metadata={'required': True})
@dataclass
class CreateUserOutput(object):
message: str = field(metadata={'required': True})
@classmethod
def from_username(cls, username):
return cls(f"Successfully created username '{username}'")
@app.route('/', methods=['POST'],
input_model=ModelConfig(CreateUserInput.Schema(), validate=True),
output_model=ModelConfig(CreateUserOutput.Schema(), validate=True))
def index():
data = app.current_request.json_body
# data is an instance of CreateUserInput
# write to database
return CreateUserOutput.from_username(data.username)
By default if we were just using marshmallow we would get a validated dictionary back, by adding dataclasses we can get automatic object serialization/deserialization. I think this helps out enough with the default case of just providing a schema and getting validated JSON, that both should be included in a tutorial section.
The remaining issue is that the validation overlaps with the JSON schema validation that API gateway provides so it might be useful to separate it out into 2 flags in the ModelConfig
initializer. This means there is a useless flag on the config if its used as an output validator since api gateway does not have response validation. It also clips off useful messages and returns a generic one.
ModelConfig(CreateUserInput.Schema(), validate=True, api_gateway_validate=True)
http POST $(chalice url) username=stealthycoin password=pass
HTTP/1.1 400 Bad Request
{
"message": "Invalid request body"
}
ModelConfig(CreateUserInput.Schema(), validate=True, api_gateway_validate=False)
HTTP/1.1 400 Bad Request
{
"Code": "BadRequestError",
"Message": "BadRequestError: description: Missing data for required field."
}
The output validation can give useful information when debugging such as in this example where I return an empty string instead of the response object:
TypeError: {
"message": [
"Missing data for required field."
]
}
Good stuff. Also might make sense to consider Swagger/OpenAPI model generation for input/output API models. Could implement this in a phase 2, but might be a good idea to consider in the design of a phase 1.
I agree with you, we should just get rid of API Gateway validation and not deal with that dependency. Also, I would be more in favor of stacking decorators. Stuffing arguments into a single decorator can get a bit messy.
What's the status on this @stealthycoin?
@stealthycoin Having had an immediate need for exposing documentation, I wrote a library that monkey patches chalice to add models. https://pypi.org/project/leangle/
It's a bit different from your implementation. It lacks validation and stacks decorators, but uses marshmallow in a similar way. Hope you find it interesting as a view on another way of getting similar results.
My biggest takeaway was that I'd rather not be monkey patching chalice if possible. If SwaggerGenerator.generate_swagger() exposed a way to hook in, I would be more protected from changes.
Any update on this proposal?
Yes, please. This would simplify some of our existing logic. Marshmallow is great, but have you considered pydantic? For python 3.7+ it has no external dependencies, earlier versions depend on the dataclasses package.
Really looking forward to seeing progress on this issue, especially with Goblet supporting modeling natively. If there has been an API agreed upon, I'd be happy to help contribute a solution.
Hello, any plan on working on this issue? Being able to auto générate documentation natively would be very useful for nearly everyone.
Hi, Can we hope this will be added soon.
Proof of concept: #1223
Proposed interface
Two additional keys on the route decorator.
input_model
andoutput_model
(maybe request/response instead?) The value of these is eitherNone
(default) meaning no models are applied to this route. Or an instance ofModelConfig
. Which is defined as suchWhere
model
is a marshmallow schema (currently class, though I plan to change it to instance),validate
is whether or not the model should be enforced, andcls
is a concrete data type that can be deserialized from the input body, or serialized to a response body.The request object has a new property
model_body
which will be an instance ofcls
created from deserializing thejson_body
. If theinput_model
cls
is not set then it will raise an error.An instance of the
cls
on theoutput_model
can be returned from a view function and will be automatically serialized using the schema.Validate works slightly differently for in put and output. For input it will create a body validator in API gateway and enforce validation at the API Gateway layer. Invalid requests will not trigger a lambda and will return a 400. For output API gateway does not provide any validation (understandably), so we provide it ourselves using the marshmallow schema. Validate will be manually called, and if any errors are discovered an error will be raised and the validation errors will be logged. A 500 will be returned to the user.
Sample Project
app.py
chalicelib/models.py
Sample Calls
first middle and last
first and last
just first
example showing output serialization of
User
object