bocadilloproject / bocadillo

(UNMAINTAINED) Fast, scalable and real-time capable web APIs for everyone
https://bocadilloproject.github.io
MIT License

JSON schema validation #8

Closed florimondmanca closed 5 years ago

florimondmanca commented 5 years ago

Users should be able to easily validate JSON contained in the body of incoming requests (e.g. from POST, PUT or PATCH requests).

Two popular ways of doing this in Python seem to be jsonschema and marshmallow.

We could add a concept of "JSON validation backend" — which would rely on jsonschema or marshmallow being installed (Bocadillo wouldn't ship with them by default). We'd be able to select which one to use when creating the API object, e.g. through a json_validation_backend option.

JSON validation could materialize using a decorator on a view, e.g. @api.validate_json(...).

For jsonschema, we'd provide the schema dictionary:

schema = {
    'type' : 'object',
    'properties' : {
        'price' : {'type' : 'number'},
        'name' : {'type' : 'string'},
    },
}

@api.validate_json(schema)
async def create_products(req, res):
    pass

class ProductsView:

    @api.validate_json(schema)
    async def post(self, req, res):
        pass

For marshmallow, we'd pass a Schema class:

from marshmallow import Schema, fields

class ProductSchema(Schema):
    price = fields.Integer()
    name = fields.Str()

@api.validate_json(ProductSchema)
async def create_products(req, res):
    pass

A possible implementation for validate_json() would be to wrap the view and consume req.json(). (From first investigation, it is possible to call await req.json() multiple times, so consuming JSON should not cause issues if the view function also needs to access it.)
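
A minimal sketch of that wrapping approach, not Bocadillo's actual implementation. It assumes function-based views only. `check` is a hypothetical callable that raises `ValueError` on invalid data (a jsonschema- or marshmallow-based backend would plug in there), and `FakeRequest` / `FakeResponse` are stand-ins that only mimic a repeatable `await req.json()` and a mutable response.

```python
import asyncio
import functools
import json


class FakeRequest:
    """Stand-in for a request whose JSON body can be read repeatedly."""

    def __init__(self, body: bytes):
        self._body = body

    async def json(self):
        return json.loads(self._body)


class FakeResponse:
    """Stand-in response holding a status code and a media payload."""

    def __init__(self):
        self.status_code = 200
        self.media = None


def validate_json(check):
    """Wrap a view so that `check(data)` runs on the JSON body first.

    `check` is hypothetical: any callable raising ValueError on bad data.
    """

    def decorator(view):
        @functools.wraps(view)
        async def wrapper(req, res, *args, **kwargs):
            try:
                data = await req.json()
                check(data)
            except ValueError as exc:  # json.JSONDecodeError is a ValueError too
                res.status_code = 400
                res.media = {"error": str(exc)}
                return
            await view(req, res, *args, **kwargs)

        return wrapper

    return decorator


def check_product(data):
    # Hand-rolled check standing in for a real schema library.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")


@validate_json(check_product)
async def create_product(req, res):
    # The view can read the body again after validation.
    res.media = await req.json()


async def handle(body: bytes) -> FakeResponse:
    """Run the decorated view against a raw body and return the response."""
    res = FakeResponse()
    await create_product(FakeRequest(body), res)
    return res
```

With this shape, the view only runs when the body parses and passes the check. Otherwise the wrapper answers with a 400 and the view is never called.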

florimondmanca commented 5 years ago

Another option: Pydantic. Type annotation-based validation, a brilliant idea.

iovanom commented 5 years ago

Hi! I can help you with this issue. Is it OK with you if I take it?

florimondmanca commented 5 years ago

Hi @iovanom, thanks for suggesting! Please go ahead.

Here are a few things that I'm not sure were clear in the issue description — if something's still not clear after this, please ask. :-)

strongbugman commented 5 years ago

Hi, I have seen this feature in flasgger, so what about providing a Swagger extension? We could even parse and validate all of a request's data (query string, JSON body and so on) against the user's Swagger document.

florimondmanca commented 5 years ago

Yes, OpenAPI-based validation is definitely worth investigating. I’ve seen Starlette providing schema generation utilities so I’ll take a closer look at how this may work. Perhaps JSON validation is part of a bigger picture: that of API schemas...

iovanom commented 5 years ago

I've investigated a little and, from my point of view, we should use the validation model to create the definition of the body for the API schema, not the other way around (using the API schema to validate the body). With pydantic we can use the body like a dataclass, and I think this is the right way.

florimondmanca commented 5 years ago

@iovanom I'm not sure I understand. 🤔 What do you mean by "body for the API schema"? Do you have an example to illustrate, perhaps with pydantic?

My initial intuition was that the validation backend would use the schema provided to @api.validate_json() to actually perform the validation. Do you mean we should instead have a common "interface" for the object passed to @api.validate_json() which the backend would "translate" in the language of its underlying library?

florimondmanca commented 5 years ago

@iovanom I've come to the conclusion that thinking in terms of a standard schema format such as OpenAPI or JSON Schema would cause less lock-in. Is that what you had in mind?

For example, ideally we'd want to be able to pass a dictionary in any of these supported formats, i.e.

schema = {
  "title": "Product",
  "description": "This is the description of the Product model",
  "type": "object",
  "properties": {
    "Name": {
      "title": "Name",
      "type": "string"
    }
  }
}

@api.validate(schema, spec="jsonschema")
async def create_product(req, res):
    pass

And then in principle we shouldn't even care whether the user uses Pydantic, Marshmallow or any other library that can convert to said spec (jsonschema in this case).

from pydantic import BaseModel

class Product(BaseModel):
    """This is the description of the Product model."""
    name: str

@api.validate(Product.schema(), spec="jsonschema")
async def create_product(req, res):
    pass

That said, it looks like Marshmallow does not have a way of converting its schemas to a standard spec, so users wouldn't be able to use that. (Or have I misread the docs?)

@strongbugman You mentioned Swagger/OpenAPI. Do you know of any Python library that can validate a JSON document against an OpenAPI specification (all as dicts)?

strongbugman commented 5 years ago

@florimondmanca maybe jsonschema is worth a try

florimondmanca commented 5 years ago

Hi here, I've submitted #96: extensible JSON validation mechanism with jsonschema built-in. This means Pydantic or any other validation library that can convert to jsonschema is supported. Thoughts? :)

thebigmunch commented 5 years ago

That said, it looks like Marshmallow does not have a way of converting its schemas to a standard spec, so users wouldn't be able to use that. (Or have I misread the docs?)

There's a 3rd-party library that can convert Marshmallow schemas to JSON schemas. And, this from the Marshmallow team is probably of interest as well.

You mentioned Swagger/OpenAPI. Do you know of any Python library that allows to validate a JSON document against an OpenAPI specification (all as dicts)?

I believe flex does, but it's no longer maintained.

You may also want to check out python-fastjsonschema as an alternative to jsonschema.

florimondmanca commented 5 years ago

Hi @thebigmunch:

florimondmanca commented 5 years ago

@thebigmunch Per your advice, I've added a fastjsonschema backend. The developer will be able to select which jsonschema implementation they prefer by choosing which backend to use. :+1:
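
A rough sketch of how such selectable backends could be organised, under assumptions: `BACKENDS`, `register_backend` and `get_validator` are hypothetical names, not the API from #96. The only library calls used are `jsonschema.validate()` and `fastjsonschema.compile()`, both imported lazily so that neither package is required unless its backend is selected.

```python
from typing import Callable, Dict

# A backend factory takes a schema dict and returns a validator callable
# that raises an exception when given invalid data.
Validator = Callable[[object], object]
Backend = Callable[[dict], Validator]

# Hypothetical registry mapping a backend name to its factory.
BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str):
    """Register a backend factory under `name` (hypothetical helper)."""

    def decorator(factory: Backend) -> Backend:
        BACKENDS[name] = factory
        return factory

    return decorator


@register_backend("jsonschema")
def jsonschema_backend(schema: dict) -> Validator:
    import jsonschema  # optional dependency, imported lazily

    def validate(data):
        # Raises jsonschema.ValidationError on invalid data.
        jsonschema.validate(instance=data, schema=schema)

    return validate


@register_backend("fastjsonschema")
def fastjsonschema_backend(schema: dict) -> Validator:
    import fastjsonschema  # optional dependency, imported lazily

    # compile() returns a function that raises
    # fastjsonschema.JsonSchemaException on invalid data.
    return fastjsonschema.compile(schema)


def get_validator(schema: dict, backend: str = "jsonschema") -> Validator:
    """Build a validator for `schema` using the selected backend."""
    try:
        factory = BACKENDS[backend]
    except KeyError:
        raise LookupError(f"unknown validation backend: {backend!r}") from None
    return factory(schema)
```

Because each backend is just a factory in a dictionary, adding a third implementation is a matter of registering another function. The caller's code stays the same apart from the backend name.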

strongbugman commented 5 years ago

@florimondmanca Hi, I recently developed a Starlette extension for handling API documentation; it might be interesting to integrate it with Bocadillo.

florimondmanca commented 5 years ago

Thanks for suggesting @strongbugman.

I’m feeling this issue is becoming ill-posed, though. JSON validation/serde is one thing and automatic API docs / schema generation is another thing. I’ll split into those 2 issues to make things clearer. :)

There is some work going on over at Tortoise ORM to give it its own validation/serde utilities, and I’m more and more inclined to consider an official integration so as to use those rather than let people fiddle with integrating Pydantic/attrs/marshmallow/etc. with Bocadillo.

florimondmanca commented 5 years ago

The people over at Encode are at it again: typesystem is yet another validation library, whose schemas subclass the Mapping abstract base class, which makes them easy to use with built-ins. https://github.com/encode/typesystem

florimondmanca commented 5 years ago

#232 brings in TypeSystem as a dependency for route parameter validation. The goal is to have TypeSystem power query parameter validation as well, and then JSON request body validation. For the sake of simplicity, I don't think we should allow integrating with other libraries at the moment, but I'm open to it in the future. :-)