Closed kgriffs closed 7 years ago
Consider doing this in Talons instead...
In the duplicate issue #42, @lichray has a good comment. Reproducing here since #42 has been closed:
There is one question before this: do we support non-UTF-8 JSON? According to RFC 4627, UTF-16/32 LE/BE JSON are valid. If we support them, then we need to handle encoding in auto serialization (and possibly deserialization). I know the requests library does it, FYI.
Might also be a good idea to consider integration with a library such as http://marshmallow.readthedocs.org/en/latest/ -- by integration here, I probably mean a 'plugin' sort of like a talon, to facilitate integration. It would help provide functionality similar to that of django-rest-framework and django-tastypie (both of which I have used extensively). Just an idea!
I currently write before/after hooks that validate the "content_type" and do the serialization/deserialization for me. In my particular use case, I actually use both XML and JSON. I set up these hooks at the API level, rather than at the Route/View level.
Great framework btw! I jumped into it yesterday and I've already written three small services, two of which are in pre-production and working great.
Adding comment from @sebasmagri from duplicate issue #324:
I'd like to think about a Content Type handlers approach which would allow devs to get and return native objects without worrying about the content type in responders, and implement generic deserializers and serializers for each content type they want to be able to handle.
I still believe the Content Type Handler approach would make sense as in:
Upon these two premises we could add more features like being able to use a content-type handler globally or for specific endpoints (like for image processing or custom chat protocols), probably by inheritance or reuse of some logic from middlewares.
See also this discussion on the ML for more ideas: http://librelist.com/browser//falcon/2015/10/5/base-resource-class-to-implement-jsontranslate-middleware-functionality/
I like the idea of being able to register Content-Type handlers (with JSON available out of the box). But I am reluctant to overload existing `Request` and `Response` properties, since this may lead to some confusion in practice. One suggestion from @BenjamenMeyer on the mailing list was to add additional well-named properties, such as `req.json`, to expose the serialization mechanism. But how would we extend this model to arbitrary Internet media types per @sebasmagri's suggestion? Perhaps we might add a generic attribute to the `Request` and `Response` classes, e.g. `req.media` and `resp.media`.
One concern I have with this last approach is that `req.media` would consume the input stream. We would need to decide if that is OK, or if it would lead to violating the Principle of Least Surprise (this was problematic in the past with URL-encoded form POSTs).
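One way to soften that surprise is to cache the deserialized result so the stream is consumed at most once. A minimal sketch with a stand-in `Request` class (the names here are illustrative, not Falcon's actual API):

```python
import io
import json

class Request:
    """Minimal stand-in for a framework request (hypothetical names)."""

    _UNSET = object()

    def __init__(self, stream, content_type='application/json'):
        self.stream = stream        # raw WSGI-style input stream
        self.content_type = content_type
        self._media = self._UNSET   # cache so the stream is read at most once

    @property
    def media(self):
        # First access consumes the stream; later accesses hit the cache,
        # so resource code can touch req.media more than once safely.
        if self._media is self._UNSET:
            self._media = json.loads(self.stream.read().decode('utf-8'))
        return self._media

req = Request(io.BytesIO(b'{"ping": "pong"}'))
assert req.media == {'ping': 'pong'}
assert req.media is req.media  # cached: the stream is not re-read
```

Caching does not remove the surprise entirely (the raw stream is still drained on first access), but it at least makes repeated access safe.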
@kgriffs per the `req.json` and `resp.json` methods... perhaps we could have something that would "register" a handler against the Request and Response objects, for instance:
```python
req = falcon.Request(...)
setattr(req, 'json', json_request_reader)

resp = falcon.Response()
setattr(resp, 'json', json_response_writer)
```
My only concern here would be figuring out how to handle the property get/set aspects that use the same attribute name to do two different things. Though we could probably avoid that by saying you can only read from the request and write to the response and just leave them as functions, like with Python Requests.
The registration could be handled similarly to adding routes:
```python
self.app.add_media_handler('application/json', json_request_reader, json_request_writer)
```
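That registration idea could be backed by a simple registry keyed by media type. `MediaRegistry` and its method names below are hypothetical, just to show the shape of the dispatch:

```python
import json

class MediaRegistry:
    """Hypothetical registry mapping media types to (reader, writer) pairs,
    mirroring the add_media_handler() idea above."""

    def __init__(self):
        self._handlers = {}

    def add_media_handler(self, media_type, reader, writer):
        self._handlers[media_type] = (reader, writer)

    def read(self, media_type, raw):
        reader, _ = self._handlers[media_type]
        return reader(raw)

    def write(self, media_type, data):
        _, writer = self._handlers[media_type]
        return writer(data)

registry = MediaRegistry()
registry.add_media_handler('application/json', json.loads, json.dumps)

assert registry.read('application/json', '{"a": 1}') == {'a': 1}
assert registry.write('application/json', {'a': 1}) == '{"a": 1}'
```

A real implementation would also need to decide what happens on an unknown media type (raise, or fall back to a default handler).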
One of the things I like about falcon is that it tries to not do stuff I don't explicitly tell it to do. Not guessing how my input stream should be handled is definitely one of Falcon's strengths (actually the main selling point for my initial use of falcon). However, I totally agree that this is a common use case, and providing some nice serialization tools is probably a must.
I'll need to give this way more thought, but here is my initial suggestion.
I suggest adding methods (or properties, maybe?) to `Request` and `Response` to handle deserialization and serialization (lazily). The methods/properties should be named and documented to explicitly consume and decode the request body, and try to deserialize it according to the Content-Type header. (Being strict is important: starting to guess at content types that conflict with the passed Content-Type header is a total mess and should definitely be avoided.)
Imo, consuming the input stream is okay as long as it is explicit from the user. If you need the input stream decoded for some use case, you probably don't need the raw thing anyway. We should probably store the deserialized thing on the request though (which is why a property makes sense to me).
I can't think of a better name than yours, @kgriffs, so I'll stick to `media`.
On `Request`:
- `media` is a read-only property that inspects the `Content-Type` header and matches it against some global (probably stored on the `API` object) matcher structure (well, the matching should probably be done on the global registry, acting as a router).
- The deserialized result is stored on the `media` property and is cached there.
- If no handler matches, `media` raises an error.

On `Response`, same thing, just the other way: when `media` is set, try to encode it according to the `Accept` header, with some sane fallback logic.

Errors in encode/decode should raise and be handled globally with the correct status codes for the content-related errors. Most of these errors have specific status codes.
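The status codes in question come straight from HTTP: 415 Unsupported Media Type for an unknown Content-Type, 406 Not Acceptable when negotiation fails, 400 Bad Request for a malformed body. A minimal sketch of that global decode handling, assuming a dict of decoder callables keyed by media type (`decode_body` and the handler table are hypothetical names):

```python
import json

UNSUPPORTED_MEDIA_TYPE = 415  # unknown Content-Type on the request
NOT_ACCEPTABLE = 406          # no handler satisfies the Accept header
BAD_REQUEST = 400             # body failed to decode

def decode_body(handlers, content_type, raw):
    """Return (status, result); status is None on success."""
    handler = handlers.get(content_type)
    if handler is None:
        return UNSUPPORTED_MEDIA_TYPE, None
    try:
        return None, handler(raw)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return BAD_REQUEST, None

handlers = {'application/json': json.loads}
assert decode_body(handlers, 'text/csv', 'a,b') == (415, None)
assert decode_body(handlers, 'application/json', '{bad')[0] == 400
assert decode_body(handlers, 'application/json', '{"ok": 1}') == (None, {'ok': 1})
```

In a framework these would be raised as typed exceptions rather than returned, so a global error handler can translate them into responses.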
Most of the responsibility here is delegated to the encoder/decoder objects, which we need to be able to register globally somewhere; I think that is the only way to go.
Most users outside of the common "deserialize this JSON / serialize this to JSON" case will probably need to write custom serializers anyway, so I think we should prioritize making it easy to add custom serializers.
On a related note: for the input stream, we could add an intermediate `body` property that just contains the raw body as read from the stream. That way, when you handle errors you can actually get the body that caused the error.
The issue here is of course that the body size could be giant, etc., so we once again would have to discuss how to handle this (and allow that handling to be customized, since what is really the "best" way differs from use case to use case). A sane default might be: keep up to 1 MB in memory; if the stream grows beyond this, roll over to temp disk storage (maybe using https://docs.python.org/3.4/library/tempfile.html#tempfile.SpooledTemporaryFile)
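The spill-over behavior described above maps directly onto the stdlib class linked in the comment. A sketch, where `buffer_body` and the chunk size are illustrative choices:

```python
import io
import tempfile

MAX_IN_MEMORY = 1024 * 1024  # 1 MB, the suggested default above

def buffer_body(stream, chunk_size=64 * 1024):
    """Copy a request stream into a spooled buffer: the body stays in
    memory up to max_size and rolls over to a temp file beyond that."""
    spool = tempfile.SpooledTemporaryFile(max_size=MAX_IN_MEMORY)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        spool.write(chunk)
    spool.seek(0)  # rewind so error handlers can re-read the raw body
    return spool

body = buffer_body(io.BytesIO(b'{"hello": "world"}'))
assert body.read() == b'{"hello": "world"}'
```

The nice part is that resource code gets one file-like `body` object regardless of whether it ended up in memory or on disk.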
I'm also up for doing some heavy lifting on this feature.
Alternatively (and maybe more falcony), we could provide all the parts, with some assembly required. An example might explain it best:
```python
class MyResource(object):
    def on_get(self, req, resp):
        req.media   # Proxy object for unserializing
        resp.media  # Proxy object for serializing

        req.media.type   # Detected from Content-Type (None if unknown)
        resp.media.type  # Detected from Accept (default type if unknown)

        # Ping / Pong example
        data = req.media.body   # consumes stream, unserializes, result is cached
        resp.media.body = data  # Like setting resp.body, but serializes it

app = falcon.API()
app.add_route('/ping', MyResource())

# serializers should handle their format in both directions
json = JSONSerializer(max_body=1024*1024)
app.add_serializer('application/json', json)
app.add_serializer('application/*+json', json)

# Advanced edition
class MyStreamable(object):
    def on_get(self, req, resp):
        # Also note that for some formats, the ability to
        # read one "unit" at a time, and write one unit,
        # might be desirable.
        # This would allow us to easily wrap a normal serializer
        # in a format that can be streamed (like having a JSON serializer
        # inside a LineSerializer, or similar).
        # Would make Server-Sent Events a breeze to implement.
        while True:
            message = req.media.read()
            if not message:
                break
            resp.media.write(message)
```
I definitely agree with the Principle of Least Surprise. I also like the transformability I've seen so far in Falcon. Ben Meyer's idea of registering new attributes on the Request/Response objects seems like the most flexible and powerful approach, while still making it easy to work with content types in a semi-automatic fashion with expected magic. As long as there was a good cookbook with some examples for that, I'd have no problems assembling it.
Some really good thinking going on here and overall with Falcon - thank you all so much - very nice platform.
EDIT: Accidentally posted prematurely and have updated the comment with full message.
In the service I'm developing (JSON only) we're using the extended `Request`/`Response` approach along with a middleware. I wasn't entirely happy with doing so, though, because inheritance is always limited, but I needed somewhere to assign the object to be serialized. Basically, our `Response` classes have a "value" field and a middleware then encodes it (along with an envelope):
```python
class SerializationComponent(object):
    def process_response(self, request, response, resource):
        if hasattr(response, 'value'):
            response.body = json.dumps({
                'data': response.value
            })
```
I'm not sure why we couldn't just create a middleware that checks the content type in `process_request()` and `process_response()` and encodes the data accordingly. The only problem is where to store the decoded and to-be-encoded values, which would need some sort of new property on `Request` and `Response` respectively.
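A rough sketch of such a middleware, using stand-in `Req`/`Resp` classes (real Falcon exposes a stream rather than a `body` string, and the `decoded`/`result` attribute names are invented here, which is exactly the open question above):

```python
import json

class Req:
    """Hypothetical request stand-in."""
    def __init__(self, content_type, body):
        self.content_type = content_type
        self.body = body
        self.decoded = None   # where the middleware stores the parsed body

class Resp:
    """Hypothetical response stand-in."""
    def __init__(self):
        self.result = None    # to-be-encoded value, set by the resource
        self.body = None

class JSONTranslator:
    """Decode on the way in, encode (with an envelope) on the way out."""

    def process_request(self, req, resp):
        if req.content_type == 'application/json' and req.body:
            req.decoded = json.loads(req.body)

    def process_response(self, req, resp, resource):
        if resp.result is not None:
            resp.body = json.dumps({'data': resp.result})

mw = JSONTranslator()
req, resp = Req('application/json', '{"x": 1}'), Resp()
mw.process_request(req, resp)
resp.result = req.decoded          # a resource responder would do this
mw.process_response(req, resp, resource=None)
assert resp.body == '{"data": {"x": 1}}'
```

The sketch makes the storage question concrete: `decoded` and `result` have to live *somewhere*, and that is the missing property on Request/Response.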
I'll post my implementation as well to see if it can further the conversation along. Admittedly, I have no clue if my implementation will scale well into the future or work for others but it does for me & my customers.
I have a middleware object for auto-selecting the serializer & deserializer based on a table lookup of content-type matches (deserializer) & Accept header preferences (serializer). I then basically proxy the selected `Deserializer` & `Serializer` objects' `deserialize()` & `serialize()` methods directly onto the request & response objects. I sub-class the falcon request/response objects for this reason, among others.
The serializer selection looks like this:
```python
class Middleware(object):
    """ Serializer middleware object """

    def process_resource(self, req, resp, resource):
        """ Process the request after routing.

        Serializer selection needs a resource to determine which
        serializers are allowed.
        """
        if resource:
            mimetypes = [s.MIMETYPE for s in serializers.SERIALIZERS]
            preferred = req.client_prefers(mimetypes)
            serializer = serializers.get_serializer(preferred)
            if not serializer:
                abort(exceptions.RequestNotAcceptable)
            elif serializer not in resource.allowed_serializers:
                abort(exceptions.SerializerNotAllowed)
            else:
                resp.serializer = serializer()
```
I then have a list of `allowed_serializers` on my base resource which can be overridden by other resources. Imagine a file upload resource vs. a standard CRUD resource. In this case the `resp.serializer` property is actually the instantiated Serializer object that I can then do the following with in my resource:
```python
class Resource(BaseResource):
    """ Single item resource & responders """

    allowed_deserializers = [deserializers.JSONAPIDeserializer]

    def on_get(self, req, resp, uuid):
        """ Find the model by id & serialize it back """
        model = find(uuid)
        resp.last_modified = model.updated
        resp.location = url_for_rtype(model.rtype, model.uuid)
        resp.serialize(model)
```
This is nice because I don't have to know the serializer at all in the resource or model. The `resp.serialize` method will be called on the already-selected serializer without having to know or care. That means each one of my serializers has to either understand the incoming object to be serialized (the model in this case)... or, what I actually do but cut out of the example, is `resp.serialize(model.to_rest())`, which normalizes the model for any serializer to consume.
This includes things like filtering certain fields, pagination, etc so each serializer doesn't have to know how to do that. All the serializer needs to know how to do is take a normalized data structure & ensure it is RFC or some other spec compliant payload including headers & whatever else may need to be done. In the case of JSONAPI it needs to reconstruct the data structure a bit to ensure compliance but that's the role of the serializer in my app.
My motivation for this by the way is to support file uploads, csv, json, & jsonapi depending on the resources that are being accessed. All of my crud resources support csv & jsonapi while some more specialized aggregation or utility interfaces that only my web app may use are json because the data model doesn't fit with the jsonapi spec at all.
How about using an `auto_parse_json` request option, just like `auto_parse_form_urlencoded`? When Falcon supports more media types, just add more request options for them.
FWIW, a few add-ons have cropped up to address this in terms of JSON:
I think it would make sense to have something basic that you get out of the box, and then either make it extensible or replaceable by 3rd-party add-ons for advanced use cases.
I see that this feature is constantly pushed to the next version without any agreement on how it could be designed. I was postponing the implementation of content negotiation in my graceful project (see swistakm/graceful#13).
Since I am doing some major redesigns in my project, I decided to give this problem a try and experiment a bit with different approaches. Here are my thoughts.
I made some assumptions on how the ideal solution would look from the perspective of my project. Still, I tried to keep the general idea simple and generic, and I think this perspective would be valuable for other higher-level frameworks like hug.
It would be best if we could tackle three problems at once:

1. deserialisation of incoming request data according to the `Content-Type` header,
2. content negotiation,
3. serialisation of response data according to the `Accept` header.

The reasons for doing all of this within a single solution are pretty obvious. APIs that accept input data and return some data must deal with both serialisation and deserialisation. Most libraries for handling various data formats provide symmetric interfaces, e.g. `json.loads()` vs. `json.dumps()` and `yaml.load()` vs. `yaml.dump()`.
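That symmetry is what makes a single per-format handler object natural: one object carries both directions. A tiny illustration with the stdlib `json` module (the `JSONPair` name is made up here):

```python
import json

class JSONPair:
    """Pairs the symmetric load/dump interfaces in one handler object."""
    load = staticmethod(json.loads)
    dump = staticmethod(json.dumps)

# Round-tripping through dump/load recovers the original structure.
payload = {'name': 'falcon', 'tags': ['wsgi', 'api']}
assert JSONPair.load(JSONPair.dump(payload)) == payload
```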
Problem 1 is the simplest to deal with: we have to look at the `Content-Type` header and, if we support that content type, we can try to deserialise the data. Problems 2 and 3 are strongly linked: content negotiation should drive the data serialisation.
From the perspective of my own project it would be best to have one solution that can be configured once for the whole application, without the need to specify content negotiation settings for every resource separately. This should make sense for most applications: if your API speaks some format on one resource endpoint, it should speak that format on every served resource.
This of course holds true only for generic formats for structured data like JSON, XML, YAML, MessagePack, etc. Sometimes you will have to serve or read something that does not represent structured data (i.e. dictionaries, lists, etc.). The best examples would be serving images or any other binary data: such content can still be negotiated, but the inputs and outputs are not general data structures.
It means that even if a developer has defined his content type handling mechanisms globally, he should use the actual content negotiation mechanism explicitly within resource code. Like in @tbug's approach: you can use `request.media`, but you don't have to.
Assumption 2 may suggest that the best place for defining content negotiation is the API class. That's probably true, but I believe it is possible to come up with a solution that can be introduced gradually in order to limit changes to the API. I think that, if designed wisely, it could even be polyfilled or backported to older falcon versions. Such an approach would be great for higher-level frameworks built on top of falcon. Even if the new feature is not released anytime soon, other projects can vendor part of the future code or prepare their own polyfill. The only thing we need is to agree on some interface that will be used in the future.
As always, there are some decisions that need to be made, and they are never easy. For instance:
IMO, from the framework's perspective the best approach is simply to avoid making such decisions. If the content negotiation layer is pluggable, the user can very easily decide what to do and how to support extra formats.
Of course we can provide some implementation of a chosen content type handler as a reference, to make falcon easier for newcomers to use (JSON seems like the most obvious choice). The user can always decide to use his own implementation or to extend/override the existing one.
This thread is already long and others have proposed some interesting ideas. I have experimented with a few of them to see their pros and cons.
This is unfortunately a completely different type of serialisation. Libraries like marshmallow just translate objects between different domains, but do not perform content-type serialisation/deserialisation.
Middleware was proposed by a few people and it seems like the least invasive approach. Middlewares are optional by nature, can be easily extended, and work globally. Still, I don't like the idea of monkey-patching the request object as proposed by @smcclstocks. Since we have `__dict__` in Request's `__slots__` it is now possible, but it still seems like a dirty workaround. This may be fine for custom middleware in a user's application code, but I think the framework should avoid monkey-patching its own core objects, as it creates surprises. Another way around this is to leverage the `context` attribute. On the other hand, contexts should be user-defined objects and no one expects to find extra data there.
My main objection against middleware in this situation is that middleware, in my opinion, should be something optional even for application logic. They are great for caching or authorisation: in most cases resources will work no matter whether they are registered with or without custom middlewares. This of course is not the case when middleware provides database connections for resources, and middleware is a standard way in many frameworks to provide such objects.
Also, @smcclstocks's solution assumes that there is some global serializers registry, and that would be very problematic, especially if someone wanted to provide their own serializers registry implementation, or if some imported package registered its own serializers without the user's knowledge.
As @kgriffs already mentioned, it pollutes Request's and Response's namespaces, and it also does not help at all with content negotiation on the `Accept` header. Also, the naming would be a bit confusing. For instance: does `req.json()` really suggest that the function reads a JSON string and returns a dictionary?
Still, the idea of registering content type handlers via the API object looks good, because it stays in line with the current interface design.
In my opinion it is an almost perfect design from the perspective of the falcon user: it exposes only the `req.media` and `resp.media` attributes to the falcon user. Still, the definition of possible content handlers is global for the application (assumption 2).
The only problem is that it directly ties the initialisation of the Response object to the Request object. The `resp.media` attribute needs to know the `Accept` header value, or at least the result of `req.client_prefers()`. Without that data the Response object cannot resolve a content type handler, and all of this belongs to the Request domain. I'm afraid that this cannot be easily changed without breaking backwards compatibility, due to the possibility of passing custom request and response objects. We should expect that existing users do not expect any extra initialisation arguments. I have an idea how to bypass this problem and I will discuss it later.
I would also not worry that request stream consumption would be a surprise for the user. Note that:

- using `req.media` would be optional.
- an explicit call (e.g. `req.media.consume_stream()`) should be enough.

In my opinion @tbug's idea is a good direction. The only problem is this Response initialisation. I have two ideas how to resolve this.
Since the Request object is the only one that knows the content type preference of the user agent, it could make sense to have this new `media` attribute only on the Request object. The good news is that such an approach can be implemented without any changes in falcon core, only through custom request classes:
```python
import json

import yaml

from falcon import Request
from falcon import errors


class JSONHandler:
    def read(self, request):
        return json.loads(request.stream.read().decode('utf-8'))

    def write(self, data, response):
        response.body = json.dumps(data)


class YAMLHandler:
    def read(self, request):
        return yaml.load(request.stream.read().decode('utf-8'))

    def write(self, data, response):
        response.body = yaml.dump(data)


def request_factory(handlers):
    class Media:
        def __init__(self, request):
            self.content_type = request.content_type
            self.client_prefers = request.client_prefers(handlers.keys())

        def read_from_request(self, request):
            if self.content_type not in handlers:
                raise errors.HTTPUnsupportedMediaType
            return handlers[self.content_type].read(request)

        def write_to_response(self, data, response):
            if self.client_prefers is None:
                raise errors.HTTPNotAcceptable
            handlers[self.client_prefers].write(data, response)

    class MediaRequest(Request):
        @property
        def media(self):
            return Media(self)

    return MediaRequest
```
Note that `JSONHandler` and `YAMLHandler` here are very simple, as I assume that their actual implementation is out of scope for this feature.
Configuration, even without any dedicated support in the `API` class, is very simple:
```python
api = application = falcon.API(request_type=request_factory(
    {
        'application/yaml': YAMLHandler(),
        'application/json': JSONHandler(),
    }
))
```
Unfortunately, the actual usage is a bit unsatisfying:

```python
class Echo:
    def on_post(self, req, resp):
        payload = req.media.read_from_request(req)
        req.media.write_to_response(payload, resp)
```
It is very short and simple, but it does not look intuitive at all. It will work for projects like graceful or hug, where users almost never work with the raw `req` & `resp` objects. Maybe it could be improved by choosing better names, but it will never be as intuitive and expressive as @tbug's original approach.
The backwards-incompatibility problem of extended `Response` object initialization can be reduced over time by providing an additional method on the `API` class that controls how new req and resp instances are initialised:
```python
class API:
    ...

    def create_req_resp(self, request_type, response_type, env,
                        req_options, resp_options):
        req = request_type(env, options=req_options)
        resp = response_type(options=resp_options)
        return req, resp

    ...
```
Then this feature could be introduced gradually over next releases:
An additional advantage of this approach is that it improves framework extensibility and may open a way for some other use cases. Falcon users could experiment with req/resp initialisation and decide over time whether @tbug's approach really makes sense. From the perspective of higher-level frameworks, this first step alone is enough to provide this style of content negotiation.
Also, we could go even further and allow providing the req/resp initialisation function as a new `API` class keyword argument, to avoid the need for subclassing.
@swistakm read through your proposal...quite interesting and very good write-up. Thanks!
One thing I would caution: let's not try to be too smart about this. One reason I initially got introduced to Falcon was because the Pecan + WebOb framework we were using was being too smart and didn't allow us to do what we wanted, as it had its own ideas on how things should work, which were not necessarily true in all cases. For instance, we had an API that took in JSON and at times generated JSON back, but at times it should just return a 201/204 status; Pecan made that impossible: if we told it the API took in JSON, we had to spit out a JSON response and a 200 status, and the JSON response ended up being `[]`. Let's not make that kind of mistake in Falcon.
So I do favor a more decentralized approach of having the deserialization/serialization in the Request/Response objects, but leave it to the actual framework users to decide whether or not to use them: whether or not to honor the Accept header, and whether input of one type necessarily means output of that same type. IOW, Response knows nothing about the Request aside from what the developer tells it.
That said, it should be easy for the developer to make the interconnection between the Request and Response objects if they wanted to enforce that functionality and be extremely strict about it; but that's a dev's choice, not the framework's choice. And given Falcon's tagline of "a very fast, very minimal Python web framework", doing too much would probably go against that.
$0.02 on it.
A lot of good ideas on this issue. This is a tough issue to address in a flexible enough way to solve a good percentage of use-cases. Over the next couple weeks, I'm going to be using a lot of the feedback and ideas here to help put together a solution. Keep your eyes out for a PR.
Everyone, please take a look at https://github.com/falconry/falcon/pull/1050 and share your thoughts.
@kgriffs overall I like it; I'd move away from the default value and add the abstract class for the handlers, but it looks promising.
How to do serialization/deserialization is a common question from the community. How can we make this work out of the box?
Things to decide: