Open CaselIT opened 4 years ago
Hi :wave:,
Thanks for using Falcon. The large amount of time and effort needed to maintain the project and develop new features is not sustainable without the generous financial support of community members like you.
Please consider helping us secure the future of the Falcon framework with a one-time or recurring donation.
Thank you for your support!
(Pasting my input from Gitter)
Related community requests:
Personally, the cleanest solution IMHO would be
Request.bounded_stream
to something more fitting (yeah, naming is hard... maybe just Request.body_reader
?)Request
class attribute a-la context_type
BufferedReader
for multipart is largely duplicating the bounded stream, maybe these could be merged and cleaned up while we are at it (?)If we choose this path, maybe we could also show examples how to override the default bounded reader with a no-op passthrough function, so that the raw Request.stream
would be made available to media instead (to shave off a couple of CPU cycles).
Edit: While at a first glance this looks extremely straightforward to implement, one complication is that this would force us to define the minimum stream interface (for both WSGI and ASGI) required for such replacement readers. Which might be a good thing™ in itself though.
Rename Request.bounded_stream to something more fitting (yeah, naming is hard... maybe just Request.reader?)
Not really convinced by reader
, since it looses the "stream" from the name. Other random alternatives reader_stream
| stream_reader
| wrapped_stream
.
Indeed, stream_reader
or body_reader
would probably be most fitting.
After thinking a bit about this, I think that extending the stream is probably the most expandable option.
I think that maybe we could provide a CachedStream
alongside the BoundedStream
.
Regarding where this parameter would go, request_options
is an alternative or maybe like the Context
https://github.com/falconry/falcon/blob/113eea113844058816ed745d8dc9cb197e75cb95/falcon/request.py#L435
?
I wrote request_options
without thinking too much tbh. But it might indeed make more sense to follow the context_type
pattern, and define something like body_reader_type
.
body_reader_type
I still think stream
should be in the name tbh.
I stumbled on this limitation while developing an app that required me to store the raw request body it receives (falcon==2.0.0
).
The workaround I found was to develop a middleware
class that "adds" the property request.body
.
However, this approach renders request.media
unusable and impacts the long term maintenance of the application, a developer already familiar with the falcon
framework would not be able to straightforwardly use part of the framework, and will find a property being used that is not documented globally.
Also it forces me to connect to a data store in the middleware
which I really do not like.
It is sometimes useful to have the ability to use both the raw stream and the media in a request.
Is it access the raw stream really the use case? I might have tunnel vision on my particular use case but what I felt missing is the access to the request body. If we phrase it like that, feels natural to have a request.body
property.
I share my suggestion below (as a proof of concept as I didn't run it).
# Request class
def __handle_request_stream(self):
handler = self.options.media_handlers.find_by_media_type(
self.content_type,
self.options.default_media_type
)
try:
self._body = self.bounded_stream.read()
self._media = handler.deserialize(
BytesIO(self._body),
self.content_type,
self.content_length
)
finally:
if handler.exhaust_stream:
self.bounded_stream.exhaust()
def get_media(self):
if self._media is not None or self.bounded_stream.eof:
return self._media
self.__handle_request_stream()
return self._media
media = property(get_media)
def get_body(self):
if self._body is not None or self.bounded_stream.eof:
return self._body
self.__handle_request_stream()
return self._body
body = property(get_body)
I can contribute on this topic if help is accepted.
It is sometimes useful to have the ability to use both the raw stream and the media in a request.
Is it access the raw stream really the use case? I might have tunnel vision on my particular use case but what I felt missing is the access to the request body.
The goal is similar in allowing multiple accesses to the body. But I think that allowing a way to override how the stream behaves has the least impact on the behaviour of falcon. Meaning that you could retain the current logic as default, ie the stream is not cached, but also allow different behaviours if the user wishes.
Introducing a body
would mean that the stream would be cached, and that may not be feasible in some cases.
Yes (well OK, I'm partial cause it was my suggestion :smiling_imp: ), the request body may be way too large to be stored in memory in the general case.
@ruimlopes Your approach, although admittedly handy for simple JSON cases, would also impair the multipart form handler (to be released in Falcon 3.0), and other custom handlers that people may be using to consume arbitrary streams without buffering everything at once (or buffer to disk, cloud or w/e but not to RAM).
Introducing a
body
would mean that the stream would be cached, and that may not be feasible in some cases.Yes (well OK, I'm partial cause it was my suggestion 😈 ), the request body may be way too large to be stored in memory in the general case.
How is that caching behavior ultimately different than what already happens for request.media
?
Indeed, there is a memory overhead by storing request.media
and request.body
together. A possible deviation would be to use request.body
and request.media
in a mutually exclusive way, solving the memory overhead.
Also, I would say it would work for anything that implements BaseHandler
, not only simple json
cases.
For other users that might crash here with the same/similar use case, we will most likely upgrade our workaround to use a custom handler that simply reads the stream into request.media
. We will do json.loads()
when we need it for the first time.
From user perspective, one would expect most common use cases to be naturally available. It seems we have opposite understanding regarding the usage of the request body on that matter - we (the team I work in) are very used to naturally access it in other frameworks, so it's a common use case for us, which sadly is not available without custom code and added maintenance burden.
How is that caching behavior ultimately different than what already happens for request.media?
This mostly depends on the media handler. Like the multipart form handler may not cache all the stream.
But I agree with you, that it would be nice to have the ability to not lose the body once accessed, but I'm personally not sure if body
is the best option here.
we (the team I work in) are very used to naturally access it in other frameworks
Here were you referring the body or the media? or both?
@ruimlopes, just purely practically, if we added support for plugging another reader factory type, would req.stream_reader.data
or req.stream_reader.get_data()
be so much inferior? Particularly if we shipped an optional cached reader class as part of Falcon.
I haven't explored all the ramifications, but my gut feeling is that building this directly into the standard Request
object would get even hairier in the ASGI variant (i.e., the async
version of Falcon).
@CaselIT I've probably missed an important detail, but how is this a breaking change?
I guess only if we enable this by default. but we could avoid it. I'll place this in 3.x then
Ah, I'm not advocating enabling any caching out of the box. But we could ship such an optional component as part of Falcon, alongside documentation how to easily enable that if needed. Probably something not too dissimilar from CORS middleware.
It is sometimes useful to have the ability to use both the raw stream and the media in a request. Examples of this may be to implement a cache on graphql and similar apis that use POST to implement a querying endpoint. Something like
In the example above if the cache misses, there is transparent way of inspecting the data stream in without preventing use of
Request.media
.To aid this use case we could implement the method
get_cached_data(max_size)
(or a similar name) inRequest
.This method would:
ByteIO
bounded_stream
with theByteIO
The implementation would be something like