davidmoreno / onion

C library to create simple HTTP servers and Web Applications.
http://www.coralbits.com/libonion/
Other
2.01k stars 250 forks source link

wrong semantics for PUT? #189

Open usergoodvery opened 8 years ago

usergoodvery commented 8 years ago

Hi, It seems the the library automatically saves the data of PUT into a temporary file, even if the request is not multi part form. Is this by design?

Specifically, in my case I was sending a PUT with json body, but could not get the body data with onion_block_data(request_data); as that returned the the path to the tmp file, which I presume contained the json content. Changing the request to POST works. On face value, unless I am missing something, that is not inline with REST style api's

regrds, ayman

zacharygrafton commented 8 years ago

This is pretty much a duplicate of #104, but I think it needs addressed. We just need to decide what direction we would want to take.

usergoodvery commented 8 years ago

Thanks for pointing out the thread. It appears a design limitation issue. As a general principle, the api should trust the user, even if it means the server would crash due to large amount of data pumped into memory. Up to the user to decide how to mitigate that situation. Having said that, as a safety fallback mechanism I think the proposed onion_set_max_put_mem_size() function could be implemented, whereby by default there is no limit, until set by the user, using that function. If size exceeded, stream to file. Albeit, personally, I find this a clunky design.

I am not sure how WedDav does stuff, but the current PUT semantics would break lot of client implementations.

But this points out to another potential design enhancement, the api should be bit more event-driven in terms of what decision it makes for itself, where user input might potentially helped optimise what do to with the request. For example, registering a callback: on_header_parsed() the user can inspect the parameters and decided what to do if the size is larger than what he/she expects deny the request.

Another connected issue with this, at least according to how I have used the library, uploading files circumvents the basic authentication mechanism, because the handler is called after the file is fully uploaded. By the time I authenticated the user it is too late to reject request, as the file has already consumed network bandwidth., so a malicious user can merrily keep the server busy even if they don't have the authorisation to do that. An event hook with callback can prevent that in n elegant way.

davidmoreno commented 8 years ago

As a fast fix I think we can add a flag to force PUT into mem instead of file. If we do this the default should be in mem, and if the flag is set, then to file. It is just an if at request_parser.c:755 (+ flag at header, + tests)

Some notes about the proposed solutions:

The first is complex although it may be transparent and simple to use. I will not like the complex code from there, and some people may not like that after some PUT length there is a huge performance change. But I think it will work. Anyway it should be with preset limits (10MB/1GB, for example).

The second idea has a lot of appeal, but at the same time it opens a can of worms: only that callback or we add more for other stages? What about pre handler and post handler? proper middleware? On the callbacks it may need some routing as well, as not every route may need the same semantics. Also it may require changes to the parser.

So in order of preference of how to tackle this issues, mine is: 1. Flag, 2. transparent file/block, 3. event callbacks.

I think 2 and 3 are perfectly compatible with each other. Number 1 should go away as soon as 2 is implemented.

What do you guys think? Do I start with nr 1?

David.

usergoodvery commented 8 years ago

I think the change should be long-term minded and strategic. Implementing the flag sounds like a good cheap workaround and doesn't change the semantics of the api alot. If you are open to events,I'd skip on option 2, because that would be unnecessary refactoring and clearly changes the semantics of the api.

Whilst I have not thought about this too deep, I'd say as a general principle the events lifecycle should align with the underlying protocol and can be made granular for example, onPost, onGet, onPut, onFileReceived... that will align nicely with the handler callback, which is the ultimate sink point for the request. You can still provide default implementations for those callbacks which would provide a baseline for the average user, consistent with the way the current api behaves and kind of give it out-of-the-box feel.

zacharygrafton commented 8 years ago

I think maintaining consistency with the behavior of the API is important, by default, every method but the PUT method streams all the data into memory, and that has definitely created some confusion. I like option 3, but if we go that route, we should definitely go all the way, proper middleware could be a huge advantage for the project, however, I think we should avoid the onPost, onGet, onPut, style callbacks, that could be extremely limiting for implementing other higher level protocols on top of onion, for example, WebDAV, CalDAV, and other more complex REST style APIs. I've actually been looking at adding some additional recognized methods to the library.

usergoodvery commented 8 years ago

Finding the right level of abstractions is a balancing act. However, lets not forget, at its core, we are processing a HTTP protocol, so onPost etc,... are sufficiently lowlevel building blocks that should not constraint higher levels layers, such as WebDav and friends...

usergoodvery commented 8 years ago

BTW another way to tightly control the semantics of PUT is to look for "Content-Type", "application/octet-stream". By convention, most developers expect that to go to a file, as opposed to memory.

davidmoreno commented 8 years ago

I just did the first commit, please check and add comments here.

I forgot to add to the commit message that I think that the final solution for this specific bug must be a mix of nr 1 and 3: A flag to set the behaviour of PUT server wide, and the hooks to change it on specific requests if necessary. So if a server will always need PUT in mem or disk can have it easily, and if it is by request, its possible as well.