celluloid / reel

UNMAINTAINED: See celluloid/celluloid#779 - Celluloid::IO-powered web server
https://celluloid.io
MIT License
596 stars 87 forks source link

Multipart Parser implementation. #149

Open digitalextremist opened 10 years ago

digitalextremist commented 10 years ago

I was about to start a PR with my code for this, which is here:

https://github.com/penultimatix/reel/pull/28

But apparently my repository needs to be cleaned up. Catching a lot of out of date stuff when I start an official Reel branch PR. I made the multipart-parser branch though, so I'll likely just directly push the code into that.

I figured @asmod4n might have something to say here, since he's worked with multipart-parser too.

How I've done this so far, is to add Reel::Request::Body::Multipart and associated hooks into Reel::Request and Reel::Request::Body ... I feel that I found the right place to put the parser, and I've implemented it with the least invasion possible.

I considered attaching it directly to the HTTP parser, but that seemed unnecessary, since multipart requests are the vast minority, and ought to be checked for. Meaning, if I anticipate a multipart request, one ought to do multipart? on a Reel::Request just as we do websocket? now.

In an event, the code will be cleaned up and posted to a Reel branch soon. For now, it's on one of my repositories. Lastly, since the multipart-parser is not being actively maintained by the original author, I decided to tie gemspec to my own repository and not the original. I will actively maintain the gem, and keep it current with Reel's needs.

digitalextremist commented 9 years ago

I think this is an essential to Reel, since this comes full circle on what brought me into close contact with this group in the first place... discovering multipart parsing is fundamentally broken/non-existent.

I have working code in my own version of Reel, and so long as it doesn't slow down 0.6.0 I would like it to show up in -pre and bring a big feature along with bug fixes for this coming release.

/cc: @tarcieri

tarcieri commented 9 years ago

You might want to talk to @ixti about it. He's working on some gems under the https://github.com/httprb/ organization

kyledrake commented 8 years ago

FWIW, it would be great to have support for this!

tarcieri commented 8 years ago

@ixti is the multipart parser something you're still interested in working on?

digitalextremist commented 8 years ago

If not, I have code for this. @ixti and I briefly spoke about bringing my code to http.rb in some form, then bring that over to Reel. I'll get on this after Reel::IO since I have an instance of Reel using the code already.

The only real question is how to access the parts parsed. Like WebSocket access being nonstandard in interfacing practice which lead to the hijack behavior of Rack when there has to be a better way, multipart isn't so straight forward. Do we test for multipart bodies every pageload? Do we add a step like hijack to get at the data? My idea is to lazy load it. Combine both benefits without either downside. Behave like it's always parsed, but only parse at the point of access.

kyledrake commented 8 years ago

IIRC, the browser/client has the responsibly to declare the type as multipart/form-data in the headers, and if it does not then you shouldn't need to parse it.

I wanted to highlight https://github.com/danabr/multipart-parser, which claims to have an event-driven api and may be an excellent base for this implementation (though I have never used it and cannot confirm). My hacky solution to this issue was going to be to use some flavor of that (detect header, toss body in tempfile in chunks, parse using this gem). I have not attempted this yet.

tarcieri commented 8 years ago

My hacky solution to this issue was going to be to use some flavor of that (detect header, toss body in tempfile in chunks

Would be nice to not require a Tempfile but stream the data instead. A Tempfile is nice for, uhh, Rails or something, but Reel's goal has been to have streaming APIs everywhere

kyledrake commented 8 years ago

Yeah, that would be ideal here. It would require a streamed parsing, which the parser gem may or may not support.

It gets weird when you need to start accounting for form "variables", though. I'm not sure how easy it is going to be to stream something like that.

digitalextremist commented 8 years ago

My code forked and revamped the gem mentioned, and uses streams vs. temp files.

digitalextremist commented 8 years ago

By the way @kyledrake -- yes, the incoming request declares whether it is multipart or not, but checking for that ( rather than stumbling upon it doing something always needed ) is what I'm referring to. It may seem insignificant, but if one out of every thousand messages is multipart, it seems wasteful to even test for unless demanded/expected. And variable parsing is handled in an evented way last I remember too. The gem was pretty unactive and not every approach taken was perfect as-is... but you are right that the gem is a very good base.

ixti commented 8 years ago

@tarcieri

@ixti is the multipart parser something you're still interested in working on?

Yeah. Having lots of changes in my life atm, so have not much time unfortunately. Hopefully should finish with all rush and chaos I have soon to dive back into active development ;D