Closed yaronyg closed 9 years ago
Oh and I realize I was making an assumption that's almost certainly wrong. I was trying to imagine how to add content and transfer coding support so existing code would just continue working. This is where the buffering came in. But it would seem kind of obvious that a less intrusive solution would be to just provide a filter that can sit on top of the input (and output) streams and support transfer coding and content coding. Only those using the filter need worry about it. So I suppose all the features can be added without much ceremony.
Thoughts?
I just merged a pull request that adds support for non-enum based status responses. Basically it adds an interface that your code can substitute.
The main server loop accepts incoming connections and blindly grabs their output stream:
outputStream = finalAccept.getOutputStream();
It looks like a fairly simple (and non-invasive) refactoring to extract that into a protected method, so a derived class can return a ZIP / GZIP output stream rather than the bare one that comes straight from the socket connection. With that simple change to the NanoHttpd core you should be able to then create a sample that demonstrates GZIP output.
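A minimal sketch of that refactoring idea, assuming a hypothetical protected hook (the name `wrapOutputStream` is illustrative, not NanoHttpd's actual API); a `ByteArrayOutputStream` stands in for the socket stream so the sketch is self-contained:

```java
import java.io.*;
import java.util.zip.*;

// Base server exposes a protected hook; the default returns the raw stream.
class StreamingServer {
    protected OutputStream wrapOutputStream(OutputStream raw) throws IOException {
        return raw;
    }
}

// A derived class can transparently compress everything it writes.
class GzipServer extends StreamingServer {
    @Override
    protected OutputStream wrapOutputStream(OutputStream raw) throws IOException {
        return new GZIPOutputStream(raw);
    }
}

public class GzipHookDemo {
    static byte[] writeThrough(StreamingServer server, byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket stream
        OutputStream out = server.wrapOutputStream(sink);
        out.write(payload);
        out.close();
        return sink.toByteArray();
    }

    static byte[] gunzip(byte[] data) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "hello over the wire".getBytes("UTF-8");
        byte[] compressed = writeThrough(new GzipServer(), body);
        // Round-trip check: decompressing yields the original payload.
        if (!java.util.Arrays.equals(gunzip(compressed), body)) throw new AssertionError();
        System.out.println("ok");
    }
}
```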
I am not familiar with chunked request encoding. Can you point me at the relevant HTTP protocol documentation?

Thanks!
http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-26#section-4.1 has the gory details.
I should say that the project is now called Thali and is available at https://thali.codeplex.com and that we are now using CouchBaseLite which uses TJWS.
TJWS has issues such as this, this and this. But more generally the code is really hard to follow and it uses its own custom build environment so we can't even build the thing sanely. What we really need is a dirt simple servlet container that can handle multi-threaded environments well. We are running exclusively on devices (e.g. phones, laptops, etc.) so we don't need super fancy features. Just a plain but very compliant HTTP server that can scale well under load and is crazy secure. Remember, these are people's personal phones that are accepting incoming connections! Security is job 1.
And of course HTTP/2.0 support would be amazing since it's perfect for device perf/bandwidth needs.
Oh and a pony. Definitely want a pony. :)
NanoHttpd is definitely not a servlet container. No attempt has been made to make it conform to the J2EE / JEE spec, so I can't really fill that need specifically. That said, I know of applications in production that use NanoHttpd and override "serve()", then match the incoming URI against a set of known patterns and delegate the request - a poor man's version of the "servlet mapping" you find in web.xml elsewhere.
I believe that NanoHttpd is compliant to the HTTP 1.1 spec, and production applications have shown it to handle file uploads & other requests gracefully. If you run into issues though, please let me know!
Oh, and I found you a pony: https://farm3.staticflickr.com/2904/14014227254_cf0a47c02e_b.jpg
First, thank you for the pony! It's awesome! Seriously, rainbows and everything! It almost feels like it's my little pony! :)
Second, I'm sorry for even mentioning servlets. That is a red herring. While it is true that Couchbase Lite (the project I depend on which then depends on TJWS) does use a servlet container I don't see them using any features from servlets. They just grab '/' and handle everything themselves. So there is really no reason why we would have to replace TJWS with another servlet container.
Third, we do need a pretty hardy container in terms of HTTP protocol support. This includes:
Handling expect-continue properly - so we can 'peek' at the headers of a request and return a failure quickly before processing the body.
Handling chunked transfers in BOTH directions - Providing a streaming interface that translates to/from a chunked transfer encoding on the wire.
Robust thread management - In some cases our devices are going to get pummeled with requests. So we need each request to be able to get its own thread and for threads to be properly re-used by the underlying HTTP engine.
Proper connection management - Unfortunately not all clients are well behaved and they can do really annoying things like leaving connections half opened. Also when one of our server requests fails in a bad way we don't want it to leave a connection open either. So we need the underlying engine to be smart about cleaning up broken connections.
GZIP Transfer Encoding Support - Bandwidth is always at a premium so GZIP transfer encoding support is important.
SSL Support - My own project, Thali, uses SSL mutual auth with self-signed certs for privacy and authentication. This means that not only do we need SSL support but we also need a way to securely associate each request with the connection it came over and to then be able to query that connection so we can determine (on a message-by-message basis) what identity was validated from the client via SSL mutual auth.
Android & Java Support - We run in both environments using mostly the same code base (overlap is literally north of 90%). So we need an HTTP engine that is comfortable in both.
A path to HTTP/2.0 - Eventually we'll want HTTP/2.0 support. Not today. But it would be good to know that the community is on a path to that end.
Apache 2.0 Friendly License - We are open source through and through via Apache 2.0 so we need code that is either Apache 2.0 or Apache 2.0 friendly (e.g. MIT, BSD, etc.)
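To make the chunked-transfer requirement above concrete, here is a toy encoder for the HTTP/1.1 chunked framing (each chunk is a hex length, CRLF, the data, CRLF, terminated by a zero-length chunk). This is a sketch of the wire format, not a full implementation:

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch of HTTP/1.1 chunked framing: each chunk is
// <hex length>CRLF<data>CRLF, and the body ends with a zero-length chunk.
public class ChunkedEncoder {
    static String encode(String... chunks) {
        StringBuilder wire = new StringBuilder();
        for (String chunk : chunks) {
            byte[] data = chunk.getBytes(StandardCharsets.US_ASCII);
            wire.append(Integer.toHexString(data.length)).append("\r\n")
                .append(chunk).append("\r\n");
        }
        return wire.append("0\r\n\r\n").toString(); // terminating chunk
    }

    public static void main(String[] args) {
        // Note the sender never needs to know the total body size in advance.
        System.out.print(encode("Hello", ", world"));
    }
}
```

The point of the framing is exactly the use case described above: neither side has to buffer the whole body just to compute a Content-Length.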
Does this sound like a job for NanoHttpd? Someone else?
Thanks!
Yaron
Easy answers first:
Android / Java - Android was the driving force to get me involved with this project in the first place. I am fiercely protective of the server being able to run everywhere, but especially well on Android.
License - I believe in free software, and that means a license with no "strings" attached. NanoHttpd is BSD licensed - https://github.com/NanoHttpd/nanohttpd/blob/master/LICENSE.md
GZIP - not supported out of the box but easily added, and I'm happy to merge a pull request. The outputStream (on line 832) is final right now. Make that a non-final member and add a setOutputStream() and a getOutputStream() to the HTTPSession class, and you would be able to insert a GZIPOutputStream at the start of your own serve() method. Or, better still, examine the "Accept-Encoding" header in the setter to determine whether the browser supports GZIP.
Threading - NanoHttpd only spawns one thread directly (the main listener loop). Each new incoming request is handled on its own thread via a call to the exec() function on the instance of the "AsyncRunner" interface. The default implementation spawns a new thread each time exec() is called, passing it a Runnable. Feel free to call setAsyncRunner() with your own threading strategy before you start() your own server. Extended "soak testing" shows that threads don't hang around long, but if you are concerned, a more limited thread-pooling strategy might be your answer.
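A sketch of such a pooled threading strategy. The local `AsyncRunner` interface below mirrors the `exec(Runnable)` shape described above rather than importing NanoHttpd's actual type, so the example is self-contained:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the AsyncRunner interface described in the discussion.
interface AsyncRunner {
    void exec(Runnable code);
}

// A bounded pool: at most N request-handling threads, reused across requests.
class PooledAsyncRunner implements AsyncRunner {
    final ExecutorService pool;

    PooledAsyncRunner(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    @Override
    public void exec(Runnable code) {
        pool.submit(code);
    }

    void shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}

public class PooledRunnerDemo {
    // Simulate dispatching many "requests" through the pooled runner.
    static int handle(int requests) throws InterruptedException {
        PooledAsyncRunner runner = new PooledAsyncRunner(4);
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            runner.exec(handled::incrementAndGet); // stands in for one request handler
        }
        runner.shutdownAndWait();
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handle(100)); // all 100 requests handled by only 4 threads
    }
}
```

One would pass an implementation like this to setAsyncRunner() before start(), assuming that hook accepts any AsyncRunner implementation.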
Connections - To the best of my knowledge the server cleans up all resources, both connections and temp files - extended "soak testing" of the server hasn't demonstrated any resource leaks.
Now the harder ones.
expect-continue and chunked requests - not implemented yet.
SSL - code was submitted and merged onto a branch but it will stay there until there is time to ensure that it runs flawlessly on Android as well as general Java - I am fiercely protective of the mobile experience! Having said that, I believe the code to be good and worth you taking a look at.
HTTP/2.0 - The specification isn't final, and won't be for months to come. I have heard dates like "November 2014" suggested for final submission of the "HTTP/2.0" standard. I expect the "2.x" version of NanoHttpd to continue at least until early 2015, with HTTP/2.0 implementation being the killer feature driving the step up to "NanoHttpd 3.0". I have no problem with merging pull requests if people want to contribute to that work earlier, though; just expect the "HTTP/2.0" branch to be treated as experimental until at least that time.
expect-continue is more important than chunked requests. At least for the scenarios I've seen. And HTTP/2.0 is just something that should be out there on the horizon, it's not something I personally worry about today.
For me a bigger concern is pipelining.
Pipelining - It would be good if the server supported pipelining requests. See here for details.
Pipelining is tricky if one is using asynchronous thread handlers because one can end up with ordering issues.
For example, Get request 1 is handed to asynchronous handler 1 and then before handler 1 is done (running on its own thread) a second request arrives and is dispatched to handler 2. It's perfectly possible that handler 2 might have its response ready before handler 1. If both are running asynchronous then handler 2 might try to put its bytes on the wire before handler 1. That's strictly illegal. The responses MUST be returned in the order they came in. And, of course, responses can't be intertwined with each other.
So there has to be a reasonably intelligent manager somewhere that spawns off the handlers. It has to know when it's legal to pipeline (with 'SAFE' methods like GET) and when it's illegal (with 'UNSAFE' methods like PUT or POST). It has to return errors for the illegal cases and allow the legal ones. It also has to manage the legal responses so that responses are returned in the same order as the requests, even if the handlers involved are not ready in order. It's not brain surgery but it does require explicit handling.
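The ordering part of such a manager can be sketched in a few lines: handlers may complete out of order, but a sequencer buffers early responses and flushes them to the wire strictly in request order. All names here are illustrative:

```java
import java.util.*;

// Responses may become ready out of order, but must be written in request order.
class ResponseSequencer {
    private final Map<Integer, String> ready = new HashMap<>();
    private final List<String> wire = new ArrayList<>(); // stands in for the socket
    private int nextToWrite = 0;

    // Called by a handler when the response for request #seq is complete.
    synchronized void complete(int seq, String response) {
        ready.put(seq, response);
        // Flush every response that is now unblocked, in order.
        while (ready.containsKey(nextToWrite)) {
            wire.add(ready.remove(nextToWrite));
            nextToWrite++;
        }
    }

    synchronized List<String> written() {
        return new ArrayList<>(wire);
    }
}

public class PipelineDemo {
    static List<String> demo() {
        ResponseSequencer seq = new ResponseSequencer();
        // Handler 2 finishes before handlers 0 and 1:
        seq.complete(2, "resp-2");
        seq.complete(0, "resp-0");
        seq.complete(1, "resp-1");
        return seq.written();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [resp-0, resp-1, resp-2]
    }
}
```

Note that even though "resp-2" was ready first, nothing hits the wire until "resp-0" arrives; the buffered responses then drain in order.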
The reason why I care about pipelining is that we are talking about a system based on couchDB, aka, synch. A very common scenario is to do something like synch all the photos in a folder. That shows up as a JSON document containing the list of folders along with links to attachments with the actual photos. So a properly written client can just fire off a stack of GET requests using pipelining to pull down those photos. By pipelining the GET requests for the photos the latency is massively reduced since you stack the requests. Basically you can just fill the pipe.
BTW, sorry to be a pain, but one more question. How do you think about NanoHttpd versus something like http://www.devlper.com/2010/12/a-bare-minimum-web-server-for-android-platform/? The Apache code is fully supported on Android and of course it also runs on generic Java. The API seems simple enough, more or less at the same level as NanoHttpd (e.g. here are the headers and body, please return headers and a body). I tried to do a search to find a comparison but nothing turned up. Sorry if my Internet-fu is weak. Any comments or a pointer to where you have answered this before?
NanoHttpd has maintained a design goal of being a single file that you can include in your project that relies on nothing but the base JDK. It also doesn't require any config files at runtime. The goal is to be small and light & to provide a solid (tested) implementation of HTTP for easy embedding.
I don't think anyone has written an in-depth, direct head-to-head comparison of the 2 projects.
If the Apache codebase provides more of what you need, and the additional JAR files are not a burden to you, then by all means use the Apache library. I have nothing but respect for what the Apache community has created over the years!
GZIP compression of the response (content) can be implemented without any changes to NanoHTTPD.
Only the response body needs to be compressed, and the Content-Length header must give the length of the compressed content, which therefore has to be known in advance (assuming GZIP is used as a content encoding rather than a transfer encoding).
This is how I have done it (after checking that the client accepts "gzip" as an encoding): the "serve" method constructs a Response with a ByteArrayInputStream that wraps a byte array which has been created from a GZIPOutputStream wrapping a ByteArrayOutputStream wrapping the original byte array (or string), and a "Content-Encoding" header is added before returning the response. Here is how to avoid a pitfall: http://stackoverflow.com/questions/14777800/gzip-compression-to-a-byte-array.
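A self-contained sketch of that construction using only java.util.zip; the comments note where the NanoHttpd-specific Response wiring (which is not shown here) would go:

```java
import java.io.*;
import java.util.zip.*;

public class GzipBody {
    // Compress the body up front so the compressed length is known for Content-Length.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write(plain);
        gz.close(); // the pitfall: close (or finish) BEFORE reading buf, or the gzip trailer is missing
        return buf.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] b = new byte[4096];
        for (int n; (n = in.read(b)) != -1; ) out.write(b, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "{\"docs\": \"highly compressible JSON\"}".getBytes("UTF-8");
        byte[] wire = gzip(body);
        // In serve() one would now build the Response from
        // new ByteArrayInputStream(wire), use wire.length as Content-Length,
        // and add the "Content-Encoding: gzip" header before returning it.
        if (!java.util.Arrays.equals(gunzip(wire), body)) throw new AssertionError();
        System.out.println("ok");
    }
}
```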
In cases where content is being dynamically zipped wouldn't you rather use chunked encoding so you don't have to queue the whole thing up in memory first in order to determine its size?
Was there ever an example of implementing GZIP?
I think this issue is just too big; I created issue #177 for the compression. @yaronyg please open a separate request for every feature. It is very difficult to synchronise issues that include a whole bunch of features/requests.
I am building a HTTP server to run inside of a web browser as part of the https://peerly.codeplex.com/ peer to peer web project. I'm using NanoHTTPD because it runs both on the desktop and in Android. But there are a couple of features that are missing that I wanted to know if nanohttpd is interested in.
Chunked Transfer from Clients - The HTTP/1.1 spec allows both clients and servers to send chunked transfers whenever they want and HTTP/1.1 servers are required to support it. I see that chunked transfer for responses was just added but what about chunked transfer for requests? My use case involves pushing large pieces of parsed data like json between phones so we really don't want to have to manifest them in memory in order to get an accurate data size.
Content Coding - The same vein we would really want support for Content Coding like GZIP. Peer to peer environments are often bandwidth constrained and we move a lot of easily compressible content like JSON around.
Enum free processing for methods and response codes - Right now methods and response codes are identified by enums. But a big part of our project is to provide a generic HTTP environment that can support experimental work with new methods and response codes. So we would like versions of the interface that just provides the method as a string and accepts the response code as a number plus a description.
The last one is pretty easy and unless there are objections at some point I'll just submit a pull request for it that should be compatible with the existing interfaces. But the first two are probably not implementable in a way that is compatible with the existing interfaces since they break what's in the input stream in a pretty fundamental way. There are ways to paper over this but only at the cost of buffering everything in memory (or on disk or somewhere) and that sounds like a really bad idea.
That's why I'm bringing this issue up here because I want to understand what y'all are thinking about these ideas and if you have any interest in seeing them tackled given that they probably require interface changes.