ClusterHQ / powerstrip

Powerstrip: A tool for prototyping Docker extensions
https://clusterhq.com/
Apache License 2.0
302 stars 32 forks source link

protocol v2: discuss pros/cons of encapsulating adapter request/responses in JSON #44

Open lukemarsden opened 9 years ago

lukemarsden commented 9 years ago

@progrium @binocarlos @robhaswell

We discussed via email the pros and cons of wrapping the adapter protocol bodies etc in JSON versus having unwrapped request/response bodies and using HTTP headers to communicate metadata about the request/response being proxied/modified.

I suggest we use use this issue to openly discuss the pros and cons of these approaches and try to drive to a conclusion for e.g. whether we should work on building an unwrapped v2 protocol.

The main argument in favour of keeping the current wrapped approach is flexibility in terms of encapsulating multiple bodies within one response. For example the post-hook can currently receive both the original client request and the Docker response, so that it can see both what the client asked for as well as what Docker said. (Knowing what the client asked for isn't information that's available via a sub-request to the Docker API, probably.)

The main argument against seems to be that clients end up having to write their own routing logic for handling e.g. request URIs. Since I appear to be biased towards our original design, please add other arguments against on this thread. :)

One idea that may or may not carry any weight is that we might not always want to communicate with an adapter via HTTP. Having a pure JSON protocol means that we're not tied to HTTP-level constructs if, say, we wanted to have an option for communication with an adapter to occur via, e.g. stdin/out of a process exec or in the future something like libchan.

Another idea in favour of keeping what we've got now is that we want to encourage people to build powerstrip adapters. Incompatibly changing the protocol will add a burden to anyone who builds a powerstrip adapter now and then has to change it later. I don't think this argument alone is strong enough to mean that we shouldn't change the protocol (it's v0.0.1 for a reason) but I do think it raises the bar for having good reasons to change it if we're going to.

Finally, encoding this data in JSON means we can nest data structures, which is harder with an HTTP header approach (unless we put JSON in the headers, which feels weird to me). So for example we can have one data type for "requests" and one for "responses", then we can put as many of these objects in a JSON structure. For example, in future we may wish to extend the protocol so that an adapter hook in a chain can see all the previous request/responses that happened to the request before it got called in the chain as a JSON list of request/response objects.

binocarlos commented 9 years ago

Hey,

I think we can separate the routing issue from the format of the body issue. IMHO - the URL of the request should be the URL that docker was hit with. Doing HTTP routing (i.e. is this a /containers/create or a /containers/:id/start) on the actual request will vastly improve the experience of writing adapters.

Presently - I'm writing a bunch of RegExp's where I could be using the vast array of HTTP routers from npm (insert preferred package manager here). Also - the URL the adapter is hit with is currently arbitrary and so feels like an unexploited resource.

The url could be appended to something like '/adapter' providing it contains the docker route in there somewhere - this enables an adapter to have other endpoints like '/status' etc (as pointed out by @progrium)

I think 'Version' and 'Type' belong in the headers (imho) - regardless of the structure of the request body - these things feel like meta data alongside Content-type and similar values.

In terms of the request body - this is a hard one - I'm not liking the bodies as strings thing at the moment - although it seems trivial it has already caused me pain a few times and adds a layer of thinking (do I need to stringify this or parse this etc). Multi-part feels harder - although there is lots of library support for it. However, if multi-part enables us to stream the body through the adapter (see footnote) - I'm all for it.

Perhaps the field names could become just 'request' and 'response'? These are the 2 core properties of the JSON body - 'ClientRequest' and 'ModifiedClientRequest' seem superfluous because sometimes I am writing 'ModifiedClientRequest' even when it's not been modified - again, somewhat trivial but it all adds to the mental load.

Summary:

footnote

It would be 100% great to be able to do this:

$ cat myfile.txt | docker run -ti --rm transformer --uppercase

And for 'somehow' an adapter (either pre or post) to be able to transform the body but AS A STREAM.

For example - I could override the 'uppercase' in the post-hook and make all data on stdin become lowercase and print it to stdout.

Another example - stream the output of a database backup container and send it off to S3 as well as pipe it to stdout on the terminal (like a unix tee adapter).

This opens up a world where powerstrip adapters are working on large data sets that are streamed via the proxy rather than gathered into one massive JSON packet. I cannot see how this would work in the context of having the request body as a JSON packet above and so might need considering.

progrium commented 9 years ago

Yes, streaming support is another thing you get by not serializing into a json string. Also support for arbitrary binary bodies, which would have to be base64 encoded in json, adding yet another layer of decoding.

The more we can just make these just http requests the better. I stand by that 110%. If I were starting from scratch I would not even consider JSON.

On Saturday, January 31, 2015, Kai Davenport notifications@github.com wrote:

Hey,

I think we can separate the routing issue from the format of the body issue. IMHO - the URL of the request should be the URL that docker was hit with. Doing HTTP routing (i.e. is this a /containers/create or a /containers/:id/start) on the actual request will vastly improve the experience of writing adapters.

Presently - I'm writing a bunch of RegExp's where I could be using the vast array of HTTP routers from npm (insert preferred package manager here). Also - the URL the adapter is hit with is currently arbitrary and so feels like an unexploited resource.

The url could be appended to something like '/adapter' providing it contains the docker route in there somewhere - this enables an adapter to have other endpoints like '/status' etc (as pointed out by @progrium https://github.com/progrium)

I think 'Version' and 'Type' belong in the headers (imho) - regardless of the structure of the request body - these things feel like meta data alongside Content-type and similar values.

In terms of the request body - this is a hard one - I'm not liking the bodies as strings thing at the moment - although it seems trivial it has already caused me pain a few times and adds a layer of thinking (do I need to stringify this or parse this etc). Multi-part feels harder - although there is lots of library support for it. However, if multi-part enables us to stream the body through the adapter (see footnote) - I'm all for it.

Perhaps the field names could become just 'request' and 'response'? These are the 2 core properties of the JSON body - 'ClientRequest' and 'ModifiedClientRequest' seem superfluous because sometimes I am writing 'ModifiedClientRequest' even when it's not been modified - again, somewhat trivial but it all adds to the mental load.

Summary:

  • use the docker URL as the URL that hits the adapter (optionally appended to '/adapter')
  • move meta (Version, Type) into headers
  • simplify JSON field names 'ClientRequest' & 'ModifiedClientRequest' become just 'request'

footnote

It would be 100% great to be able to do this:

$ cat myfile.txt | docker run -ti --rm transformer --uppercase

And for 'somehow' an adapter (either pre or post) to be able to transform the body but AS A STREAM.

For example - I could override the 'uppercase' in the post-hook and make all data on stdin become lowercase and print it to stdout.

Another example - stream the output of a database backup container and send it off to S3 as well as pipe it to stdout on the terminal (like a unix tee adapter).

This opens up a world where powerstrip adapters are working on large data sets that are streamed via the proxy rather than gathered into one massive JSON packet. I cannot see how this would work in the context of having the request body as a JSON packet above and so might need considering.

— Reply to this email directly or view it on GitHub https://github.com/ClusterHQ/powerstrip/issues/44#issuecomment-72320271.

Jeff Lindsay http://progrium.com

progrium commented 9 years ago

Also, if the main argument for JSON is encapsulating multiple requests, I'd question that requirement itself, as I did before via email. Every case that's been brought up for a post-hook that uses the request data and response data can be done as a pre-hook that gets its own response from Docker if necessary. In fact, I'm confident the simplest useful model is pre-hooks filter/block, post-hooks just fire-and-forget without chaining.

And btw, when talking about being tied to HTTP constructs... you're not. I've done all this before. It just so happens that what we're doing is exposing HTTP request filtering, so it is convenient that our adapter/hook protocol is the same and we can pass it through with some extra meta-data and semantics. If we were to use libchan or my Duplex streams or anything like that, we'd still be passing HTTP data because that's what we're working with here. Metadata can either be added to that HTTP data or put in some other transport specific mechanism.

Talk about being able to see all previous request/responses in the chain... c'mon. Really? Start simple. Be ok with people not being able to do things. People are clever and will find a way to make it work. Those experiences will feed into an eventual new version that has real use cases and reasons for decisions like this.

binocarlos commented 9 years ago

The following assumes that we are hitting the adapter with the URL that hit docker:

For POST /containers/create - the response from docker is the container ID. A post-hook can use the id to hit GET /containers/(id)/json which gives it access to the request body that was sent to the pre-hook a moment ago.

For POST /containers/(id)/start - we have the container ID as part of the URL and so again we can hit docker with a GET /containers/(id)/json to see how any start parameters have affected the container.

Here is an example of a post-hook using the container ID to ask docker for more information - essentially - "can I have some request data as part of a post-hook".

Here is an example of a pre-hook using the ImageName from the request to ask docker about more information about the image - essentially - "can I have information that was not ever part of a request".

I just did a quick scan of the Docker HTTP api page - here are the requests that are POST with an actual request body (and what the request body is):

All other requests are either GET requests or are POST but with no body. Again - as long as the URL hitting the adapter is the same (or contains) as what hits docker and includes the query string - all information about the request is passed to the post hook as the HTTP method, url and query string.

So - if we focus on the important 2 for a minute (/containers/create and /containers/(id)/start) - it does seem that we don't need the request body sent to the post-hook. In other words - the post-hook could get the equivalent of the request body without having to inject it into the body it receives.

Hope this makes sense - from an adapter authors perspective, I'm finding that this is a very useful discussion :-)

progrium commented 9 years ago

Thanks for that quick data/analysis!

On Sat, Jan 31, 2015 at 9:49 AM, Kai Davenport notifications@github.com wrote:

The following assumes that we are hitting the adapter with the URL that hit docker:

For POST /containers/create - the response from docker is the container ID. A post-hook can use the id to hit GET /containers/(id)/json which gives it access to the request body that was sent to the pre-hook a moment ago.

For POST /containers/(id)/start - we have the container ID as part of the URL and so again we can hit docker with a GET /containers/(id)/json to see how any start parameters have affected the container.

Here is an example https://github.com/binocarlos/powerstrip-weave/blob/master/actions/start.js#L30 of a post-hook using the container ID to ask docker for more information - essentially - "can I have some request data as part of a post-hook".

Here is an example https://github.com/binocarlos/powerstrip-weave/blob/master/actions/create.js#L36 of a pre-hook using the ImageName from the request to ask docker about more information about the image - essentially - "can I have information that was not ever part of a request".

I just did a quick scan of the Docker HTTP api page - here are the requests that are POST with an actual request body (and what the request body is):

  • POST /containers/create - container options json
  • POST /containers/(id)/start - start options json
  • POST /build - TAR STREAM
  • POST /auth - credentials json
  • POST /commit - container options json
  • POST /images/load - TAR STREAM
  • POST /containers/(id)/exec - exec options json
  • POST /exec/(id)/start - exec start options json

All other requests are either GET requests or are POST but with no body. Again - as long as the URL hitting the adapter is the same (or contains) as what hits docker and includes the query string - all information about the request is passed to the post hook as the HTTP method, url and query string.

So - if we focus on the important 2 for a minute (/containers/create and /containers/(id)/start) - it does seem that we don't need the request body sent to the post-hook. In other words - the post-hook could get the equivalent of the request body without having to inject it into the body it receives.

Hope this makes sense - from an adapter authors perspective, I'm finding that this is a very useful discussion :-)

— Reply to this email directly or view it on GitHub https://github.com/ClusterHQ/powerstrip/issues/44#issuecomment-72322885.

Jeff Lindsay http://progrium.com

robhaswell commented 9 years ago

I think there are three discussions happening at once, which are actually independent. I'm splitting out the request and version discussion into separate tickets, please discuss them there:

https://github.com/ClusterHQ/powerstrip/issues/45 version in headers https://github.com/ClusterHQ/powerstrip/issues/46 request in HTTP