clue / reactphp-docker

Async, event-driven access to the Docker Engine API, built on top of ReactPHP.
https://clue.engineering/2019/introducing-reactphp-docker
MIT License

Parse streaming responses #4

clue closed 9 years ago

clue commented 9 years ago

Some actions (like pulling images, viewing logs, attaching to containers etc.) produce streaming output. We should parse these HTTP streaming responses into individual virtual streams and forward any events on the virtual stream.

Also note that the response body differs depending on whether the request uses HTTP/1.0 or HTTP/1.1.

HTTP/1.0 streaming body:

$ echo -en "POST /images/create?fromImage=clue/redis-benchmark HTTP/1.0\r\n\r\n" | nc -U /var/run/docker.sock
HTTP/1.0 200 OK
Content-Type: application/json
Date: Thu, 27 Nov 2014 09:42:15 GMT

{"status":"Pulling repository clue/redis-benchmark"}
{"status":"Pulling image (latest) from clue/redis-benchmark","progressDetail":{},"id":"478a1142ea76"}{"status":"Pulling image (latest) from clue/redis-benchmark, endpoint: https://registry-1.docker.io/v1/","progressDetail":{},"id":"478a1142ea76"}{"status":"Pulling dependent layers","progressDetail":{},"id":"478a1142ea76"}{"status":"Download complete","progressDetail":{},"id":"511136ea3c5a"}{"status":"Download complete","progressDetail":{},"id":"f10807909bc5"}{"status":"Download complete","progressDetail":{},"id":"f6fab3b798be"}{"status":"Download complete","progressDetail":{},"id":"1e6ac0ffed3b"}{"status":"Download complete","progressDetail":{},"id":"62ff5003ac9a"}{"status":"Download complete","progressDetail":{},"id":"c6e4fc6c4a10"}{"status":"Download complete","progressDetail":{},"id":"984fd90de307"}{"status":"Download complete","progressDetail":{},"id":"4784dfba86d0"}{"status":"Download complete","progressDetail":{},"id":"4d4120826ad3"}{"status":"Download complete","progressDetail":{},"id":"b2428b25e452"}{"status":"Download complete","progressDetail":{},"id":"28995deeed39"}{"status":"Download complete","progressDetail":{},"id":"9cd1838bd19c"}{"status":"Download complete","progressDetail":{},"id":"c78781ea905b"}{"status":"Download complete","progressDetail":{},"id":"d7173b5f3fc7"}{"status":"Download complete","progressDetail":{},"id":"6a8a6a35a96b"}{"status":"Download complete","progressDetail":{},"id":"28fdd31ac753"}{"status":"Download complete","progressDetail":{},"id":"3ce54e911389"}{"status":"Download complete","progressDetail":{},"id":"b7261e19024d"}{"status":"Download complete","progressDetail":{},"id":"bb182a1a4f8e"}{"status":"Download complete","progressDetail":{},"id":"cabbf32a5995"}{"status":"Download complete","progressDetail":{},"id":"478a1142ea76"}{"status":"Download complete","progressDetail":{},"id":"478a1142ea76"}{"status":"Status: Image is up to date for clue/redis-benchmark"}
…

HTTP/1.1 streaming body uses chunked encoding:

$ echo -en "POST /images/create?fromImage=clue/redis-benchmark HTTP/1.1\r\n\r\n" | nc -U /var/run/docker.sock    
HTTP/1.1 200 OK
Content-Type: application/json
Date: Thu, 27 Nov 2014 09:43:27 GMT
Transfer-Encoding: chunked

36
{"status":"Pulling repository clue/redis-benchmark"}

65
{"status":"Pulling image (latest) from clue/redis-benchmark","progressDetail":{},"id":"478a1142ea76"}
91
{"status":"Pulling image (latest) from clue/redis-benchmark, endpoint: https://registry-1.docker.io/v1/","progressDetail":{},"id":"478a1142ea76"}
4d
{"status":"Pulling dependent layers","progressDetail":{},"id":"478a1142ea76"}
46
{"status":"Download complete","progressDetail":{},"id":"511136ea3c5a"}
46
{"status":"Download complete","progressDetail":{},"id":"f10807909bc5"}
46
{"status":"Download complete","progressDetail":{},"id":"f6fab3b798be"}
46
{"status":"Download complete","progressDetail":{},"id":"1e6ac0ffed3b"}
46
{"status":"Download complete","progressDetail":{},"id":"62ff5003ac9a"}
46
…

clue commented 9 years ago

Depends on https://github.com/clue/php-buzz-react/issues/26

clue commented 9 years ago

Looking into this in the upcoming days.

Current status: Functional prototype ready :) Need to flesh out a decent API and push changes to upstream components.

My initial thought was that the chunked encoding would be easier to consume, as it includes proper separators between each JSON document. It turns out that Docker's API uses chunked encoding only for HTTP/1.1 requests. However, the react/http-client library is limited to issuing HTTP/1.0 requests only (see reactphp/http-client#5). There does not appear to be a way to enable chunked encoding otherwise (TE header etc.).
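For reference, here is a minimal sketch of how a chunked body like the HTTP/1.1 example above could be split into its chunks. It is plain PHP with no ReactPHP dependency. The function name decodeChunked() is hypothetical and not part of any library, and the sketch assumes the complete body is already in memory and ignores chunk extensions and trailers.

```php
<?php

/**
 * Splits a complete HTTP/1.1 chunked body into its individual chunks.
 * Hypothetical helper for illustration; ignores chunk extensions and trailers.
 *
 * @return string[] payload of each chunk, in order
 */
function decodeChunked(string $body): array
{
    $chunks = array();
    $pos = 0;

    while (true) {
        // each chunk starts with its size in hex, terminated by CRLF
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            throw new UnexpectedValueException('Missing chunk size line');
        }

        $size = (int) hexdec(substr($body, $pos, $eol - $pos));
        if ($size === 0) {
            break; // a zero-sized chunk marks the end of the body
        }

        $start = $eol + 2;
        if (strlen($body) < $start + $size + 2) {
            throw new UnexpectedValueException('Incomplete chunk');
        }
        $chunks[] = substr($body, $start, $size);

        // skip the payload and the CRLF that terminates the chunk
        $pos = $start + $size + 2;
    }

    return $chunks;
}

// first chunk of the example above: 0x36 = 54 bytes (52 bytes of JSON plus CRLF)
$body = "36\r\n{\"status\":\"Pulling repository clue/redis-benchmark\"}\r\n\r\n0\r\n\r\n";
$chunks = decodeChunked($body);
$first = json_decode($chunks[0], true);
```

In this example each chunk happens to carry exactly one JSON document, which is what would make the chunked variant convenient to consume.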

As such, I've looked into parsing the HTTP/1.0 streaming body that includes several JSON documents with no clear separator in between. This turned out to be quite easy nonetheless: I've created a simple streaming JSON parser (https://github.com/clue/json-stream) that will be added as a dependency.
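To illustrate the general idea (this is a rough sketch, not the actual clue/json-stream implementation), concatenated JSON documents can be split by tracking the nesting depth and whether the scanner is currently inside a string. The class name ConcatenatedJsonSplitter is hypothetical, and the sketch only handles top-level objects and arrays.

```php
<?php

/**
 * Incrementally splits a stream of concatenated JSON objects/arrays.
 * Hypothetical class for illustration; only handles top-level objects and arrays.
 */
class ConcatenatedJsonSplitter
{
    private $buffer = '';
    private $offset = 0; // next byte in $buffer to scan
    private $depth = 0;
    private $inString = false;
    private $escaped = false;

    /**
     * Feeds a chunk of data and returns all documents completed by it.
     *
     * @return array[] decoded documents, in order
     */
    public function push(string $chunk): array
    {
        $this->buffer .= $chunk;
        $documents = array();

        for ($len = strlen($this->buffer); $this->offset < $len; ++$this->offset) {
            $char = $this->buffer[$this->offset];

            if ($this->inString) {
                // inside a string, braces are plain data; only track escapes and the closing quote
                if ($this->escaped) {
                    $this->escaped = false;
                } elseif ($char === '\\') {
                    $this->escaped = true;
                } elseif ($char === '"') {
                    $this->inString = false;
                }
            } elseif ($char === '"') {
                $this->inString = true;
            } elseif ($char === '{' || $char === '[') {
                ++$this->depth;
            } elseif ($char === '}' || $char === ']') {
                if (--$this->depth === 0) {
                    // nesting is back to zero: one complete document ends here
                    $json = substr($this->buffer, 0, $this->offset + 1);
                    $documents[] = json_decode($json, true);

                    // drop the consumed document and restart scanning at the new buffer start
                    $this->buffer = (string) substr($this->buffer, $this->offset + 1);
                    $this->offset = -1; // incremented to 0 by the loop
                    $len = strlen($this->buffer);
                }
            }
        }

        return $documents;
    }
}

$splitter = new ConcatenatedJsonSplitter();
$first = $splitter->push('{"status":"Pulling"}{"status":"Down');
$second = $splitter->push('load {complete}","id":"51"}');
```

The first call returns only the completed document and keeps the partial one buffered; the second call completes it. Braces inside string values are ignored because of the string-state tracking.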

adamlc commented 9 years ago

Any news on this @clue ? :)

clue commented 9 years ago

Thanks for poking me @adamlc :)

The functional prototype is ready, but I've never been a fan of my demo API and am reluctant to push this as-is. Perhaps we can sort out a decent API before looking into the actual implementation?

What could a possible API look like?

clue commented 9 years ago

My initial thought was something along the lines of this:

// possible/experimental API for the above example
$status = $client->imageCreate('clue/redis-benchmark');
$status->on('data', function ($data) {
    // data will be emitted for *each* complete element in the JSON stream
    echo $data['status'] . PHP_EOL;
});
$status->on('close', function () {
    // the JSON stream just ended, this could(?) be a good thing
    echo 'Ended' . PHP_EOL;
});

adamlc commented 9 years ago

@clue that looks sensible to me :+1:

clue commented 9 years ago

Thanks for the feedback, much appreciated! :+1:

One more thing:

All other methods use a Promise-based API like this:

$promise = $client->imageDelete('clue/redis-benchmark');
$promise->then(
    function ($result) { echo 'did work'; },
    function (Exception $e) { echo 'did not work'; }
);

How should this be handled in this particular case?

IMO ideally, the stream API should expose both stream operations AND the promise results? But then again, which value should be used as a result in the above imageCreate() example?

$status = $client->imageCreate('clue/redis-benchmark');

// what value does this resolve with?
$status->then(function ($result) { var_dump($result); });

adamlc commented 9 years ago

I guess it makes sense to implement both!

I suppose you'd expect the completed promise to return everything from the stream results. I guess they'd have to be split up somehow. Maybe have some sort of array or iterator that allows you to fetch each result separately?

How would this work for things like following the logs? These don't have an obvious end result; they continue until stopped, which I guess may confuse things if they also return a promise?

clue commented 9 years ago

I guess it makes sense to implement both!

Yeah, I'm starting to think it makes sense to expose both APIs – via different methods possibly.

Afaict Docker contains the following API endpoints that (can) exhibit a streaming behavior:

imageCreate();
imagePush();

containerLogs(); ($follow flag)
containerStats();

containerAttach(); ($stream flag)
execStart(); ($Detach flag)

containerExport();
containerCopy();

IMO it makes sense to expose most API endpoints via the Promise-based API because it's more convenient and easier to get started.

For example, considering the containerCopy() method, it's probably easiest to use the Promise-API like this:

$client->containerCopy('container-name', array('Resource' => 'filename'))->then(
    function ($contents) { }
);

However, this also means that the whole file's contents have to be buffered in RAM. This would be okay for smaller files, but bigger files could probably benefit from a streaming API:

$stream = $client->containerCopyStream('container-name', array('Resource' => 'filename'));
$stream->on('data', function ($data) {
    // received a chunk of data
});
$stream->on('close', function () {
    // stream ended (EOF)
});

I guess a similar approach would also work for most other API endpoints. The initial example could probably look like this:

// simple Promise-based API
$promise = $client->imageCreate('clue/redis-benchmark');
$promise->then(function ($data) {
    // $data is an array of *all* elements in the JSON stream
});

// possible/experimental streaming API
$status = $client->imageCreateStream('clue/redis-benchmark');
$status->on('data', function ($data) {
    // data will be emitted for *each* complete element in the JSON stream
    echo $data['status'] . PHP_EOL;
});
$status->on('close', function () {
    // the JSON stream just ended, this could(?) be a good thing
    echo 'Ended' . PHP_EOL;
});

clue commented 9 years ago

Also, the Promise-based API could probably benefit from the shunned Promise progress events:

$promise = $client->imageCreate('clue/redis-benchmark');
$promise->then(
    function ($data) {
        // $data is an array of *all* elements in the JSON stream
    },
    function (Exception $error) {
        // an error occurred (possibly after receiving *some* elements)
    },
    function ($element) {
        // will be invoked for *each* complete $element in the JSON stream
    }
);

Each Promise has to resolve with the buffered, combined result of all individual elements anyway, so it should be easy to also emit every element individually.
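This can be sketched without any promise library: a collector that buffers every element for the final result while also forwarding each element to an optional callback. The class ElementCollector and its method names are hypothetical; in the actual library this would be wired to the stream's events and the promise's resolution.

```php
<?php

/**
 * Collects all elements of a stream for a final result and optionally
 * reports each element as it arrives. Hypothetical helper for illustration.
 */
class ElementCollector
{
    private $elements = array();
    private $onElement;

    public function __construct(?callable $onElement = null)
    {
        $this->onElement = $onElement;
    }

    // to be invoked for each complete element, e.g. from a stream's "data" event
    public function handleData($element): void
    {
        $this->elements[] = $element;

        if ($this->onElement !== null) {
            call_user_func($this->onElement, $element);
        }
    }

    // to be invoked once the stream ends; this is the value a promise would resolve with
    public function getResult(): array
    {
        return $this->elements;
    }
}

$seen = array();
$collector = new ElementCollector(function ($element) use (&$seen) {
    // per-element notification, comparable to a progress event
    $seen[] = $element['status'];
});

$collector->handleData(array('status' => 'Pulling dependent layers'));
$collector->handleData(array('status' => 'Download complete'));

$all = $collector->getResult();
```

Since the buffered array has to be built either way, the per-element callback adds almost no extra work.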

adamlc commented 9 years ago

Looks good to me! I haven't heard of shunned promises before, but it certainly makes sense :dancers:

You mentioned above about bigger files using a stream. Do you think this should be forced regardless? The end user can then decide if they need to buffer it locally or not?

clue commented 9 years ago

The end user can then decide if they need to buffer it locally or not?

Yeah, the recommended (safe) way would probably be to use the streaming API, as it can handle arbitrarily sized files (only smaller chunks are kept in memory), while the simpler Promise-based API has to store the whole contents in memory.

If, for example, you want to "exec" a cat /etc/passwd call, it's probably easier to use the Promise-based API. However, if you're unsure how big your stream could potentially be, it's probably safer to go with the streaming API instead.
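The memory trade-off can be sketched roughly like this. The sketch uses plain blocking PHP streams rather than ReactPHP's event-driven streams, and the function names readAllBuffered() and forwardChunks() are hypothetical. The first keeps the whole contents in a single string, the second only ever holds one chunk at a time.

```php
<?php

/**
 * Buffering approach: the whole contents end up in a single string.
 */
function readAllBuffered($source, int $chunkSize = 8192): string
{
    $contents = '';
    while (!feof($source)) {
        $contents .= fread($source, $chunkSize);
    }

    return $contents;
}

/**
 * Streaming approach: each chunk is forwarded right away and then discarded.
 *
 * @return int number of bytes forwarded
 */
function forwardChunks($source, $target, int $chunkSize = 8192): int
{
    $bytes = 0;
    while (!feof($source)) {
        $chunk = fread($source, $chunkSize);
        if ($chunk === false) {
            break; // read error, stop forwarding
        }

        fwrite($target, $chunk);
        $bytes += strlen($chunk);
    }

    return $bytes;
}

// in-memory stand-ins for a container file and a local destination
$source = fopen('php://memory', 'w+');
fwrite($source, str_repeat('x', 20000));
rewind($source);

$target = fopen('php://memory', 'w+');
$bytes = forwardChunks($source, $target);
```

With forwardChunks() the peak memory use is bounded by the chunk size, regardless of how large the source is.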

adamlc commented 9 years ago

Awesome! Thanks for clearing that up, looking forward to the updates!

I'm actually planning on building a React-based monitor / app for our cluster (which is all dockerized) to auto-scale and stuff, well that's the plan!

clue commented 9 years ago

Awesome! Thanks for clearing that up, looking forward to the updates!

Thanks for discussing some of these core concepts!

I've just filed #9 as a WIP PR that shows what this API is going to look like. Please feel free to review/comment! :+1:

I'm actually planning on building a React-based monitor / app for our cluster (which is all dockerized) to auto-scale and stuff, well that's the plan!

Awesome, keep me posted! :)

clue commented 9 years ago

After discussing this matter with @jsor, I've pushed the discussion about the dreaded promise progress API upstream (https://github.com/reactphp/promise/issues/32) and would vote to remove this from #9 for now.

Given that this ticket already suggested implementing two alternative APIs (Promise-based and Stream-based) anyway, there's little point in also exposing individual progress events via the promise progress API. As such, I'm going to remove this from #9 for now.

clue commented 9 years ago

Thanks for the discussion!

clue commented 9 years ago

Closed via #9.