WordPress / blueprints-library


HTTP client doesn't handle responses with Transfer-Encoding: chunked #104

Open MayPaw opened 3 months ago

MayPaw commented 3 months ago

While working on #82 (E2E tests) with @reimic, we found an issue with how the HTTP client handles chunked streams.

Currently, the response is written with the chunk markers included, as if they were part of the body. This corrupts the downloaded wp-cli.phar. Handling Transfer-Encoding: chunked is required behaviour for an HTTP/1.1 client.
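For reference, a chunked HTTP/1.1 body interleaves hexadecimal chunk-size lines (CRLF-terminated) with the payload and ends with a zero-sized chunk; written verbatim, those size lines end up inside the downloaded file:

```
HTTP/1.1 200 OK
Transfer-Encoding: chunked

7\r\n
Mozilla\r\n
0\r\n
\r\n
```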

At the moment we have two ideas for handling this issue:

  1. Change the HTTP protocol version to HTTP/1.0, which doesn't support chunked transfer encoding. Currently the function stream_http_prepare_request_bytes in async_http_streams.php prepares the request as HTTP/1.1:

    $request = <<<REQUEST
    GET $path HTTP/1.1
    [...]
    REQUEST;

    (Here we could change the request line to: GET $path HTTP/1.0.)

    We've tested this approach and it works. The downside would be a probable performance loss, since HTTP/1.1 is generally more efficient (e.g. it supports persistent connections). Neither @reimic nor I can assess the impact of such a change, though.

  2. Alternatively, the streams could be passed through PHP's dechunk filter when performing read or write operations. We have tested that this works as well, but it is somewhat problematic. We tried appending the filter right after stream creation, in streams_http_open_nonblocking in async_http_streams.php, and in several other places. Unfortunately, because @stream_select is used while continuously reading and writing the response, appending a stream filter during those operations is impossible (reference: https://github.com/reactphp/event-loop/issues/220). The first place where it worked as expected was at the last possible moment, i.e. while copying the already-closed stream in the run method of WriteFileStepRunner.php:

    public function run(
        $input,
        $progress = null
    ) {
        [...]
        $fp2 = fopen( $path, 'w' );
        [...]
        // Dechunk the stream while writing the file.
        stream_filter_append( $fp2, 'dechunk', STREAM_FILTER_WRITE );
        stream_copy_to_stream( $this->getResource( $input->data ), $fp2 );
        fclose( $fp2 );
    }

    This doesn't seem like the right place for it, though; dechunking should probably happen when the client handles the response.
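As a standalone illustration of what the dechunk filter does, here is a minimal sketch on an in-memory stream, independent of the Blueprints code:

```php
<?php
// Write a chunked-encoded body through PHP's built-in "dechunk" filter
// and confirm the chunk markers are stripped.
$chunked = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n";

$fp = fopen( 'php://temp', 'w+' );
// The filter decodes the bytes before they reach the underlying stream.
stream_filter_append( $fp, 'dechunk', STREAM_FILTER_WRITE );
fwrite( $fp, $chunked );
rewind( $fp );
$decoded = stream_get_contents( $fp );
fclose( $fp );
// $decoded should now be "hello world".
```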

@adamziel what do you think?

adamziel commented 3 months ago

Great issue! Let's use HTTP/1.0 until we support chunked encoding. Chunked encoding would likely have to be implemented here:

function streams_http_response_await_bytes( $streams, $length, $timeout_microseconds = 5000000 ) {
    $read = $streams;
    if ( count( $read ) === 0 ) {
        return false;
    }
    $write  = array();
    $except = null;
    // phpcs:disable WordPress.PHP.NoSilencedErrors.Discouraged
    $ready = @stream_select( $read, $write, $except, 0, $timeout_microseconds );
    if ( $ready === false ) {
        throw new Exception( 'Could not retrieve response bytes: ' . error_get_last()['message'] );
    } elseif ( $ready <= 0 ) {
        throw new Exception( 'stream_select timed out' );
    }

    $chunks = array();
    foreach ( $read as $k => $stream ) {
        if ( PHP_VERSION_ID <= 71999 ) {
            // In PHP <= 7.1, stream_select doesn't preserve the keys of the array
            $k = array_search( $stream, $streams, true );
        }
        $chunks[ $k ] = fread( $stream, $length );
    }

    return $chunks;
}

Note that we might also need to provide the relevant response headers to that function (or maybe they're already available via the stream context?).
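If the parsed headers do get passed in, a hypothetical helper along these lines (assuming header names are lower-cased during parsing; this is not the client's actual API) could gate the dechunking:

```php
<?php
// Hypothetical helper; $headers is assumed to be an associative array of
// already-parsed, lower-cased response headers.
function response_is_chunked( array $headers ) {
    return isset( $headers['transfer-encoding'] )
        && false !== stripos( $headers['transfer-encoding'], 'chunked' );
}
```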

adamziel commented 3 months ago

HTTP/1.0 is fine, but if you're up for a challenge and want to implement chunked encoding, here's how:

In that function above, you'd read the chunk size (a hexadecimal number on its own CRLF-terminated line), associate it with the stream somehow (stream context?), and then buffer that many bytes into $chunks. You'd have to keep track of the current chunk size and the bytes read so far for each stream, because streams_http_response_await_bytes might stop in the middle of a chunk and resume later. Then you'd rinse and repeat until the end of the stream.
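The steps above could be sketched as a stateful, resumable decoder (all names here are hypothetical, not the final client API; chunk extensions and trailer headers are ignored for simplicity). One instance would be kept per stream, and every batch of bytes read by streams_http_response_await_bytes would be passed through feed():

```php
<?php
// A sketch of a resumable dechunker, assuming we track one instance per stream.
class ChunkedDecoder {
    private $buffer    = '';
    private $state     = 'size'; // 'size' | 'data' | 'trailer' | 'done'
    private $remaining = 0;      // data bytes left in the current chunk

    // Feed freshly-read bytes in; get decoded body bytes back. Safe to call
    // with partial data: the decoder picks up where it left off.
    public function feed( $bytes ) {
        $this->buffer .= $bytes;
        $decoded       = '';
        while ( true ) {
            if ( 'size' === $this->state ) {
                // Expect a hexadecimal chunk-size line terminated by CRLF.
                $eol = strpos( $this->buffer, "\r\n" );
                if ( false === $eol ) {
                    break; // Size line incomplete; wait for more bytes.
                }
                $this->remaining = hexdec( trim( substr( $this->buffer, 0, $eol ) ) );
                $this->buffer    = substr( $this->buffer, $eol + 2 );
                if ( 0 === $this->remaining ) {
                    $this->state = 'done'; // The zero-sized chunk ends the body.
                    break;
                }
                $this->state = 'data';
            } elseif ( 'data' === $this->state ) {
                // Consume as much of the current chunk's data as we have.
                $take = min( $this->remaining, strlen( $this->buffer ) );
                if ( 0 === $take ) {
                    break; // Mid-chunk; resume on the next feed() call.
                }
                $decoded         .= substr( $this->buffer, 0, $take );
                $this->buffer     = substr( $this->buffer, $take );
                $this->remaining -= $take;
                if ( 0 === $this->remaining ) {
                    $this->state = 'trailer';
                }
            } elseif ( 'trailer' === $this->state ) {
                if ( strlen( $this->buffer ) < 2 ) {
                    break; // Wait for the CRLF that follows the chunk data.
                }
                $this->buffer = substr( $this->buffer, 2 );
                $this->state  = 'size';
            } else {
                break; // 'done': ignore any trailing bytes.
            }
        }
        return $decoded;
    }

    public function is_finished() {
        return 'done' === $this->state;
    }
}
```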

reimic commented 3 months ago

Maybe we can have both? I'd recommend going the HTTP/1.0 route for now.

There is also a small headers issue: the content-length header might be absent, which causes an "Undefined array key" error. That's quite easy to fix. So: HTTP/1.0 and the headers fix in the first PR.
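That fix could be as small as guarding the lookup (a sketch; the actual header-array shape in the client may differ):

```php
<?php
// Hypothetical parsed-headers array; indexing $headers['content-length']
// directly triggers "Undefined array key" when the header is absent.
$headers = array( 'content-type' => 'application/octet-stream' );

$content_length = isset( $headers['content-length'] )
    ? (int) $headers['content-length']
    : null; // null: length unknown, read until the connection closes.
```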

Doing this first will let me finish the long-awaited E2E PR. For @MayPaw: with the tests running, you'll have an easier time testing the custom dechunker implementation. Once it's ready, it can enhance the client in a separate PR.

MayPaw commented 3 months ago

I agree. We can go with HTTP/1.0 first, then @reimic can continue working on tests and I'll work on the dechunking mechanism.

adamziel commented 3 months ago

Sounds great, thank you!

reimic commented 2 months ago

With #106 merged this is no longer a blocker, but an enhancement issue.