Fast streams - Githubissues

lucacasonato commented 2 years ago

This issue acts as a design doc for a set of performance optimizations and clean-ups that can be performed in Deno to make piping data between consumers and producers of data significantly faster than we have previously been able to, while preserving the pristine APIs we expose to users for streaming data.

Introducing streams

Streams represent a combination of a single data source and a data sink. Streams are abstract - they do not exist as objects in JS. Streams are a logical "pipe" that is able to transfer data between between a source and a sink. On these streams the action of streaming is performed. Streaming is the operation of transferring data from the source to the sink.

Sources are the producing end of a stream. They are most often represented as ReadableStream objects in JS. There are two types of sources: JS backed, and resource backed. JavaScript backed sources are streams that are constructed manually in JS, where the chunks sent on the stream stem from user code. Resource backed sources are producers backed by a Rust side resource. A ReadableStream representing file handle contents (FsFile#readable) is such a

fsFile.readable (a source representing data read from a file handle)
req.body (a source representing the data received in an HTTP server request)
resp.body (a source representing the data received as the response to an outbound fetch request)

Sinks are consuming end of a stream. In JS they are sometimes represented as WritableStream, but more often than not as an abstract resource that consumes data. There are once again two types of sinks: JS backed, and resource backed. JS backed sinks read chunks from the stream in JS. Resource backed sinks consume streams in Rust - they usually send this data to some IO sink, like a network socket or open file.

To make streams optimally fast, we need to optimize them specifically for the pair of source and sink they consist of. For example, a stream with a Rust backed sink & source, can perform the entire stream in Rust - no copies to JavaScript are necessary. If the reverse is true (both sides of the stream are JS backed), no copies to Rust are necessary.

Some sinks are also special: they want to read all data from the stream into a single combined chunk for example. In this case if the source can provide all data in a single operation more efficiently than streaming into chunks, cpu cycles and thus time can be saved resulting faster streams.

Streaming operation

To optimize streams what we really need to do is optimize the streaming operation. This means that every place we "perform a streaming operation" we need to decide if we can perform an optimized operation.

More concretely the baseline operation we are trying to optimize is the following:

get a reader from the ReadableStream (source)
while the reader has more chunks:
1. read a chunk from the reader
2. emit the chunk into the underlying sink
unlock the stream by closing the reader

This operation exists in various places. Some examples:

ReadableStream#pipeTo(writable)
ReadableStream#pipeThrough(transformer)
req.body reading in fetch
resp.body reading in HTTP server

Currently all of the above implementations of the operation (except for the HTTP server case) are completely un-optimized, and are implemented in pure JS. This means that if a stream consists of a rust backed sink and source (e.g file upload via fetch), all data needs to be copied once from Rust into JS, and then from JS back into Rust.

Optimizations

This is exactly where the core of the optimization lies. We can significantly improve performance by skipping the step where we copy data from / to JS.

We have some practical experience with the performance improvements this unlocks in the HTTP server. We have implemented an optimization there that allows some Rust backed sources to be directly used as a HTTP response, without having to get copied through JS. This significantly improves throughput.

The implementation

Deno uses WHATWG streams as the primitive for sources and sinks. Our implementation work will focus exclusively on these (ReadableStream and WritableStream). These stream primitives have the benefit that they are able to be locked. This allows us to take full ownership of the underlying source / sink, allowing us to move the ownership from JS to Rust. This enables all following optimizations.

To make the optimization work, we need to be able to effectively identify which sources and sinks are resource backed, and can have their streaming operation offloaded to Rust. To do this we need to brand all JS objects that represent a source or sink that is backed by a resource. Luckily this is easy, as the API for all source and sink resources is the same: it is the deno_core::Resource read & write API. To brand a source or sink object thus, we only need to attach a hidden [rid] property to it. This property can then hold the resource ID backing the source or sink.

For sources, this branding is already implemented for many streams through the readableStreamForRid helper function in __bootstrap.streams.readableStreamForRid. We need to implement a similar helper function for WritableStream sinks, probably called writableStreamForRid. A non branding version of this helper already exists in ext/net.

We then need to update all code paths that perform a streaming operation to go into a special case path if both source and sink are resource backed. The most generic case of this is ReadableStream#pipeTo and ReadableStream#pipeThrough. The latter is similar to the former, except that it has to make two connections rather than just one (src -> transformer sink & transfomer src -> sink). For these generic cases a new op would be introduced (op_pipe) that takes a source and sink resource and performs a simple streaming operation in Rust. More specialized logic and op integration is required when piping the req.body into fetch, or resp.body into a HTTP response.

Finally, more specialized streaming-like code paths can be optimized. A good example of this is the "read all" operation that is performed on req.body and resp.body when req.arrayBuffer() and resp.arrayBuffer() are called. Here we can add an op that performs the entire read operation in Rust (op_read_all), only passes a single aggregated chunk to JavaScript, rather than aggregating chunks manually in JS. @marcosc90 came up with this optimization in #16038. There are likely other specialized streaming-like operations that could use similar optimizations.

Branding sources
- [x] FsFile#readable
- [x] Deno.stdin#readable
- [x] Deno.Process#stdout#readable
- [x] Deno.Process#stderr#readable
- [ ] Deno.Child#stdout (difficult, because unrefable)
- [ ] Deno.Child#stderr (difficult, because unrefable)
- [x] Conn#readable
- [x] resp.body returned from Cache#match
- [ ] req.body in Deno.serveHttp
- [x] req.body in Deno.serve
- [x] resp.body returned from fetch
- [ ] CompressionStream#readable
- [ ] WebSocketStream#readable
- [ ] TextEncoderStream#readable
Branding sinks
- [ ] FsFile#writable
- [ ] Deno.stdout
- [ ] Deno.stderr
- [ ] Deno.Process#stdin#writable
- [ ] Deno.Child#stdin
- [ ] Conn#writable
- [ ] CompressionStream#writable
- [ ] WebSocketStream#writable
- [ ] TextDecoderStream#writable
Optimize streaming operations
- [x] resp.body consumption in Deno.serveHttp
- [x] resp.body consumption in Deno.serve
- [ ] resp.body consumption in cache.put
- [ ] req.body consumption in fetch
- [ ] ReadableStream#pipeTo
- [ ] ReadableStream#pipeThrough
Optimize streaming-like operations
- [x] Collecting a req.body or resp.body (req.arrayBuffer() for example)

Anutrix commented 1 year ago

The list hasn't been updated since Sep 27, 2022, what's the status on this?

bartlomieju commented 1 year ago

It's being actively worked on. @mmastrac can you update the current state?

mmastrac commented 1 year ago

This list is unfortunately a challenge to update -- I was working more on migrating the underlying resources to a common implementation to tackle the problem another way.

I checked off the req.body in Deno.serve implementation because that one is definitely done, however. I also have a patch in-flight for fetch, but that one has not fully landed yet because of some node.js considerations.

@Anutrix Is there a specific slow path you'd like to track?

iuioiua commented 1 year ago

Functions within the Standard Library that rely on Reader/Writer interfaces are being deprecated. One must sacrifice a little performance by moving. However, the bridge is closing!

My use case is for Redis, or specifically x/r2d2, which I maintain. Here are some basic benchmarks that would indicate how my client would perform when using Reader/Writer interfaces vs. the Web Streams API: https://gist.github.com/iuioiua/f1b93f1b7e3c1a055ddf6fc60a6743a2

For my use cases, it'd be great if TextEncoderStream(), TextDecoderStream() and ReadableStream.pipeThrough() were optimised.

mmastrac commented 1 year ago

@iuioiua Thanks for the information -- I think piping into a text encoder/text decoder could absolutely be faster.

denoland / deno

Fast streams #16046

Introducing streams

Streaming operation

Optimizations

The implementation