lucacasonato opened this issue 2 years ago
The list hasn't been updated since Sep 27, 2022. What's the status on this?
It's being actively worked on. @mmastrac can you update the current state?
This list is unfortunately a challenge to update -- I was working more on migrating the underlying resources to a common implementation to tackle the problem another way.
I checked off the `req.body` in `Deno.serve` implementation, however, because that one is definitely done. I also have a patch in flight for `fetch`, but that one has not fully landed yet because of some Node.js considerations.
@Anutrix Is there a specific slow path you'd like to track?
Functions within the Standard Library that rely on `Reader`/`Writer` interfaces are being deprecated. One must sacrifice a little performance by moving away from them. However, the bridge is closing!
My use case is Redis - specifically, x/r2d2, which I maintain. Here are some basic benchmarks that indicate how my client would perform when using `Reader`/`Writer` interfaces vs. the Web Streams API: https://gist.github.com/iuioiua/f1b93f1b7e3c1a055ddf6fc60a6743a2
For my use cases, it'd be great if `TextEncoderStream()`, `TextDecoderStream()`, and `ReadableStream.pipeThrough()` were optimised.
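Here's a minimal sketch of the kind of benchmark I mean (the RESP-style payload and names are illustrative, not taken from the gist):

```ts
// Hypothetical micro-benchmark: cost of piping bytes through a
// TextDecoderStream. Run with `deno bench`.
const payload = new TextEncoder().encode("+OK\r\n".repeat(1_000));

function byteSource(): ReadableStream<Uint8Array> {
  return new ReadableStream({
    start(controller) {
      controller.enqueue(payload);
      controller.close();
    },
  });
}

Deno.bench("pipeThrough(TextDecoderStream)", async () => {
  const decoded = byteSource().pipeThrough(new TextDecoderStream());
  for await (const _chunk of decoded) {
    // Consume the decoded text chunks.
  }
});
```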
@iuioiua Thanks for the information -- I think piping into a text encoder/text decoder could absolutely be faster.
This issue acts as a design doc for a set of performance optimizations and clean-ups that can be performed in Deno to make piping data between producers and consumers significantly faster than previously possible, while preserving the pristine APIs we expose to users for streaming data.
## Introducing streams
Streams represent a combination of a single data source and a data sink. Streams are abstract - they do not exist as objects in JS. A stream is a logical "pipe" that is able to transfer data between a source and a sink. On these streams the action of streaming is performed: streaming is the operation of transferring data from the source to the sink.
Sources are the producing end of a stream. They are most often represented as `ReadableStream` objects in JS. There are two types of sources: JS backed and resource backed. JS backed sources are streams that are constructed manually in JS, where the chunks sent on the stream stem from user code. Resource backed sources are producers backed by a Rust side resource; a `ReadableStream` representing file handle contents (`FsFile#readable`) is such a source. Some examples of resource backed sources:

- `fsFile.readable` (a source representing data read from a file handle)
- `req.body` (a source representing the data received in an HTTP server request)
- `resp.body` (a source representing the data received as the response to an outbound `fetch` request)

Sinks are the consuming end of a stream. In JS they are sometimes represented as `WritableStream`, but more often than not as an abstract resource that consumes data. There are once again two types of sinks: JS backed and resource backed. JS backed sinks read chunks from the stream in JS. Resource backed sinks consume streams in Rust - they usually send the data to some IO sink, like a network socket or an open file.

To make streams optimally fast, we need to optimize them specifically for the pair of source and sink they consist of. For example, a stream with a Rust backed sink and source can perform the entire stream in Rust - no copies to JavaScript are necessary. If the reverse is true (both sides of the stream are JS backed), no copies to Rust are necessary.
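To make the distinction concrete, here is a small sketch (the file name is illustrative):

```ts
// JS backed source: chunks originate from user code in JS.
const jsSource = new ReadableStream<Uint8Array>({
  start(controller) {
    controller.enqueue(new TextEncoder().encode("hello"));
    controller.close();
  },
});

// Resource backed source: the ReadableStream is a thin wrapper
// around a Rust-side file resource.
const file = await Deno.open("./data.bin", { read: true });
const resourceSource: ReadableStream<Uint8Array> = file.readable;
```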
Some sinks are also special: they want to read all data from the stream into a single combined chunk, for example. In this case, if the source can provide all data in a single operation more efficiently than streaming chunk by chunk, CPU cycles and thus time can be saved, resulting in faster streams.
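A familiar example of such a sink is the body-consuming methods on `Response`:

```ts
// A "read all" style consumption: the whole resp.body source is
// drained into one contiguous buffer instead of used chunk by chunk.
const resp = await fetch("https://example.com/data.bin");
const buf = await resp.arrayBuffer();
console.log(buf.byteLength);
```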
## Streaming operation
To optimize streams, what we really need to do is optimize the streaming operation. This means that everywhere we "perform a streaming operation", we need to decide whether we can perform an optimized operation instead.
More concretely, the baseline operation we are trying to optimize is the following: transferring all chunks from a `ReadableStream` (source) to a sink.

This operation exists in various places. Some examples:

- `ReadableStream#pipeTo(writable)`
- `ReadableStream#pipeThrough(transformer)`
- `req.body` reading in `fetch`
- `resp.body` reading in the HTTP server

Currently all of the above implementations of the operation (except for the HTTP server case) are completely un-optimized and are implemented in pure JS. This means that if a stream consists of a Rust backed sink and source (e.g. a file upload via `fetch`), all data needs to be copied once from Rust into JS, and then from JS back into Rust.
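As a simplified sketch (not the spec-compliant algorithm), the pure JS pump looks roughly like this:

```ts
// Simplified sketch of a pure-JS pipe: every chunk is surfaced to JS
// before being handed to the sink, even when both ends are Rust
// backed resources (two boundary copies per chunk).
async function naivePipeTo(
  readable: ReadableStream<Uint8Array>,
  writable: WritableStream<Uint8Array>,
): Promise<void> {
  const reader = readable.getReader();
  const writer = writable.getWriter();
  while (true) {
    const { value, done } = await reader.read(); // copy: Rust -> JS
    if (done) break;
    await writer.write(value); // copy: JS -> Rust
  }
  await writer.close();
}
```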
## Optimizations
This is exactly where the core of the optimization lies. We can significantly improve performance by skipping the step where we copy data to and from JS.
We have some practical experience with the performance improvements this unlocks from the HTTP server. We have implemented an optimization there that allows some Rust backed sources to be used directly as an HTTP response, without having to be copied through JS. This significantly improves throughput.
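For example, returning a resource backed source directly as a response body (file name illustrative):

```ts
// The response body is a resource backed source (a file), so the
// server can stream it to the socket without copying chunks into JS.
Deno.serve(async (_req) => {
  const file = await Deno.open("./large-file.bin", { read: true });
  return new Response(file.readable);
});
```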
## The implementation
Deno uses WHATWG streams as the primitive for sources and sinks. Our implementation work will focus exclusively on these (`ReadableStream` and `WritableStream`). These stream primitives have the benefit that they can be locked. This allows us to take full ownership of the underlying source / sink, allowing us to move the ownership from JS to Rust. This enables all of the following optimizations.

To make the optimization work, we need to be able to effectively identify which sources and sinks are resource backed and can have their streaming operation offloaded to Rust. To do this we need to brand all JS objects that represent a source or sink that is backed by a resource. Luckily this is easy, as the API for all source and sink resources is the same: it is the `deno_core::Resource` read & write API. To brand a source or sink object, we thus only need to attach a hidden `[rid]` property to it. This property can then hold the resource ID backing the source or sink.
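A minimal sketch of what such branding could look like (the symbol name and helpers are hypothetical illustrations, not the runtime's actual code):

```ts
// Hypothetical branding sketch: a hidden symbol-keyed [rid] property
// marks a stream object as resource backed.
const RID = Symbol("[[rid]]");

interface Branded {
  [RID]?: number;
}

// Attach the brand when constructing a stream around a resource.
function brandAsResourceBacked<T extends object>(stream: T, rid: number): T {
  (stream as T & Branded)[RID] = rid;
  return stream;
}

// Streaming code paths can later detect the brand and take the Rust
// fast path instead of pumping chunks through JS.
function resourceRid(stream: object): number | undefined {
  return (stream as Branded)[RID];
}
```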
For sources, this branding is already implemented for many streams through the `readableStreamForRid` helper function (`__bootstrap.streams.readableStreamForRid`). We need to implement a similar helper function for `WritableStream` sinks, probably called `writableStreamForRid`. A non-branding version of this helper already exists in `ext/net`.

We then need to update all code paths that perform a streaming operation to go into a special case path if both source and sink are resource backed. The most generic case of this is `ReadableStream#pipeTo` and `ReadableStream#pipeThrough`. The latter is similar to the former, except that it has to make two connections rather than just one (src -> transformer sink & transformer src -> sink). For these generic cases a new op would be introduced (`op_pipe`) that takes a source and a sink resource and performs a simple streaming operation in Rust. More specialized logic and op integration is required when piping the `req.body` into `fetch`, or `resp.body` into an HTTP response.
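A sketch of how this dispatch could look, reusing the hypothetical `resourceRid` brand check and the `naivePipeTo` pump from the sketches above (`opPipe` stands in for the proposed `op_pipe`):

```ts
// Hypothetical fast-path dispatch inside pipeTo: if both ends are
// resource backed, a single Rust op performs the whole pipe.
async function fastPipeTo(
  readable: ReadableStream<Uint8Array>,
  writable: WritableStream<Uint8Array>,
): Promise<void> {
  const srcRid = resourceRid(readable);
  const dstRid = resourceRid(writable);
  if (srcRid !== undefined && dstRid !== undefined) {
    // Both ends are resource backed: no chunk ever enters JS.
    return opPipe(srcRid, dstRid);
  }
  // Otherwise fall back to the generic JS pump (naivePipeTo above).
  return naivePipeTo(readable, writable);
}

// Placeholder for the Rust-side op; in the real runtime this would
// be a deno_core op binding, not a JS function.
function opPipe(_srcRid: number, _dstRid: number): Promise<void> {
  throw new Error("implemented in Rust (op_pipe)");
}
```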
Finally, more specialized streaming-like code paths can be optimized. A good example of this is the "read all" operation that is performed on `req.body` and `resp.body` when `req.arrayBuffer()` and `resp.arrayBuffer()` are called. Here we can add an op that performs the entire read operation in Rust (`op_read_all`) and only passes a single aggregated chunk to JavaScript, rather than aggregating chunks manually in JS (see the sketch after the checklist below). @marcosc90 came up with this optimization in #16038. There are likely other specialized streaming-like operations that could use similar optimizations.

Sources:

- [ ] `FsFile#readable`
- [ ] `Deno.stdin#readable`
- [ ] `Deno.Process#stdout#readable`
- [ ] `Deno.Process#stderr#readable`
- [ ] `Deno.Child#stdout` (difficult, because unrefable)
- [ ] `Deno.Child#stderr` (difficult, because unrefable)
- [ ] `Conn#readable`
- [ ] `resp.body` returned from `Cache#match`
- [ ] `req.body` in `Deno.serveHttp`
- [x] `req.body` in `Deno.serve`
- [ ] `resp.body` returned from `fetch`
- [ ] `CompressionStream#readable`
- [ ] `WebSocketStream#readable`
- [ ] `TextEncoderStream#readable`

Sinks:

- [ ] `FsFile#writable`
- [ ] `Deno.stdout`
- [ ] `Deno.stderr`
- [ ] `Deno.Process#stdin#writable`
- [ ] `Deno.Child#stdin`
- [ ] `Conn#writable`
- [ ] `CompressionStream#writable`
- [ ] `WebSocketStream#writable`
- [ ] `TextDecoderStream#writable`

Streaming operations:

- [ ] `resp.body` consumption in `Deno.serveHttp`
- [ ] `resp.body` consumption in `Deno.serve`
- [ ] `resp.body` consumption in `cache.put`
- [ ] `req.body` consumption in `fetch`
- [ ] `ReadableStream#pipeTo`
- [ ] `ReadableStream#pipeThrough`
- [ ] "read all" of `req.body` or `resp.body` (`req.arrayBuffer()` for example)
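For the "read all" streaming operation in the list above, this sketch shows the JS-side aggregation that `op_read_all` would replace for resource backed bodies (helper name hypothetical):

```ts
// What "read all" costs when done in JS: every chunk crosses into JS
// and is then copied again into one aggregated buffer.
async function readAllJs(
  stream: ReadableStream<Uint8Array>,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of stream) {
    chunks.push(chunk);
    total += chunk.byteLength;
  }
  // Second copy: aggregate the chunks into a single buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
// With op_read_all, a resource backed body would instead be fully
// read and aggregated in Rust, and JS would receive one chunk.
```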