Dumping was heavily CPU-bound in the edgedb CLI, taking ~32s on my
machine for a 1GB dump. Profiling revealed that we were spending
almost all of our time in memset. In particular, on every call to
read_buf, tls_api was zeroing the entire target buffer.

This is bad because it means the amount of zeroing we do is quadratic
in the number of calls it takes to fill the buffer. For our 10MB dump
messages, with read_buf typically returning 16KB, that was 640 calls
per message.
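
To see the blow-up concretely, here is a small self-contained model of
the pattern (illustrative only, not tls_api's actual code), using the
message and read sizes above:

    fn main() {
        const TARGET: u64 = 10 * 1024 * 1024; // one 10MB dump message
        const CHUNK: u64 = 16 * 1024;         // typical read_buf return size
        let (mut filled, mut calls, mut zeroed) = (0u64, 0u64, 0u64);
        while filled < TARGET {
            // The bug: each call zeroes the entire unfilled remainder...
            zeroed += TARGET - filled;
            // ...even though the read then fills only ~16KB of it.
            filled += CHUNK.min(TARGET - filled);
            calls += 1;
        }
        // Prints 640 calls and ~3.4GB zeroed for 10MB of payload.
        println!("{calls} calls, {zeroed} bytes zeroed");
    }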

tls_api did this zeroing because it called initialized_unfilled
(https://github.com/stepancheg/rust-tls-api/blob/6337f77db1cb6dfb53bbe7f9ec6b0d258cf8224b/api/src/async_as_sync.rs#L198)
on its input ReadBuf, and the wrapper that constructs a ReadBuf from a
BytesMut always reports the whole buffer as uninitialized.

(But it turns out that even if you pass an initialized ReadBuf to
read_buf, it also loses track of the initialization status. Only the
underlying poll_read seems to work?)
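
For reference, tokio's ReadBuf does track an initialization high-water
mark precisely so this zeroing can happen only once; a minimal sketch
of that API (this shows tokio's tokio::io::ReadBuf, not tls_api's
wrapper):

    use std::mem::MaybeUninit;
    use tokio::io::ReadBuf;

    fn main() {
        // ReadBuf tracks filled <= initialized <= capacity.
        let mut storage = [MaybeUninit::<u8>::uninit(); 64];
        let mut buf = ReadBuf::uninit(&mut storage);
        assert_eq!(buf.initialized().len(), 0);

        // initialize_unfilled() zeroes everything past the initialized
        // mark and returns the unfilled region as &mut [u8]. Done once
        // this is cheap; done on every read against a mostly-empty 10MB
        // buffer, it is the quadratic memset described above.
        let _unfilled: &mut [u8] = buf.initialize_unfilled();
        assert_eq!(buf.initialized().len(), 64);

        // A wrapper that rebuilds a ReadBuf from raw BytesMut capacity
        // on each call reports 0 bytes initialized again, throwing away
        // the high-water mark and forcing the zeroing to repeat.
    }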

Work around this problem by capping the size of the reads we do.
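
A minimal sketch of that workaround, assuming tokio's
AsyncReadExt::read_buf and a BytesMut target (read_capped and MAX_READ
are illustrative names, not the actual edgedb-cli code):

    use bytes::{BufMut, BytesMut};
    use tokio::io::{AsyncRead, AsyncReadExt};

    // Illustrative cap: each read_buf call can now zero at most this much.
    const MAX_READ: usize = 16 * 1024;

    // Fill `buf` with `target` bytes from `r`, never exposing more than
    // MAX_READ bytes of spare capacity to a single read_buf call.
    async fn read_capped<R: AsyncRead + Unpin>(
        r: &mut R,
        buf: &mut BytesMut,
        target: usize,
    ) -> std::io::Result<()> {
        while buf.len() < target {
            let want = (target - buf.len()).min(MAX_READ);
            // BufMut::limit bounds the writable region that read_buf hands
            // to the reader, so any per-call zeroing is bounded by `want`
            // and the total work stays linear in `target`.
            let mut limited = (&mut *buf).limit(want);
            if r.read_buf(&mut limited).await? == 0 {
                return Err(std::io::ErrorKind::UnexpectedEof.into());
            }
        }
        Ok(())
    }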