
switch to streaming+synchronous transcript-span compression #8739

Open warner opened 7 months ago

warner commented 7 months ago

What is the Problem Being Solved?

As discussed in https://github.com/Agoric/agoric-sdk/pull/8693#discussion_r1447654305, we're going to land transcript span compression with an approach that accumulates all the uncompressed data in RAM at the same time, then compresses the whole span as a unit.

This uses more RAM than we'd really like. The difference isn't likely to matter too much, and we didn't have a convenient way to do better at the time, but if we can find a decent streaming-oriented synchronous gzip library, we'd like to switch to it.

Imagine that we have 200 items in the transcript span, and they use 1MB each. During rolloverTranscriptSpan (really in compressTranscriptSpan), the all-at-once approach will read 200 strings from the DB (memory usage: 200MB), do a .join('\n') on them (memory usage: 400MB), then feed the joined string into gzip() to get the compressed string (memory usage 420MB, assuming a 10:1 compression ratio). Then we write the compressed string into the DB, and drop everything (memory usage 0MB). Peak memory usage is 420MB.
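For reference, the all-at-once shape looks roughly like this (just a sketch, not the actual #8693 code; readTranscriptSpanItems is a hypothetical stand-in for however we pull the span's items out of SQLite):

```js
import { gzipSync } from 'node:zlib';

// All-at-once sketch: every uncompressed item, the joined string, and the
// compressed output are all resident at the same time.
function compressSpanAllAtOnce(readTranscriptSpanItems) {
  const items = readTranscriptSpanItems(); // e.g. 200 strings of ~1MB each
  const joined = items.join('\n'); // a second full copy of the span
  return gzipSync(joined); // compressed Buffer, ~1/10 the size
}
```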

Description of the Design

With a streaming compressor, we create the compression object first, then start reading items from the DB. We read a 1MB string, write it into the compressor (which compresses most of it and maybe holds on to some tail, depending upon how much look-behind the zlib mode wants), read out all the available compressed data and append it to an array, yielding a peak memory usage of 1MB + 1/10 = 1.1MB. Then we drop that first item (memory usage 0.1MB). We read the second item and write it in (memory usage 1MB + 2/10 = 1.2MB), then drop the item (usage 0.2MB). We repeat this way until we've fed every item into the compressor (usage 1MB + 200/10 = 21MB, then 20MB). We close out the compressor and get the last few bytes; now we've got an array with 20MB of pieces. We concat the array pieces to get a single blob (peak usage 40MB), drop the original array (usage 20MB), write it into the DB, and drop the blob (0MB). Peak usage is 40MB.
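A sketch of what that might look like with pako, one library that exposes a synchronous streaming interface (getSpanItemIterator is a hypothetical stand-in for the swing-store's span reader; the real code would live inside compressTranscriptSpan):

```js
import pako from 'pako';

// Streaming sketch: only the current uncompressed item plus the accumulated
// compressed pieces are resident at any moment.
function compressSpanStreaming(getSpanItemIterator) {
  const deflator = new pako.Deflate({ gzip: true });
  let first = true;
  for (const item of getSpanItemIterator()) {
    if (!first) {
      deflator.push('\n', false); // same '\n' separators as the join()
    }
    first = false;
    deflator.push(item, false); // compress this item; stream stays open
  }
  deflator.push('', true); // finish the gzip stream, emit the trailer
  if (deflator.err) {
    throw new Error(`gzip failed: ${deflator.msg}`);
  }
  // pako collects the compressed pieces and concatenates them for us
  return deflator.result; // Uint8Array of compressed bytes
}
```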

To reduce it further (at the expense of more allocations), we could grow the output buffer as we read more data from the compressor. Depending upon how lucky the allocator gets, the peak usage might be less than 40MB: to avoid an O(N^2) problem, grow-a-buffer code usually doubles the buffer's capacity each time an append exceeds it, which keeps the total cost closer to O(N*logN), but if the last grow+copy happens near the end of the process, you're briefly allocating the same double-the-blob size as the simple concat-at-the-end. This doesn't seem likely to help, and it is certainly slower.

On the decompression side, we'd do the same thing in reverse: stream the compressed artifact into a decompressor and pull the uncompressed data back out, without ever materializing the whole uncompressed span at once.
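For example (again a pako sketch; handleChunk is a hypothetical consumer, and real code would also re-split the output into '\n'-delimited items, including items that straddle chunk boundaries):

```js
import pako from 'pako';

// Streaming decompression sketch: decompressed chunks are handed to a
// callback as they become available, so the whole uncompressed span is
// never held in RAM at once.
function decompressSpanStreaming(compressedArtifact, handleChunk) {
  const inflator = new pako.Inflate(); // autodetects the gzip wrapper
  inflator.onData = handleChunk; // receives Uint8Array chunks as they appear
  inflator.push(compressedArtifact, true); // true: this is the final input
  if (inflator.err) {
    throw new Error(`gunzip failed: ${inflator.msg}`);
  }
}
```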

I know the zlib library is capable of being used this way; I've done it from other languages. The core function basically takes a pointer to input data and either returns some output data (incrementing the input pointer) or a special "I need more input" marker. The Node.js Stream-based wrappers make this a lot easier to use, but they also entrain async stuff that we don't want.

Security Considerations

none, this is internal to the swing-store

Scaling Considerations

Should reduce peak memory usage

Test Plan

The existing unit tests ought to cover it; however, if we're writing our own library to do the sync-streaming compress/decompress work, it must have thorough unit tests. There are non-trivial cases to exercise, like when a small amount of additional input does not yield any new output, or when an input step consumes exactly the last byte of the input. Hopefully we'll find an upstream library that does this and is tested properly, so we won't have to roll our own.
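For illustration, a couple of the round-trip cases might look like this in AVA style (compressSpanStreaming / decompressSpanStreaming are the hypothetical helpers sketched above, not existing swing-store exports):

```js
import test from 'ava';
// compressSpanStreaming / decompressSpanStreaming are the hypothetical
// helpers sketched earlier in this issue.

test('streamed compression round-trips, including empty and large items', t => {
  const items = ['item-0', '', 'x'.repeat(1_000_000), 'item-3'];
  const compressed = compressSpanStreaming(() => items);
  const pieces = [];
  decompressSpanStreaming(compressed, chunk => pieces.push(Buffer.from(chunk)));
  const text = Buffer.concat(pieces).toString('utf8');
  t.deepEqual(text.split('\n'), items);
});

test('compressing an empty span still yields a valid gzip artifact', t => {
  const compressed = compressSpanStreaming(() => []);
  const pieces = [];
  decompressSpanStreaming(compressed, chunk => pieces.push(Buffer.from(chunk)));
  t.is(Buffer.concat(pieces).toString('utf8'), '');
});
```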

Upgrade Considerations

none; this is internal to swing-store and uses the same API and data format as the #8318 / #8693 approach; only the implementation would change

FUDCo commented 7 months ago

> We concat the array pieces to get a single blob (peak usage 40MB), drop the original array (usage 20MB), write it into the DB, and drop the blob (0MB). Peak usage is 40MB.

I think this analysis is wrong, though in a good way. Assuming a proper streaming compression scheme (which is the prerequisite for this anyway), we don't concat an array of compressed pieces; we compress each item in turn onto the end of a growing blob. Peak usage would be 20MB plus whatever buffer slosh is needed to manage a dynamically growing blob.
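A sketch of that shape, again assuming pako (the onData override is how pako lets the caller take each compressed chunk as it's produced; getSpanItemIterator is hypothetical as before):

```js
import pako from 'pako';

// Growing-blob sketch: append each compressed chunk onto a single buffer
// that doubles in capacity when it fills up, instead of collecting an array
// of pieces and concatenating at the end.
function compressSpanIntoGrowingBlob(getSpanItemIterator) {
  let blob = new Uint8Array(64 * 1024); // initial capacity; grows by doubling
  let used = 0;
  const deflator = new pako.Deflate({ gzip: true });
  deflator.onData = chunk => {
    while (used + chunk.length > blob.length) {
      const bigger = new Uint8Array(blob.length * 2);
      bigger.set(blob.subarray(0, used)); // copy what we have so far
      blob = bigger;
    }
    blob.set(chunk, used);
    used += chunk.length;
  };
  let first = true;
  for (const item of getSpanItemIterator()) {
    if (!first) deflator.push('\n', false);
    first = false;
    deflator.push(item, false);
  }
  deflator.push('', true); // finish the gzip stream
  if (deflator.err) throw new Error(`gzip failed: ${deflator.msg}`);
  return blob.subarray(0, used); // view of just the compressed bytes
}
```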