Azure / durabletask

Durable Task Framework allows users to write long running persistent workflows in C# using the async/await capabilities.
Apache License 2.0
1.51k stars 290 forks source link

Merge with Main and Remove SimpleBufferManager #947

Closed wsugarman closed 1 year ago

wsugarman commented 1 year ago

This PR merges with the main branch some changes that have been added since the last time. It also removes the existing SimpleBufferManager and replaces it with a more standard call pattern.

See these commits for more details around what is actually changed.

wsugarman commented 1 year ago

I'm good with these changes.

Just so that I fully understand the implications, does this new method of compressing and decompressing remove the need for SimpleBufferManager or are we just removing it because it might be causing more problems than it solves? I guess the answer to that depends on what blob.CopyToAsync is doing internally? I don't mind either way, but curious to know if this changes the memory usage profile in any way.

I mostly removed it to simplify the logic, but it does change the allocations a bit. However, I think we're pretty well off at the end of the day. For example in CompressAndUploadAsBytesAsync, previously we would allocate an array (as part of MemoryStream) to store the entire compressed message blob while using a smaller pooled buffer to incrementally pass data from the source Stream to the GZipStream. However, now we stream the compressed bytes from the source Stream directly to the GZipStream that is wrapping a network stream (technically a BlockBlobWriteStream). Internally. Inside of the MemoryStream.CopyToAsync, we're really just calling GzipStream.WriteAsync for the entire MemoryStream buffer.

Likewise for DownloadAndDecompressAsBytesAsync, previously we would allocate a large array via MemoryStream to hold the contents of the blob and incrementally copy decompressed bytes from the GZipStream wrapping the blob's stream to the destination MemoryStream using a small pooled buffer. Then that array would be converted into a UTF-8 string. However, now we skip the middle-man and directly use StreamReader.ReadToEndAsync to write the bytes into a StringBuilder and in turn a string. Internally, this method does something similar to the logic before by incrementally filling up its own buffer (whose size can be specified in the StreamReader ctor) and writing the bytes to the StringBuilder. While StringBuilder looks to be able to quickly create the string with an efficient memmove, it still is copying the bytes.

So, in actuality, DownloadAndDecompressAsBytesAsync is probably less memory efficient as it no longer pools the intermediate buffer -- although it may be slightly faster. But we do have big memory efficiency gains in CompressAndUploadAsBytesAsync by streaming the entire operation.

If we really wanted to keep some of the pooling, I would recommend using Microsoft.IO.RecyclableMemoryStream to replace the SimpleBufferManager.