getsentry / craft

The universal Sentry release CLI 🚀

GitHub target: reduce memory consumption by streaming uploads/downloads #329

Open rhcarvalho opened 2 years ago

rhcarvalho commented 2 years ago

The GitHub target should never need to hold full release assets (nor any other potentially large files) in memory.

This issue tracks replacing code that reads full files from disk, downloads arbitrary files from the internet, and calculates checksums entirely in memory. All of those steps can be performed with fixed-size buffers, consuming fewer resources and eliminating a failure mode of running out of memory.

See discussion in https://github.com/getsentry/craft/pull/328#discussion_r752531205.

Note: this might require changes to how we use the Octokit library.


Some of the candidates:

Reading full files of arbitrary size into memory, where we could instead stream the upload and calculate the checksum as we go:

https://github.com/getsentry/craft/blob/244efd8702223af402f895284789bb4249c0905e/src/targets/github.ts#L365
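
A minimal sketch of what a streaming upload could look like, assuming a raw HTTPS request: the function name, URL, token, and header handling here are placeholders, since the actual upload goes through Octokit and (as noted above) may need different plumbing to accept a stream instead of a buffer.

```ts
import * as fs from 'fs';
import * as https from 'https';

// Hypothetical sketch: upload a release asset by piping a read stream into
// the request, instead of loading the whole file with fs.readFileSync.
async function uploadAssetStreaming(
  filePath: string,
  uploadUrl: string,
  token: string
): Promise<void> {
  const { size } = await fs.promises.stat(filePath);

  await new Promise<void>((resolve, reject) => {
    const req = https.request(
      uploadUrl,
      {
        method: 'POST',
        headers: {
          Authorization: `token ${token}`,
          'Content-Type': 'application/octet-stream',
          'Content-Length': size,
        },
      },
      res => {
        res.resume(); // drain the response body
        if (res.statusCode && res.statusCode < 300) {
          resolve();
        } else {
          reject(new Error(`Upload failed with status ${res.statusCode}`));
        }
      }
    );
    req.on('error', reject);
    // The file is read in fixed-size chunks and written to the socket as it
    // arrives, so memory use stays roughly constant regardless of file size.
    fs.createReadStream(filePath).pipe(req);
  });
}
```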

Calculating MD5 hashes from in-memory bytes where we could be updating the checksum chunk-by-chunk:

https://github.com/getsentry/craft/blob/244efd8702223af402f895284789bb4249c0905e/src/targets/github.ts#L451-L453
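
For reference, a chunk-by-chunk checksum using Node's built-in crypto and fs streams might look like the sketch below (the helper name is made up; only one fixed-size chunk is held in memory at a time).

```ts
import { createHash } from 'crypto';
import { createReadStream } from 'fs';

// Sketch: compute an MD5 checksum incrementally from a file on disk,
// instead of hashing an in-memory buffer of the whole asset.
function md5OfFile(filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('md5');
    createReadStream(filePath)
      .on('data', chunk => hash.update(chunk))
      .on('error', reject)
      .on('end', () => resolve(hash.digest('hex')));
  });
}
```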

Downloading arbitrarily large assets from the Internet (into response.data) where we could read chunks and update a checksum:

https://github.com/getsentry/craft/blob/244efd8702223af402f895284789bb4249c0905e/src/targets/github.ts#L438
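
A hedged sketch of the same idea for downloads, using a plain HTTPS GET rather than the client the target currently uses: the response is consumed chunk-by-chunk to update the hash, so the full body is never buffered the way `response.data` is today. Redirect and auth handling are omitted and the function name is hypothetical.

```ts
import { createHash } from 'crypto';
import * as https from 'https';

// Sketch: download an asset and update an MD5 hash per chunk, without
// buffering the whole response body in memory.
function md5OfRemoteAsset(url: string): Promise<string> {
  return new Promise((resolve, reject) => {
    https
      .get(url, res => {
        if (res.statusCode && res.statusCode >= 400) {
          res.resume(); // discard the body
          reject(new Error(`Download failed with status ${res.statusCode}`));
          return;
        }
        const hash = createHash('md5');
        res.on('data', chunk => hash.update(chunk));
        res.on('error', reject);
        res.on('end', () => resolve(hash.digest('hex')));
      })
      .on('error', reject);
  });
}
```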

rhcarvalho commented 2 years ago

I've listed in the issue description some of the lines of code that I recently noticed implicitly assume small release assets.

A more detailed review of the GitHub target might reveal more places where the assumption is made.

A similar problem might exist in other targets.

As far as I know, this problem has not manifested yet in any release of our projects.

rhcarvalho commented 2 years ago

See also