awslabs / aws-s3-transfer-manager-rs

Apache License 2.0

Improve Download Memory Usage #61

Closed waahm7 closed 1 day ago

waahm7 commented 5 days ago

Description of changes: tokio::spawn does not guarantee that tasks run in the order they were spawned, so the first num_concurrency parts can end up being downloaded in random order. For a workload of 100 files of 5 GB each, this can lead to very high memory usage, as seen in the diagram below. This PR refactors the download path so that the exact part a task fetches is determined only once the task has actually been scheduled.
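A minimal sketch of the idea described above, not the PR's actual code: instead of binding a part number to each task at spawn time, every worker claims the next part number from a shared atomic counter once it is actually running. Plain std threads stand in for tokio tasks here so the example has no dependencies; PartAllocator, claim, and run_workers are hypothetical names introduced for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hands out part numbers in increasing order, at the moment a worker asks.
/// (Hypothetical helper, not a type from the transfer manager.)
pub struct PartAllocator {
    next: AtomicU64,
    total: u64,
}

impl PartAllocator {
    pub fn new(total: u64) -> Self {
        PartAllocator { next: AtomicU64::new(0), total }
    }

    /// Returns the next unclaimed part number, or None once all are claimed.
    pub fn claim(&self) -> Option<u64> {
        let n = self.next.fetch_add(1, Ordering::Relaxed);
        (n < self.total).then_some(n)
    }
}

/// Spawns `workers` threads. Each one decides which part to fetch only
/// after it has started running, so start order cannot scramble the parts.
/// Returns the sorted list of part numbers that were claimed.
pub fn run_workers(total_parts: u64, workers: usize) -> Vec<u64> {
    let alloc = Arc::new(PartAllocator::new(total_parts));
    let claimed = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let alloc = Arc::clone(&alloc);
            let claimed = Arc::clone(&claimed);
            thread::spawn(move || {
                while let Some(part) = alloc.claim() {
                    // A real worker would download the byte range for `part` here.
                    claimed.lock().unwrap().push(part);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let mut parts = claimed.lock().unwrap().clone();
    parts.sort_unstable();
    parts
}
```

Calling run_workers(16, 4) claims every part from 0 to 15 exactly once. Because part numbers are handed out in increasing order as workers become ready, the parts in flight at any moment stay close together, which should limit how much out-of-order data has to be buffered.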

Uploads can have a similar issue, where we read too many parts into memory. To fix that, we will need to refactor our scheduler to be smarter, so that we only read a part once we hold the permit for it. (Created: https://github.com/awslabs/aws-s3-transfer-manager-rs/issues/60)
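One way to picture the upload fix mentioned above, as a rough sketch under stated assumptions rather than the design planned in the linked issue: a worker first acquires a permit and only then reads its part into memory, so the number of live part buffers never exceeds the permit count. Permits and upload_with_permits are hypothetical names, a small Mutex/Condvar semaphore stands in for whatever the real scheduler uses, and std threads stand in for async tasks.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore (hypothetical helper; std does not ship one).
pub struct Permits {
    available: Mutex<usize>,
    freed: Condvar,
}

impl Permits {
    pub fn new(n: usize) -> Self {
        Permits { available: Mutex::new(n), freed: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) {
        let mut avail = self.available.lock().unwrap();
        while *avail == 0 {
            avail = self.freed.wait(avail).unwrap();
        }
        *avail -= 1;
    }

    /// Returns a permit and wakes one waiting worker.
    pub fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.freed.notify_one();
    }
}

/// "Uploads" `parts` parts while allowing at most `max_in_memory` part
/// buffers to exist at once. Returns the peak number of live buffers seen.
pub fn upload_with_permits(parts: usize, part_size: usize, max_in_memory: usize) -> usize {
    let permits = Arc::new(Permits::new(max_in_memory));
    let live = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..parts)
        .map(|_| {
            let permits = Arc::clone(&permits);
            let live = Arc::clone(&live);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                permits.acquire(); // wait for a permit first
                let buf = vec![0u8; part_size]; // only now read the part into memory
                let now = live.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                drop(buf); // stand-in for sending the part to S3
                live.fetch_sub(1, Ordering::SeqCst);
                permits.release();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

With upload_with_permits(32, 1024, 4), the returned peak is never above 4, however the 32 workers happen to be scheduled. Reading the part before acquiring the permit would instead let all 32 buffers exist at the same time.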

[Figure: memory_usage_comparison]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

waahm7 commented 2 days ago

> What's this do to throughput for download? In particular the warmup/first few runs?

It didn't help throughput much, since only 125 of the 3840 parts were being downloaded out of order.