awslabs / aws-c-s3

C99 library implementation for communicating with the S3 service, designed for maximizing throughput on high bandwidth EC2 instances.
Apache License 2.0

Use mmap for parallel read streams #354

Closed TingDaoK closed 1 year ago

TingDaoK commented 1 year ago

Why mmap?

Is mmap really better?

fread

| Run | Secs | Gb/s | Mb/s | GiB/s | MiB/s |
|---|---|---|---|---|---|
| 1 | 12.924 | 19.9 | 19940.2 | 2.3 | 2377.1 |
| 2 | 9.986 | 25.8 | 25806.7 | 3.0 | 3076.4 |
| 3 | 9.382 | 27.5 | 27466.3 | 3.2 | 3274.2 |
| 4 | 10.024 | 25.7 | 25707.0 | 3.0 | 3064.5 |
| 5 | 8.134 | 31.7 | 31683.2 | 3.7 | 3776.9 |
| 6 | 9.643 | 26.7 | 26722.7 | 3.1 | 3185.6 |
| 7 | 9.860 | 26.1 | 26136.3 | 3.0 | 3115.7 |
| 8 | 9.008 | 28.6 | 28606.2 | 3.3 | 3410.1 |
| 9 | 9.545 | 27.0 | 26998.6 | 3.1 | 3218.5 |
| 10 | 8.926 | 28.9 | 28869.0 | 3.4 | 3441.5 |


- Tracing the S3 tests showed an improvement from 7.5 secs to 6.8 secs in the portion where our IO threads are doing intensive work. However, I assume this is not really due to the difference between mmap and fread, as neither is likely to be the bottleneck of the process.

How will mmap affect our memory usage?
- Needs testing and tracing. `mmap` loads data on demand, and we control how much data is read into memory, so it "SHOULD" have no effect on our memory usage. However, I am not sure; I'll run some tracing and check the memory usage to make sure.
- Used `top` to track the memory usage and found that the OS loads the data into memory and keeps it until we unmap the memory. Fixed the memory issue by mapping only the needed part. The latest change now has no effect on memory usage.
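
For readers unfamiliar with the "map only the needed part" approach, here is a minimal POSIX sketch of the idea. It is not the code from this PR: `read_window_with_mmap` is a hypothetical helper name, and the sketch assumes the caller has already checked that `offset + length` does not run past the end of the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the aws-c-s3 API): copy `length` bytes starting at
 * `offset` of the open file `fd` into `dest`, mapping only the pages that
 * cover that range. Assumes offset + length is within the file; touching a
 * mapped page beyond the end of the file would raise SIGBUS.
 * Returns 0 on success, -1 on failure. */
int read_window_with_mmap(int fd, off_t offset, size_t length, void *dest) {
    if (length == 0) {
        return 0; /* mmap rejects zero-length mappings, so handle this up front */
    }
    if (offset < 0) {
        return -1;
    }

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) {
        return -1;
    }

    /* mmap requires the file offset to be a multiple of the page size, so
     * round the offset down and remember how many extra bytes that adds. */
    off_t aligned_offset = offset - (offset % (off_t)page_size);
    size_t slack = (size_t)(offset - aligned_offset);
    size_t map_length = length + slack;

    void *mapping = mmap(NULL, map_length, PROT_READ, MAP_PRIVATE, fd, aligned_offset);
    if (mapping == MAP_FAILED) {
        return -1;
    }

    /* Skip the alignment slack and copy only the bytes that were requested. */
    memcpy(dest, (const uint8_t *)mapping + slack, length);

    /* Unmap right away so these pages stop counting toward the mapping held
     * by the process, instead of keeping the whole file mapped. */
    return munmap(mapping, map_length);
}
```

Each reader would call the helper with its own offset and length, so only a window of the file is mapped at any time. The alignment step matters because `mmap` fails with `EINVAL` when the offset is not page-aligned.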

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
codecov-commenter commented 1 year ago

Codecov Report

Merging #354 (b1b0a98) into parallel-preparation (7630d1c) will increase coverage by 0.00%. The diff coverage is 89.09%.

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/354/graphs/tree.svg?width=650&height=150&src=pr&token=J4KP54FVLF&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs)](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs)

```diff
@@                  Coverage Diff                  @@
##           parallel-preparation     #354   +/-   ##
=====================================================
  Coverage                 89.45%   89.45%
=====================================================
  Files                        18       19    +1
  Lines                      5035     5073   +38
=====================================================
+ Hits                       4504     4538   +34
- Misses                      531      535    +4
```

| [Files](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs) | Coverage Δ | |
|---|---|---|
| [source/s3\_meta\_request.c](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs#diff-c291cmNlL3MzX21ldGFfcmVxdWVzdC5j) | `92.72% <100.00%> (+0.01%)` | :arrow_up: |
| [source/s3\_parallel\_read\_stream.c](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs#diff-c291cmNlL3MzX3BhcmFsbGVsX3JlYWRfc3RyZWFtLmM=) | `83.15% <92.85%> (+0.50%)` | :arrow_up: |
| [source/aws\_mmap.c](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/354?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs#diff-c291cmNlL2F3c19tbWFwLmM=) | `87.17% <87.17%> (ø)` | |