awslabs / aws-c-s3

C99 library implementation for communicating with the S3 service, designed for maximizing throughput on high bandwidth EC2 instances.
Apache License 2.0
93 stars 37 forks source link

Stop limiting num-connections based on num-known-IPs (Improve S3-Express performance) #407

Closed graebm closed 6 months ago

graebm commented 6 months ago

Issue: Disappointing S3-Express performance.

Description of changes: Stop limiting num-connections based on num-known-IPs.

Diagnosing the issue: We found that num-connections was never getting very high, because num-connections scales based on the num-known-IPs. S3-Express endpoints have very few IPs, so their num-connections weren't scaling very high.

The algorithm was adding 10 connections per known-IP. On a 100Gb/s machine, this maxed out at 250 connections once 25 IPs were known. But S3-Express endpoints only have 4 unique IPs, so they never got higher than 40 connections.

This algorithm was written back when S3 returned 1 IP per DNS query. The intention was to throttle connections until more IPs were known, in order to spread load among S3's server fleet. However, as of Aug 2023 S3 provides multiple IPs per DNS query. So now, we can scale up to max connections after the first DNS query and still be spreading load.

We also believed that spreading load was a key to good performance. But I found that spreading the load didn't have much impact on performance (at least now, in 2024, on the 100Gb/s machine I was using). Tests where I hard-coded a single IP and hit it with max-connections didn't differ much from tests where the load was spread among 8 IPs or 100 IPs.

I want to get this change out quickly and help S3-Express, so I picked magic numbers where the num-connections math ends up with the same result as the old algorithm. Normal S3 performance is mildly improved (max-connections is reached immediately, instead of scaling up over 30sec as it finds more IPs). S3 Express performance is MUCH improved.

Future Work: Improve this algorithm further:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov-commenter commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 89.37%. Comparing base (593c2ab) to head (e677691).

Additional details and impacted files [![Impacted file tree graph](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/407/graphs/tree.svg?width=650&height=150&src=pr&token=J4KP54FVLF&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs)](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/407?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs) ```diff @@ Coverage Diff @@ ## main #407 +/- ## ======================================= Coverage 89.37% 89.37% ======================================= Files 20 20 Lines 5863 5854 -9 ======================================= - Hits 5240 5232 -8 + Misses 623 622 -1 ``` | [Files](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/407?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs) | Coverage Δ | | |---|---|---| | [source/s3\_client.c](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/407?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs#diff-c291cmNlL3MzX2NsaWVudC5j) | `88.16% <100.00%> (-0.11%)` | :arrow_down: | ... and [1 file with indirect coverage changes](https://app.codecov.io/gh/awslabs/aws-c-s3/pull/407/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=awslabs)