Description of changes:
Stop limiting num-connections based on num-known-IPs.
Diagnosing the issue:
We found that num-connections never got very high, because it scales with num-known-IPs. S3-Express endpoints have very few IPs, so their num-connections never scaled up.
The algorithm was adding 10 connections per known-IP. On a 100Gb/s machine, this maxed out at 250 connections once 25 IPs were known. But S3-Express endpoints only have 4 unique IPs, so they never got higher than 40 connections.
This algorithm was written back when S3 returned 1 IP per DNS query. The intention was to throttle connections until more IPs were known, in order to spread load among S3's server fleet. However, as of Aug 2023, S3 provides multiple IPs per DNS query. So now we can scale up to max connections after the first DNS query and still spread the load.
We also believed that spreading load was key to good performance. But I found that spreading the load didn't have much impact on performance (at least now, in 2024, on the 100Gb/s machine I was using). Tests where I hard-coded a single IP and hit it with max-connections performed about the same as tests where the load was spread among 8 IPs or 100 IPs.
I want to get this change out quickly to help S3-Express, so I picked magic numbers where the num-connections math ends up with the same result as the old algorithm. Normal S3 performance is mildly improved (max-connections is reached immediately, instead of scaling up over ~30 seconds as more IPs are found). S3 Express performance is MUCH improved.
Future Work:
Improve this algorithm further:
- expect higher throughput on connections to S3 Express
- expect lower throughput on connections transferring small objects
- dynamic scaling without a bunch of magic numbers??? (sounds cool, but I don't have any ideas yet for how this would work)
Issue: Disappointing S3-Express performance.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.