awslabs / aws-sdk-rust

AWS SDK for the Rust Programming Language
https://awslabs.github.io/aws-sdk-rust/
Apache License 2.0

Make SO_NODELAY configurable (or default to true) as it massively reduces Lambda invocation latency #958

Open jackkleeman opened 11 months ago

jackkleeman commented 11 months ago

Describe the feature

The h2 library that backs this SDK seems to put HEADERS and DATA frames into separate TCP packets when payloads are large. That's fine, but in combination with Nagle's algorithm, it means that the second packet, containing the DATA frame, will not leave the client machine until an ACK is received for the packet containing the HEADERS frame. In my tests this appears to add substantial latency to Lambda invocations. Perhaps TCP delayed acknowledgement is in play on the AWS side?

This can be resolved by setting SO_NODELAY (strictly, the TCP_NODELAY socket option), which hyper exposes via the set_nodelay method on HttpConnector. In general this flag reduces latency at the expense of potentially more packets; in this case it seems to make no difference to the number of packets while massively reducing latency. In Go this flag defaults to true.
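
For readers unfamiliar with the flag discussed above (TCP_NODELAY at the OS level), here is a minimal standard-library sketch of what enabling it on a socket looks like. It is not how the SDK or hyper wires this internally; `connect_with_nodelay` is a hypothetical helper name, and the loopback listener exists only so the example runs without network access.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Connect to `addr` and disable Nagle's algorithm on the resulting socket.
/// (`connect_with_nodelay` is a hypothetical helper, not an SDK or hyper API.)
fn connect_with_nodelay(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    // TCP_NODELAY: send small segments immediately instead of holding them
    // back until previously sent data has been acknowledged.
    stream.set_nodelay(true)?;
    Ok(stream)
}

fn main() -> io::Result<()> {
    // Local listener on an OS-assigned port, so no external network is needed.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = connect_with_nodelay(listener.local_addr()?)?;
    println!("TCP_NODELAY enabled: {}", stream.nodelay()?);
    Ok(())
}
```

hyper's HttpConnector::set_nodelay(true) applies the same option to the sockets it opens, which is why the workaround further down only needs that one call.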

Benchmarks against a hello world Lambda

delays/delay            time:   [90.039 ms 91.792 ms 93.627 ms]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  2 (2.00%) high mild
  4 (4.00%) high severe
delays/nodelay          time:   [33.833 ms 34.452 ms 35.147 ms]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
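
To put a number on the difference, the short sketch below compares the two results. It assumes the middle value in each bracket is the point estimate (as in Criterion-style output); `compare` is a hypothetical helper. With these figures the saving comes to roughly 57 ms per invocation, or a bit over 60% of the original latency.

```rust
/// Returns (absolute saving in ms, speedup factor, relative reduction in percent).
/// (`compare` is a hypothetical helper used only for this illustration.)
fn compare(before_ms: f64, after_ms: f64) -> (f64, f64, f64) {
    let saved = before_ms - after_ms;
    (saved, before_ms / after_ms, 100.0 * saved / before_ms)
}

fn main() {
    // Middle values from the delays/delay and delays/nodelay lines above.
    let (saved, speedup, pct) = compare(91.792, 34.452);
    println!("saved {saved:.1} ms, {speedup:.2}x faster, {pct:.1}% lower latency");
}
```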

Use Case

Reducing latency for AWS API calls

Proposed Solution

Either a new config variable, or simply default it to true.

Other Information

As a workaround, you can do this:

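// Imports this snippet needs. Assumed crate versions: hyper 0.14, hyper-rustls 0.24,
// rustls 0.21, once_cell, and aws-smithy-runtime with its hyper 0.14 connector feature.
use aws_config::BehaviorVersion;
use aws_smithy_runtime::client::http::hyper_014::HyperClientBuilder;
use hyper::client::HttpConnector;
use hyper_rustls::{ConfigBuilderExt, HttpsConnector}; // ConfigBuilderExt provides with_native_roots
use once_cell::sync::Lazy;

static HTTPS_NATIVE_ROOTS: Lazy<HttpsConnector<HttpConnector>> = Lazy::new(|| {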
    let mut http = HttpConnector::new();
    // HttpConnector won't enforce scheme, but HttpsConnector will
    http.enforce_http(false);
    // Set TCP_NODELAY (disables Nagle's algorithm), which we have found significantly improves Lambda invocation latency
    http.set_nodelay(true);
    hyper_rustls::HttpsConnectorBuilder::new()
        .with_tls_config(
            rustls::ClientConfig::builder()
                .with_cipher_suites(&[
                    // TLS1.3 suites
                    rustls::cipher_suite::TLS13_AES_256_GCM_SHA384,
                    rustls::cipher_suite::TLS13_AES_128_GCM_SHA256,
                    // TLS1.2 suites
                    rustls::cipher_suite::TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
                    rustls::cipher_suite::TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
                    rustls::cipher_suite::TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
                    rustls::cipher_suite::TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
                    rustls::cipher_suite::TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,
                ])
                .with_safe_default_kx_groups()
                .with_safe_default_protocol_versions()
                .expect("Error with the TLS configuration. Please file a bug report under https://github.com/restatedev/restate/issues.")
                .with_native_roots()
                .with_no_client_auth()
        )
        .https_or_http()
        .enable_http1()
        .enable_http2()
        .wrap_connector(http)
});

let mut config = aws_config::defaults(BehaviorVersion::latest());
config = config.http_client(HyperClientBuilder::new().build(HTTPS_NATIVE_ROOTS.clone()));


Velfi commented 10 months ago

Hey @jackkleeman, thanks for submitting this issue. We'll add it to our backlog.

I looked at this and came up with a few findings:

Currently, adding support won't be simple because (afaict) we don't have easy access to the hyper::client::connect::HttpConnector used in our TLS connector, meaning we can't just call HttpConnector::set_nodelay.

One last thing to note: in hyper v1.0, HttpConnector got moved to hyper-util.

jackkleeman commented 10 months ago

Yes, given that the HTTPS connector is currently in a lazy static and then just cloned, the only easy thing to do here is to change the default, which I guess is a big decision.

As a workaround, users can produce their own HTTPS connector by copying and pasting the defaults used by AWS, and provide it at config time. That's how I have fixed it on my end.

rcoh commented 10 months ago

If you have a snippet, could you paste it on this issue? That will be very helpful for other folks looking for the same behavior.

jackkleeman commented 10 months ago

Done @rcoh