aws / aws-sdk-cpp

AWS SDK for C++
Apache License 2.0
1.97k stars 1.06k forks source link

curl-based request does not maintain HTTP keepalive after 4xx response #1984

Open sharky5102 opened 2 years ago

sharky5102 commented 2 years ago

Describe the bug

I have noticed that the AWS CPP SDK does not maintain HTTP keepalive with services when the service responds with a HTTP 4xx or 5xx status code - the connection is closed directly after receive such a response. The result is a high volume of HTTP reconnects and, in my case, high CPU usage due to excessive loading of /etc/pki/ssl/certs/ca-bundle.crt for each new connection (and the abundance of HTTP 400 errors in DynamoDB for ConditionalCheckFailed exceptions)

The root cause is behavior from libcurl, in which some code determines that 'upload was incomplete' after receiving an HTTP error code which is 300 or above (see https://github.com/curl/curl/blob/master/lib/http.c#L4128 ). The reason for this is that the AWS SDK CPP is providing a CURLOPT_READFUNCTION but not a CURLOPT_POSTFIELDSIZE. Normally this would result in a chunked upload, but CurlHttpClient.cpp is overriding this by providing a Content-length and Transfer-Encoding header explicitly. As a result, libcurl is unable to determine if the entire request body has been sent after receiving the response. Disputably this is a bug in libcurl, but the fix in the SDK seems pretty easy.

On my system the fix caused significant CPU usage improvements (like, 50%) due to not having to re-load the cert bundle for each request but it is heavily dependent on the number of 4xx's received.

Expected Behavior

HTTP keepalive should work when we receive a 400

Current Behavior

HTTP connection is dropped after any 3xx, 4xx or 5xx response from a service

Reproduction Steps

The easiest way to show this is by modeling the interaction with libcurl. The minimal code for libcurl to show the reconnection, modeled after the interaction that CurlHttpClient has with libcurl is:

#include <stdio.h>
#include <curl/curl.h>

size_t read_callback(char *ptr, size_t size, size_t nmemb, void *userdata)
{
  return fread(ptr, size, nmemb, (FILE *)userdata);
}

int main(void)
{
  CURL *curl;
  CURLcode res;

  curl = curl_easy_init();
  struct curl_slist *list = NULL;

  if(curl) {
    FILE *file = fopen("small", "rb");
    fseek(file, 0, 2);
    size_t size = ftell(file);
    fseek(file, 0, 0);
    char buf[1024];

    sprintf(buf, "Content-length: %d", size);

    list = curl_slist_append(list, buf);
    list = curl_slist_append(list, "Expect:");
    list = curl_slist_append(list, "transfer-encoding:");

    curl_easy_setopt(curl, CURLOPT_URL, "https://dynamodb.us-east-1.amazonaws.com");
    curl_easy_setopt(curl, CURLOPT_POST, 1L);
    curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_callback);
    curl_easy_setopt(curl, CURLOPT_READDATA, (void *)file);
    curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, list);

    // ------------------------------------------ UNCOMMENT TO FIX
    //    curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, size);

    // FIST REQUEST
    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
      fprintf(stderr, "curl_easy_perform() failed: %s\n",
              curl_easy_strerror(res));

    // SECOND REQUEST
    fseek(file, 0, 0);

    res = curl_easy_perform(curl);
    if(res != CURLE_OK)
      fprintf(stderr, "curl_easy_perform() failed: %s\n",
              curl_easy_strerror(res));

    curl_easy_cleanup(curl);
  }
  return 0;
}

Just compile with g++ -o curltest curltest.cpp -lcurl and create a file called small with some random content for it to attempt to upload.

This test makes two requests to dynamodb which are clearly incorrect, resulting in a 404 from dynamodb. What should happen, is that the second request re-uses the connection from the first, but it does not (see output below), unless the "UNCOMMENT TO FIX" line is uncommented.

Without the fix, this will output:

...
* HTTP error before end of send, stop sending
<
* Closing connection 0
...
* Hostname dynamodb.us-east-1.amazonaws.com was found in DNS cache
*   Trying 52.119.226.214:443...
* Connected to dynamodb.us-east-1.amazonaws.com (52.119.226.214) port 443 (#1)
...

uncommenting the fix produces:

...
* We are completely uploaded and fine
...
* Connection #0 to host dynamodb.us-east-1.amazonaws.com left intact
* Found bundle for host dynamodb.us-east-1.amazonaws.com: 0x7073d0 [serially]
* Can not multiplex, even if we wanted to!
* Re-using existing connection! (#0) with host dynamodb.us-east-1.amazonaws.com
* Connected to dynamodb.us-east-1.amazonaws.com (52.119.226.80) port 443 (#0)

Possible Solution

diff --git a/aws-cpp-sdk-core/source/http/curl/CurlHttpClient.cpp b/aws-cpp-sdk-core/source/http/curl/CurlHttpClient.cpp
index 5e6c8fc..8a06bca 100644
--- a/aws-cpp-sdk-core/source/http/curl/CurlHttpClient.cpp
+++ b/aws-cpp-sdk-core/source/http/curl/CurlHttpClient.cpp
@@ -653,6 +653,7 @@ std::shared_ptr<HttpResponse> CurlHttpClient::MakeRequest(const std::shared_ptr<
             curl_easy_setopt(connectionHandle, CURLOPT_READDATA, &readContext);
             curl_easy_setopt(connectionHandle, CURLOPT_SEEKFUNCTION, SeekBody);
             curl_easy_setopt(connectionHandle, CURLOPT_SEEKDATA, &readContext);
+            curl_easy_setopt(connectionHandle, CURLOPT_POSTFIELDSIZE, request->GetSize());
             if (request->IsEventStreamRequest())
             {
                 curl_easy_setopt(connectionHandle, CURLOPT_NOPROGRESS, 0L);

Additional Information/Context

Currently I'm using a different workaround, which is setting CURLOPT_KEEP_SENDING_ON_ERROR which also allows the connection to stay open. This is settable without modifying the AWS SDK CPP by setting this curl flag manually, but that seems like a bit of a hack.

AWS CPP SDK version used

I used 1.8 but the same code is in 1.9

Compiler and Version used

gcc

Operating System and version

linux AL2

jmklix commented 6 months ago

This is currently behaving as expected, but it is a feature request that we might add at some point in the future