S3 crt client timeouts and retry strategy are not working.

Manjunathagopi commented 5 months ago

Describe the bug

Below mentioned s3 crt client configs we set.

Aws::S3Crt::S3CrtClient *s3_crt_client;
Aws::S3Crt::Model::GetObjectRequest object_request;
Aws::SDKOptions options;
Aws::InitAPI(options);
Aws::S3Crt::ClientConfiguration config;
config.throughputTargetGbps = 1;
config.partSize = 10*1024*1024;
config.httpRequestTimeoutMs = 100;
config.connectTimeoutMs = 100;
config.requestTimeoutMs = 100;  
s3_crt_client = Aws::New<Aws::S3Crt::S3CrtClient>("test", config);

We are reading 25MB at a time and we are measuring how much time it is taking for getObject. average of 500ms taken for each getObject Even though we set all the timeouts s3 crt client is taking more time? We even tried with no retry strategy, but still there is no use and the behaviour is same.

  config.httpRequestTimeoutMs = 100;
  config.connectTimeoutMs = 100;
  config.requestTimeoutMs = 100;  
  std::shared_ptr<Aws::Client::RetryStrategy> retry;
  retry.reset(Aws::New<Aws::Client::DefaultRetryStrategy>("test", 0, 25));
  config.retryStrategy = retry;

GetObject() is not exiting even timeout is reached. One time we saw GetObject() for size 25MB took 10seconds. Basically behavior is same even if we didn't set timeouts or assign custom retry strategy. we are running this in a c5a4x large instance which is running in ap-south-1 region and s3 bucket is also in the same region and account. Please tell whats wrong with this and suggest how to reduce time taken for GetObject().

Expected Behavior

With zero retries GetObject should exit once timeout is reached.

Current Behavior

With zero retries GetObject is not exiting even timeout is reached.

Reproduction Steps

zero retries

#include <fcntl.h>
#include <unistd.h>

#include <aws/core/Aws.h>
#include <aws/core/client/DefaultRetryStrategy.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/HeadObjectRequest.h>
#include <aws/s3-crt/model/GetObjectRequest.h>
#include <aws/s3-crt/S3CrtErrors.h>

void usage(const char *bin)
{
    printf("USAGE: %s <s3_uri> <part_size(in bytes)> <out_file(optional)>\n", bin);
}

uint64_t get_curr_time_ms(void)
{
    struct  timespec   ts;
    uint64_t           ts_64;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    ts_64 = (uint64_t)((ts.tv_sec * 1000) + (ts.tv_nsec/1000000));
    return ts_64;
}

int main (int argc, char* argv[])
{
    char *s3_uri;
    uint64_t read_size = UINT64_MAX;
    uint64_t part_size;
    char *buf = NULL;
    uint64_t read_left;
    uint64_t to_read;
    uint64_t ret;
    uint64_t iter = 0;
    char *out_file = NULL;
    int out_fd = -1;

    if(argc < 5 || argc > 6)
    {
        usage(argv[0]);
        return 0;
    }

    s3_uri = argv[1];
    part_size = strtoull(argv[2], NULL, 10);
    if(argc == 6) out_file = argv[3];
    Aws::S3Crt::S3CrtClient *s3_crt_client;
    Aws::S3Crt::Model::GetObjectRequest object_request;
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    Aws::S3Crt::ClientConfiguration config;
    config.httpRequestTimeoutMs = 100;
    config.connectTimeoutMs = 100;
    config.requestTimeoutMs = 100;  
    std::shared_ptr<Aws::Client::RetryStrategy> retry;
    retry.reset(Aws::New<Aws::Client::DefaultRetryStrategy>("test", 0, 25));
    config.retryStrategy = retry;
    config.throughputTargetGbps = 1;
    config.partSize = 10*1024*1024;
    s3_crt_client = Aws::New<Aws::S3Crt::S3CrtClient>("test", config);

    char bucket[128] = {0};
    char key[128] = {0};
    if(sscanf(s3_uri, "s3://%[^/]/%s", bucket, key) == 2)
    {
        Aws::S3Crt::Model::HeadObjectRequest head_object_request;
        Aws::S3Crt::Model::HeadObjectOutcome outcome;
        head_object_request.SetBucket(bucket);
        head_object_request.SetKey(key);
        outcome = s3_crt_client->HeadObject(head_object_request);
        if(outcome.IsSuccess())
            printf("Object found, size:%lu, etag %s\n", outcome.GetResult().GetContentLength(),  outcome.GetResult().GetETag().c_str());
        object_request.SetBucket(bucket);
        object_request.SetKey(key);
    }

    if(out_file)
    {
        out_fd = open(out_file, O_CREAT | O_WRONLY);
        if(out_fd <= 2)
        {
            printf("Error in opening %s in write mode", out_file);
            return 0;
        }
    }
    int64_t pos = 0;

    buf = new char[part_size];
    read_left = read_size;
    while(read_left != 0)
    {
        ret = 0;
        int64_t retry_count = -1;
        to_read = part_size <= read_left ? part_size : read_left;
        iter ++;
        uint64_t start,end;
        Aws::S3Crt::Model::GetObjectOutcome outcome;
        object_request.SetRange(std::string("bytes=") + std::to_string(pos) + "-" + std::to_string(pos+to_read-1));
        object_request.SetResponseStreamFactory(
                [buf, to_read]()
                {
                std::unique_ptr<Aws::StringStream>
                stream(Aws::New<Aws::StringStream>("test"));
                stream->rdbuf()->pubsetbuf(static_cast<char*>(buf),
                        to_read);

                return stream.release();
                });
        start = get_curr_time_ms();
        outcome = s3_crt_client->GetObject(object_request);
        end = get_curr_time_ms();
        printf("Time taken for GetObject %lums\n",end-start);
        if(outcome.IsSuccess())
        {
            ret = outcome.GetResult().GetContentLength();

            if(ret <= to_read)
            {
                pos += ret;
                //log_info("Testing data %02x %02x", ((uint8_t*)buf)[0], ((uint8_t*)buf)[1]);
            }
            else
            {
                printf("This should never happen, pos_:%lu, requested:%lu, read:%lu\n",
                        pos, to_read, ret);
                break;
            }
        }
        else
        {
            printf("Failed to read, pos_:%lu, requested:%lu, error:%s, error_type:%d, error_code:%d\n",
                    pos, to_read, outcome.GetError().GetMessage().c_str(), outcome.GetError().GetErrorType(), outcome.GetError().GetResponseCode()) ;
            break;
        }
        if(ret == 0)
        {
            printf("Nothing read, read_left:%lubytes, iter:%lu\n", read_left, iter);
            break;
        }
        else if(ret < to_read)
        {
            printf("Read %ldbytes instead of %ldbytes, iter:%lu\n", ret, to_read, iter);
        }
        else
        {
            printf("Read %ld bytes, iter:%lu\n", ret, iter);
        }
        read_left -= ret;

        if(ret > 0 && out_fd > 2)
        {
            if(write(out_fd, buf, ret) != ret)
            {
                printf("Error in write!!! (%lu bytes)\n", ret);
                return 0;
            }
            fdatasync(out_fd);
        }
    }

    if(s3_crt_client)
    {
        Aws::Delete(s3_crt_client);
        s3_crt_client = NULL;
    }
    Aws::ShutdownAPI(options);

    if(buf) delete[] buf;
    if(out_fd > 2) close(out_fd);
    return 0;
}

default strategy

#include <fcntl.h>
#include <unistd.h>

#include <aws/core/Aws.h>
#include <aws/core/client/DefaultRetryStrategy.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/HeadObjectRequest.h>
#include <aws/s3-crt/model/GetObjectRequest.h>
#include <aws/s3-crt/S3CrtErrors.h>

void usage(const char *bin)
{
    printf("USAGE: %s <s3_uri> <offset in bytes> <read size(0->till end)> <part_size(in bytes)> <out_file(optional)>\n", bin);
}

uint64_t get_curr_time_ms(void)
{
    struct  timespec   ts;
    uint64_t           ts_64;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    ts_64 = (uint64_t)((ts.tv_sec * 1000) + (ts.tv_nsec/1000000));
    return ts_64;
}

int main (int argc, char* argv[])
{
    char *s3_uri;
    uint64_t offset;
    uint64_t read_size;
    uint64_t part_size;
    char *buf = NULL;
    uint64_t read_left;
    uint64_t to_read;
    uint64_t ret;
    uint64_t iter = 0;
    char *out_file = NULL;
    int out_fd = -1;

    if(argc < 5 || argc > 6)
    {
        usage(argv[0]);
        return 0;
    }

    s3_uri = argv[1];
    offset = strtoull(argv[2], NULL, 10);
    read_size = strtoull(argv[3], NULL, 10);
    if(read_size == 0) read_size = UINT64_MAX;
    part_size = strtoull(argv[4], NULL, 10);
    if(argc == 6) out_file = argv[5];
    Aws::S3Crt::S3CrtClient *s3_crt_client;
    Aws::S3Crt::Model::GetObjectRequest object_request;
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    Aws::S3Crt::ClientConfiguration config;
    config.httpRequestTimeoutMs = 100;
    config.connectTimeoutMs = 100;
    config.requestTimeoutMs = 100;  
    config.throughputTargetGbps = 1;
    config.partSize = 10*1024*1024;
    s3_crt_client = Aws::New<Aws::S3Crt::S3CrtClient>("test", config);

    char bucket[128] = {0};
    char key[128] = {0};
    if(sscanf(s3_uri, "s3://%[^/]/%s", bucket, key) == 2)
    {
        Aws::S3Crt::Model::HeadObjectRequest head_object_request;
        Aws::S3Crt::Model::HeadObjectOutcome outcome;
        head_object_request.SetBucket(bucket);
        head_object_request.SetKey(key);
        outcome = s3_crt_client->HeadObject(head_object_request);
        if(outcome.IsSuccess())
            printf("Object found, size:%lu, etag %s\n", outcome.GetResult().GetContentLength(),  outcome.GetResult().GetETag().c_str());
        object_request.SetBucket(bucket);
        object_request.SetKey(key);
    }

    if(out_file)
    {
        out_fd = open(out_file, O_CREAT | O_WRONLY);
        if(out_fd <= 2)
        {
            printf("Error in opening %s in write mode", out_file);
            return 0;
        }
    }
    int64_t pos = 0;

    buf = new char[part_size];
    read_left = read_size;
    while(read_left != 0)
    {
        ret = 0;
        int64_t retry_count = -1;
        to_read = part_size <= read_left ? part_size : read_left;
        iter ++;
        uint64_t start,end;
        Aws::S3Crt::Model::GetObjectOutcome outcome;
        object_request.SetRange(std::string("bytes=") + std::to_string(pos) + "-" + std::to_string(pos+to_read-1));
        object_request.SetResponseStreamFactory(
                [buf, to_read]()
                {
                std::unique_ptr<Aws::StringStream>
                stream(Aws::New<Aws::StringStream>("test"));
                stream->rdbuf()->pubsetbuf(static_cast<char*>(buf),
                        to_read);

                return stream.release();
                });
        start = get_curr_time_ms();
        outcome = s3_crt_client->GetObject(object_request);
        end = get_curr_time_ms();
        printf("Time taken for GetObject %lums\n",end-start);
        if(outcome.IsSuccess())
        {
            ret = outcome.GetResult().GetContentLength();

            if(ret <= to_read)
            {
                pos += ret;
                //log_info("Testing data %02x %02x", ((uint8_t*)buf)[0], ((uint8_t*)buf)[1]);
            }
            else
            {
                printf("This should never happen, pos_:%lu, requested:%lu, read:%lu\n",
                        pos, to_read, ret);
                break;
            }
        }
        else
        {
            printf("Failed to read, pos_:%lu, requested:%lu, error:%s, error_type:%d, error_code:%d\n",
                    pos, to_read, outcome.GetError().GetMessage().c_str(), outcome.GetError().GetErrorType(), outcome.GetError().GetResponseCode()) ;
            break;
        }
        if(ret == 0)
        {
            printf("Nothing read, read_left:%lubytes, iter:%lu\n", read_left, iter);
            break;
        }
        else if(ret < to_read)
        {
            printf("Read %ldbytes instead of %ldbytes, iter:%lu\n", ret, to_read, iter);
        }
        else
        {
            printf("Read %ld bytes, iter:%lu\n", ret, iter);
        }
        read_left -= ret;

        if(ret > 0 && out_fd > 2)
        {
            if(write(out_fd, buf, ret) != ret)
            {
                printf("Error in write!!! (%lu bytes)\n", ret);
                return 0;
            }
            fdatasync(out_fd);
        }
    }

    if(s3_crt_client)
    {
        Aws::Delete(s3_crt_client);
        s3_crt_client = NULL;
    }
    Aws::ShutdownAPI(options);

    if(buf) delete[] buf;
    if(out_fd > 2) close(out_fd);
    return 0;
}

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.269

Compiler and Version used

gcc (GCC) 4.8.5

Operating System and version

CentOS Linux and version 7

DmitriyMusatkin commented 5 months ago

CRT retries and timeouts work differently from regular CPP SDK config option and CRT S3 client currently does not honor them. We have a backlog feature request here to improve the situation https://github.com/aws/aws-sdk-cpp/issues/2594.

As an alternative you can configure lowSpeedLimit to indicate to CRT to kill requests that are too slow.

Manjunathagopi commented 5 months ago

Hi @DmitriyMusatkin, i tried setting lowSpeedLimit to 75MBps which will kill the requests that are below 75MBps and tried running the test. Still some requests were taking around 1second to read 25MB(bitrate of around 27MBps) from s3.

Aws::S3Crt::ClientConfiguration config;
config.lowSpeedLimit = 78643200;

Manjunathagopi commented 5 months ago

Hi, Any update on the above query?

DmitriyMusatkin commented 5 months ago

the way low speed limit is configured in sdk is to kill connection if throughput dips under the specified number for a given number of intervals (3 by default and otherwise derived from request timeout) - https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp#L330.

Hard to tell exactly, whats going on and we might need trace level logs to debug it further. Note: CRT already breaks up get requests into part requests based on the configured part size and your code does the same thing, so there might be something weird interaction between the 2 going on.

Manjunathagopi commented 5 months ago

Hi @DmitriyMusatkin attaching the trace logs for this can you please check this?

[Uploading aws_sdk_2024-06-04-11.log…]()

DmitriyMusatkin commented 5 months ago

bad link? it just points to this issue

Manjunathagopi commented 5 months ago

Sorry @DmitriyMusatkin here it is aws_sdk_2024-06-04-11.log

TingDaoK commented 5 months ago

The log you show is less than 2 secs.

[DEBUG] 2024-06-04 11:36:31.624 task-scheduler [140710120072960] id=0x7ff98c003238: Scheduling gather_statistics task for future execution at time 1311437023221890
[DEBUG] 2024-06-04 11:36:31.965 task-scheduler [140709998868224] id=0x7ff988003d38: Scheduling gather_statistics task for future execution at time 1311437363810723
[DEBUG] 2024-06-04 11:36:31.966 task-scheduler [140709998868224] id=0x7ff988026158: Scheduling gather_statistics task for future execution at time 1311437364379715
...
[DEBUG] 2024-06-04 11:36:32.276 task-scheduler [140710120072960] id=0x7ff98c003238: Running gather_statistics task with <Canceled> status
[DEBUG] 2024-06-04 11:36:32.276 task-scheduler [140709998868224] id=0x7ff988003d38: Running gather_statistics task with <Canceled> status
[DEBUG] 2024-06-04 11:36:32.276 task-scheduler [140709998868224] id=0x7ff988026158: Running gather_statistics task with <Canceled> status

You can see the task to monitor the throughput all canceled within 1 sec, which they are scheduled to run after 1 sec. referring to here

The reason for those task to be canceled is because the request has completed.

[TRACE] 2024-06-04 11:36:32.275 S3MetaRequest [140710030337792] id=0x1837460 Meta request clean up finished.

As @DmitriyMusatkin said, SDK will only kill the connection unless it keeps being slow for a certain time.

Manjunathagopi commented 5 months ago

Hi @TingDaoK @DmitriyMusatkin, Below attached the aws trace logs for s3 read of 25 MB 8 times(total of 200MB). Every time read is taking around 1second for each 25MB read.

config.throughputTargetGbps = 1.0;
config.partSize = 10485760;
config.httpRequestTimeoutMs = 100;
config.connectTimeoutMs = 100;
config.requestTimeoutMs = 100;  
config.lowSpeedLimit = 78643200;

https://getshared.com/SvtM9D7w

Can you please explain why it is taking more time than the configured lowspeedlimt and targetthroughput and when will SDK kill the connection and what is the exact threshold to consider it to be slow?

Manjunathagopi commented 5 months ago

Hi, Any update on the above query?

DmitriyMusatkin commented 5 months ago

We reviewed the logs. A couple things are coming into play here

cpp sdk configures low speed limit interval to be 3s from logic here https://github.com/aws/aws-sdk-cpp/blob/main/generated/src/aws-cpp-sdk-s3-crt/source/S3CrtClient.cpp#L330
crt is destroying the connection pool when there are no active requests a given endpoint. That means all connections in that pool are released along with associated tracking. in your sample you are firing requests to a bucket sequentially one after the other, so what ends up happening is pool with a bunch of connections is created for a request, request uses that pool, and once request completes, pool is destroyed. Next request results in pool being recreated. This behavior has changed recently (sometime in april 2024 on crt side and cpp sdk probably imported that change couple weeks after)

what seems to happen in your case is:

you kick off request for 25 MB
internally it splits it into 10, 10, 5 MB requests and puts them on separate connections
each of those smaller requests complete under 1s (and some of them potentially under lowSpeedLimit for the duration of 1s, but because it only run for 1s, it never triggered the connection kill)
once the overall request completes, there is no more requests for a given bucket and it kills the connection pool

Manjunathagopi commented 5 months ago

Hi @DmitriyMusatkin thanks for the detailed analysis. I have few questions regarding this.

Is the division of 25MB to 10,10,5MB is based on part size that is configured which is 10MB?
To reproduce the requests going below lowSpeedLimit case, I purposely set the part size to 500MB, the lowSpeedLimit to 150MB, and the read size to 500MB. As expected, we encountered an error: "Http connection channel shut down due to failure to meet throughput minimum (aws-c-http: AWS_ERROR_HTTP_CHANNEL_THROUGHPUT_FAILURE)." However, the issue is that the operation does not exit immediately after 3 seconds as anticipated. Instead, the getObject operation exits after 30 seconds. This delay suggests that some form of retry mechanism might be occurring internally. How can we improve this?
```
config.throughputTargetGbps = 1.0;
config.partSize = 524288000;
config.lowSpeedLimit = 157286400;
```
Attaching the trace logs download link for issue 2 LINK can you please check this?

Manjunathagopi commented 4 months ago

Hi, Any update on the above query?

DmitriyMusatkin commented 4 months ago

s3 crt client does automatic splitting of all puts and gets into part sized chunks (based on the part size configured on the client) and runs all of them in parallel to optimize throughput
lowSpeedLimit works similar curl low speed limit in this case, it kills the connection because its really slow and tries to complete the request over a different different connection. (note: requestTimeoutMs for regular cpp s3 client maps to curls CURLOPT_LOW_SPEED_TIME as well). httpRequestTimeoutMs works by setting overall curl timeout for regular s3 client, but unfortunately as i mentioned earlier is not supported by s3 crt client yet. By default CRT will configure a number of retries for every request and i guess what you are observing are those retries as connection get killed (should be 5 retries by default with some backoff time between each). You could try to change the retries to make it fail faster, but thats probably not the best practice. Outside of that, im not sure there are other alternatives unfortunatelly, until crt starts supporting request timeouts. Using regular s3 cpp client might be an option to explore as well.

breuner commented 3 months ago

@DmitriyMusatkin : Could you please point me to the place in the code where partSize is being used for parallel requests? So far I couldn't find a place where this value is being used as you described and I'm trying to understand how the parallelization works (e.g. by dynamically spawning more threads or by using async IO on the socket or ...). (Just by the way: Same for throughputTargetGbps, where I also couldn't find a code path that seems to use this value, so I'm wondering what a good default would be if the line speed is unknown - should it rather be too high or too low in case of doubt or can it be set to a special value like "zero" to be ignored?)

aws / aws-sdk-cpp