aws / aws-sdk-cpp

AWS SDK for C++
Apache License 2.0

Slower Fetch Times for S3 Objects in INTELLIGENT_TIERING compared to STANDARD Tier #3192

Open Manjunathagopi opened 2 weeks ago

Manjunathagopi commented 2 weeks ago

Describe the bug

We are downloading S3 objects in 24 MB parts. The fetch time for each 24 MB part is noticeably slower for objects in the INTELLIGENT_TIERING storage class than for objects in the STANDARD storage class.

Regression Issue

Expected Behavior

We understand that the initial fetch from Intelligent-Tiering can be slow, but once the first download is complete, the remaining downloads should take a consistent amount of time.

Current Behavior

With the AWS C++ SDK this does not happen: fetches from Intelligent-Tiering stay slow even after the first download is complete.

Reproduction Steps

To reproduce the issue, download S3 objects from the INTELLIGENT_TIERING storage class in small chunks and compare against the same download from the STANDARD storage class. The fetch time for each part is significantly slower in INTELLIGENT_TIERING.
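The comparison described above could be measured with something like the following. This is a minimal sketch, not the reporter's code: `MakePartRanges` and `TimeParts` are hypothetical helper names, and `fetchPart` is a placeholder callback that, in a real test, would presumably issue a ranged `GetObject` call (for example by passing the range string to `GetObjectRequest::SetRange`) and drain the response body. The sketch uses only the C++11 standard library, so it should build with the GCC 4.8.5 toolchain mentioned in this report.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Build HTTP Range header values ("bytes=first-last", both ends inclusive)
// that cover an object of objectSize bytes in parts of at most partSize bytes.
std::vector<std::string> MakePartRanges(std::uint64_t objectSize,
                                        std::uint64_t partSize) {
    std::vector<std::string> ranges;
    if (partSize == 0) {
        return ranges;  // avoid an infinite loop on a zero part size
    }
    for (std::uint64_t first = 0; first < objectSize; first += partSize) {
        std::uint64_t last = first + partSize - 1;
        if (last >= objectSize) {
            last = objectSize - 1;  // the final part may be shorter
        }
        ranges.push_back("bytes=" + std::to_string(first) + "-" +
                         std::to_string(last));
    }
    return ranges;
}

// Call fetchPart once per range and record how long each call took, in
// milliseconds. fetchPart is a hypothetical hook: the caller decides how the
// part is actually downloaded.
std::vector<double> TimeParts(
    const std::vector<std::string>& ranges,
    const std::function<void(const std::string&)>& fetchPart) {
    std::vector<double> millis;
    for (const std::string& range : ranges) {
        const auto start = std::chrono::steady_clock::now();
        fetchPart(range);
        const auto stop = std::chrono::steady_clock::now();
        millis.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return millis;
}
```

Running `TimeParts` over the same ranges against one INTELLIGENT_TIERING object and one STANDARD object of equal size, with the same client instance, would give per-part numbers that can be attached to the issue.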

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.408

Compiler and Version used

gcc (GCC) 4.8.5

Operating System and version

CentOS Linux and version 7

jmklix commented 2 weeks ago

Can you include some trace-level logs of the GetObject requests that you are making? The response should include a header with information about the tier your objects are currently in.

Manjunathagopi commented 2 weeks ago

@jmklix Please find the trace-level logs for both Intelligent-Tiering and Standard below: Intelligent-tiering logs, Standard-tiering logs

jmklix commented 2 weeks ago

Sorry, but I was mistaken. The logs only state that the storage class is INTELLIGENT_TIERING rather than telling us which access tier each object is currently in:

[TRACE] 2024-11-18 11:01:37.792 http-stream [140011056908032] id=0x7f56d001e680: Incoming header: x-amz-storage-class: INTELLIGENT_TIERING

It looks like S3 might not have your object in the tier that you are expecting. That could be because something is wrong on the S3 side, because S3 is taking longer than expected to change the tier, or because the S3 documentation is not clear about how Intelligent-Tiering is supposed to work. Can you check which access tier some objects are in before and after you access them? You can do this with S3 Inventory by looking at the "S3 Intelligent-Tiering access tier" field.
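To check the storage class across many requests in the trace logs, a small parser along these lines might help. It is a sketch that assumes every header line has the same `Incoming header: <name>: <value>` shape as the trace line quoted above and that the header name appears in lowercase; `HeaderValueFromTraceLine` is a hypothetical helper, not an SDK function. As noted above, this header reports the storage class, not the Intelligent-Tiering access tier. An empty result may simply mean the object is in STANDARD, since S3 does not return `x-amz-storage-class` for that class.

```cpp
#include <string>

// Extract the value of one response header from a single trace-log line of
// the form "... Incoming header: <name>: <value>". Returns an empty string
// when the line does not carry that header. The match is case-sensitive.
std::string HeaderValueFromTraceLine(const std::string& line,
                                     const std::string& headerName) {
    const std::string marker = "Incoming header: " + headerName + ": ";
    const std::string::size_type pos = line.find(marker);
    if (pos == std::string::npos) {
        return std::string();
    }
    const std::string value = line.substr(pos + marker.size());
    // Drop trailing whitespace such as a carriage return left by the logger.
    const std::string::size_type end = value.find_last_not_of(" \t\r\n");
    if (end == std::string::npos) {
        return std::string();
    }
    return value.substr(0, end + 1);
}
```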

Manjunathagopi commented 1 week ago

@jmklix But the aws s3 cp CLI command takes the same time to download the file regardless of whether the storage class is STANDARD or INTELLIGENT_TIERING.

DmitriyMusatkin commented 6 days ago

Is CLI and C++ performance similar for the standard tier? I'm wondering if the CLI is equally slow for both tiers, while something in the C++ SDK makes the standard tier faster but not the intelligent tier.

In general there should be no tier-specific code in the SDKs. To the SDK it is all just an endpoint, and it does not care what data it is pulling. My initial guess is that it might have something to do with DNS resolution or connection pooling. S3 has supported multi-value answer (MVA) DNS for over a year now, but maybe something in how the C++ SDK chooses an IP or reuses connections causes Intelligent-Tiering to be slower.
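One rough way to probe the connection-reuse guess is to compare the first part's latency with the steady-state latency of the remaining parts. The sketch below assumes per-part timings in milliseconds have already been collected; `Median` and `FirstPartOverhead` are hypothetical helper names. If connections are reused, only the first part should pay for DNS, TCP and TLS setup, so a ratio well above 1 would be expected. A ratio close to 1 with uniformly slow parts would be consistent with, though not proof of, every part paying the setup cost.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of a set of samples; returns 0.0 for an empty input.
// Takes the vector by value because it sorts its own copy.
double Median(std::vector<double> values) {
    if (values.empty()) {
        return 0.0;
    }
    std::sort(values.begin(), values.end());
    const std::size_t mid = values.size() / 2;
    if (values.size() % 2 == 1) {
        return values[mid];
    }
    return (values[mid - 1] + values[mid]) / 2.0;
}

// Ratio of the first part's latency to the median latency of all later
// parts. Returns 0.0 when there are fewer than two samples or the median of
// the later parts is not positive.
double FirstPartOverhead(const std::vector<double>& partMillis) {
    if (partMillis.size() < 2) {
        return 0.0;
    }
    const std::vector<double> rest(partMillis.begin() + 1, partMillis.end());
    const double steadyState = Median(rest);
    if (steadyState <= 0.0) {
        return 0.0;
    }
    return partMillis.front() / steadyState;
}
```

Computing the same two numbers for the CLI and the C++ SDK, on both storage classes, would also give a direct answer to the CLI-versus-C++ question above.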

Manjunathagopi commented 2 days ago

@DmitriyMusatkin CLI performance is roughly the same for both STANDARD and INTELLIGENT_TIERING, so why is it not the same with the C++ SDK?

DmitriyMusatkin commented 1 day ago

Hard to tell offhand without a deeper dive. What we know is that on the SDK side there is no difference between the tiers: the SDK ends up calling the same endpoints regardless of tier. This will require some bandwidth from someone on the SDK team to investigate.

Some potential theories:

Note: the CLI does not save anything on the client side between runs, so whatever improves performance on subsequent runs must be server-side.