Closed adityachandla closed 11 months ago
Hi @adityachandla ,
Thanks for opening this issue and for the detailed analysis. Are you seeing any explicit server errors after the 10,000-request mark? I would try enabling the logger to see if any obvious patterns emerge from the raw requests/responses:
cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion("eu-west-1"), config.WithClientLogMode(aws.LogRequestWithBody|aws.LogResponseWithBody))
I assume that 10,000 invocations will produce a lot of logs, so I would be mindful of any financial implications if you are storing logs from the application's context.
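For example, a lower-volume variant (illustrative only, not something required here) logs request/response headers and retry attempts but skips the bodies:

package main

import (
    "context"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
)

func main() {
    // Log request/response headers and retry attempts, but not bodies,
    // to keep log output manageable across thousands of invocations.
    cfg, err := config.LoadDefaultConfig(context.TODO(),
        config.WithRegion("eu-west-1"),
        config.WithClientLogMode(aws.LogRequest|aws.LogResponse|aws.LogRetries),
    )
    if err != nil {
        log.Fatal(err)
    }
    _ = cfg // pass cfg to your service client constructor as usual
}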
As it stands, while interesting, the data is not really actionable for now. We need to understand where the bulk of the delay is happening, distinguishing between the time spent in the overall operation and the time spent on the HTTP call itself.
Here is an example of a probe @lucix-aws put together to help us better understand the nature of the delay. Can you integrate this example into your use case and share the result with us?
package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "sync"
    "time"

    awsmiddleware "github.com/aws/aws-sdk-go-v2/aws/middleware"
    awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/sqs"
    "github.com/aws/smithy-go/middleware"
    smithyrand "github.com/aws/smithy-go/rand"
)

// WithOperationTiming instruments an SQS client to dump timing information for
// the following spans:
//   - overall operation time
//   - HTTPClient call time
//
// This instrumentation will also emit the request ID, service name, and
// operation name for each invocation.
//
// Accepts a message "handler" which is invoked with formatted messages to be
// handled externally, you can use the declared PrintfMSGHandler to simply dump
// these values to stdout.
func WithOperationTiming(msgHandler func(string)) func(*sqs.Options) {
    return func(o *sqs.Options) {
        o.APIOptions = append(o.APIOptions, addTimingMiddlewares(msgHandler))
        o.HTTPClient = &timedHTTPClient{
            client:     awshttp.NewBuildableClient(),
            msgHandler: msgHandler,
        }
    }
}

// PrintfMSGHandler writes messages to stdout.
func PrintfMSGHandler(msg string) {
    fmt.Printf("%s\n", msg)
}

type invokeIDKey struct{}

func setInvokeID(ctx context.Context, id string) context.Context {
    return middleware.WithStackValue(ctx, invokeIDKey{}, id)
}

func getInvokeID(ctx context.Context) string {
    id, _ := middleware.GetStackValue(ctx, invokeIDKey{}).(string)
    return id
}

func timeSpan(ctx context.Context, name string, consumer func(string)) func() {
    start := time.Now()
    return func() {
        elapsed := time.Now().Sub(start)
        consumer(fmt.Sprintf("[%s] %s: %s", getInvokeID(ctx), name, elapsed))
    }
}

type timedHTTPClient struct {
    client     *awshttp.BuildableClient
    msgHandler func(string)
}

func (c *timedHTTPClient) Do(r *http.Request) (*http.Response, error) {
    defer timeSpan(r.Context(), "http", c.msgHandler)()

    resp, err := c.client.Do(r)
    if err != nil {
        return nil, fmt.Errorf("inner client do: %v", err)
    }
    return resp, nil
}

type addInvokeIDMiddleware struct {
    msgHandler func(string)
}

func (*addInvokeIDMiddleware) ID() string { return "addInvokeID" }

func (*addInvokeIDMiddleware) HandleInitialize(ctx context.Context, in middleware.InitializeInput, next middleware.InitializeHandler) (
    out middleware.InitializeOutput, md middleware.Metadata, err error,
) {
    id, err := smithyrand.NewUUID(smithyrand.Reader).GetUUID()
    if err != nil {
        return out, md, fmt.Errorf("new uuid: %v", err)
    }
    return next.HandleInitialize(setInvokeID(ctx, id), in)
}

type timeOperationMiddleware struct {
    msgHandler func(string)
}

func (*timeOperationMiddleware) ID() string { return "timeOperation" }

func (m *timeOperationMiddleware) HandleInitialize(ctx context.Context, in middleware.InitializeInput, next middleware.InitializeHandler) (
    middleware.InitializeOutput, middleware.Metadata, error,
) {
    defer timeSpan(ctx, "operation", m.msgHandler)()
    return next.HandleInitialize(ctx, in)
}

type emitMetadataMiddleware struct {
    msgHandler func(string)
}

func (*emitMetadataMiddleware) ID() string { return "emitMetadata" }

func (m *emitMetadataMiddleware) HandleInitialize(ctx context.Context, in middleware.InitializeInput, next middleware.InitializeHandler) (
    middleware.InitializeOutput, middleware.Metadata, error,
) {
    out, md, err := next.HandleInitialize(ctx, in)

    invokeID := getInvokeID(ctx)
    requestID, _ := awsmiddleware.GetRequestIDMetadata(md)
    service := awsmiddleware.GetServiceID(ctx)
    operation := awsmiddleware.GetOperationName(ctx)
    m.msgHandler(fmt.Sprintf(`[%s] requestID = "%s"`, invokeID, requestID))
    m.msgHandler(fmt.Sprintf(`[%s] service = "%s"`, invokeID, service))
    m.msgHandler(fmt.Sprintf(`[%s] operation = "%s"`, invokeID, operation))

    return out, md, err
}

func addTimingMiddlewares(mh func(string)) func(*middleware.Stack) error {
    return func(s *middleware.Stack) error {
        if err := s.Initialize.Add(&timeOperationMiddleware{msgHandler: mh}, middleware.Before); err != nil {
            return fmt.Errorf("add time operation middleware: %v", err)
        }
        if err := s.Initialize.Add(&addInvokeIDMiddleware{msgHandler: mh}, middleware.Before); err != nil {
            return fmt.Errorf("add invoke id middleware: %v", err)
        }
        if err := s.Initialize.Insert(&emitMetadataMiddleware{msgHandler: mh}, "RegisterServiceMetadata", middleware.After); err != nil {
            return fmt.Errorf("add emit metadata middleware: %v", err)
        }
        return nil
    }
}
func main() {
    cfg, err := config.LoadDefaultConfig(context.Background())
    if err != nil {
        log.Fatal(fmt.Errorf("load default config: %v", err))
    }

    svc := sqs.NewFromConfig(cfg, WithOperationTiming(PrintfMSGHandler))

    var wg sync.WaitGroup
    for i := 0; i < 6; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Use a locally scoped err to avoid a data race on the outer err.
            if _, err := svc.ListQueues(context.Background(), nil); err != nil {
                fmt.Println(fmt.Errorf("list queues: %v", err))
            }
        }()
    }
    wg.Wait()
}
Thank you very much, Ran~
First of all, thanks for your reply @RanVaknin.
Regarding your first point about enabling request logging: there are S3 metrics that count the number of 4xx and 5xx responses, and I had already checked them. There were no 4xx or 5xx responses returned by S3, so I don't think there is much point in printing and analyzing the raw request/response logs.
I integrated the second part, and this is the file generated from those logs (I removed the log lines for service name and request type, as those are always S3 and GetObjectRequest): c7gn.2xlarge_512Kb_1000_x20.txt. I wrote the following Python script to analyze the data; the difference between the operation time and the HTTP time is always less than 2 milliseconds.
import sys
import re
from dataclasses import dataclass


@dataclass
class RequestInfo:
    operation: float
    http: float


def main():
    with open(sys.argv[1], "r") as f:
        lines = f.read().strip().split("\n")
    lines = [l for l in lines if "=" not in l]  # Filter request id lines
    matcher = re.compile(r"\[(.*)\] ([a-z]*): ([0-9.]*)(m?)s")
    values = []
    for l in lines:
        values.append(matcher.findall(l)[0])
    request_info = dict()
    for v in values:
        if v[0] in request_info:
            curr = request_info[v[0]]
        else:
            curr = RequestInfo(0, 0)
        val = float(v[2])
        if v[3] != 'm':
            val *= 1000  # Value was in seconds; convert to milliseconds
        if v[1] == "http":
            curr.http = val
        else:
            curr.operation = val
        request_info[v[0]] = curr
    for k, v in request_info.items():
        delay = v.operation - v.http
        if delay > 1:
            print(v)


if __name__ == "__main__":
    main()
However, I still see the same behaviour as before, where request latency increases significantly after 10k requests: [latency graph] Also, the total request latency is still around 20-30 milliseconds according to the CloudWatch metrics.
I don't think it should take more than a couple hundred milliseconds for a request to go from S3 to EC2 in the same region. In terms of network capacity: if we assume that all 20 goroutines are fetching 512 KB of data every 20 milliseconds (the S3 latency), that is 50 requests per second per goroutine, so the total data fetched per second is 512 KB x 20 x 50, or around 500 MB. The bandwidth utilized would therefore be around 4 Gbps, which is not even 10% of the bandwidth of a c7gn.2xlarge.
Any suggestions on how I should proceed here?
"the difference between the operation time and http time is always less than 2 milliseconds."
This is ultimately the measurement that matters, and the difference there is what I'd expect - there's no indication of SDK misbehavior.
Network throughput is not the only factor at this level of concurrency. There are a number of additional variables here, including, but not limited to, http.Transport or net.Dialer settings (you can configure these through the HTTPClient field on service client Options).
The most important thing to understand here is that the entire HTTP operation is delegated to what is ultimately an instance of the standard library net/http client, the behavior of which is outside of our control. There's no queueing, throttling, or pooling logic of any kind in front of that delegation - if you call GetObject 10,000 times in a second, we will blindly serialize 10,000 corresponding *http.Requests and pass them to http.Client.Do. So from our perspective, if we're not spending an inordinate amount of time up to that point of delegation (e.g. it's not taking us, say, 1 full second to go from you calling GetObject to us sending the request), we're operating exactly as we should.
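For illustration, here is a minimal sketch (the specific values are placeholders, not recommendations) of how http.Transport and net.Dialer settings can be supplied through the HTTPClient field on the service client Options, using the SDK's BuildableClient:

package main

import (
    "context"
    "log"
    "net"
    "net/http"
    "time"

    awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    cfg, err := config.LoadDefaultConfig(context.Background())
    if err != nil {
        log.Fatal(err)
    }

    // Tune the underlying http.Transport and net.Dialer for many concurrent
    // requests to a single host. The values below are placeholders.
    httpClient := awshttp.NewBuildableClient().
        WithTransportOptions(func(t *http.Transport) {
            t.MaxIdleConns = 100
            t.MaxIdleConnsPerHost = 100 // allow idle connections to be reused instead of redialed
        }).
        WithDialerOptions(func(d *net.Dialer) {
            d.KeepAlive = 30 * time.Second
        })

    svc := s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.HTTPClient = httpClient
    })
    _ = svc
}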
At this time I'm inclined to close this issue, since the data more or less exonerates the SDK. If you wish to dive deeper into the networking internals, your best course of action is going to be to wire up an httptrace to get a more detailed timing breakdown.
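As a starting point, here is a minimal sketch (illustrative only) of wiring up an httptrace.ClientTrace by wrapping the HTTPClient, following the same pattern as the timedHTTPClient in the probe above:

package main

import (
    "context"
    "crypto/tls"
    "log"
    "net/http"
    "net/http/httptrace"
    "time"

    awshttp "github.com/aws/aws-sdk-go-v2/aws/transport/http"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

// tracedHTTPClient attaches an httptrace.ClientTrace to every outgoing request
// so that DNS, connection reuse, TLS, and time-to-first-byte timings can be
// observed underneath the SDK.
type tracedHTTPClient struct {
    inner *awshttp.BuildableClient
}

func (c *tracedHTTPClient) Do(r *http.Request) (*http.Response, error) {
    start := time.Now()
    trace := &httptrace.ClientTrace{
        GotConn: func(info httptrace.GotConnInfo) {
            log.Printf("got conn (reused=%v, wasIdle=%v) after %s", info.Reused, info.WasIdle, time.Since(start))
        },
        DNSDone: func(httptrace.DNSDoneInfo) {
            log.Printf("dns done after %s", time.Since(start))
        },
        TLSHandshakeDone: func(tls.ConnectionState, error) {
            log.Printf("tls handshake done after %s", time.Since(start))
        },
        GotFirstResponseByte: func() {
            log.Printf("first response byte after %s", time.Since(start))
        },
    }
    return c.inner.Do(r.WithContext(httptrace.WithClientTrace(r.Context(), trace)))
}

func main() {
    cfg, err := config.LoadDefaultConfig(context.Background())
    if err != nil {
        log.Fatal(err)
    }

    // Install the traced client the same way the timing probe installs its
    // timedHTTPClient.
    svc := s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.HTTPClient = &tracedHTTPClient{inner: awshttp.NewBuildableClient()}
    })

    // Illustrative call to exercise the traced client.
    if _, err := svc.ListBuckets(context.Background(), &s3.ListBucketsInput{}); err != nil {
        log.Println(err)
    }
}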
Thanks for your detailed response @lucix-aws. I will try to dig deeper into this issue using httptrace. I agree with you that if the SDK simply passes requests through to the HTTPClient, there is likely no issue here. Thanks a lot for helping out :bowing_man:
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Describe the bug
I am performing a benchmark in which I fetch a byte range of an S3 object. The benchmark takes the number of goroutines that will concurrently fetch bytes, the size of the byte range, and the number of runs that each goroutine will perform.
This benchmark was performed on a c7gn.2xlarge instance in Ireland and the S3 bucket is also present in the same region.
Expected Behavior
When I run the benchmark, I expect to see latency similar to the latency reported by CloudWatch's S3 request metrics. This is indeed what I get when I fetch 512 bytes with 20 goroutines, 1000 times each. This is the graph for the observed latency: [latency graph] And this is the total request latency in CloudWatch: [CloudWatch graph]
The request latency that I observe is higher, but not by much, and this is fine. However, something weird happens when I change the fetch range from 512 bytes to 512 kilobytes.
Current Behavior
When I run the same benchmark with 512 kilobytes and 20 goroutines, each fetching the byte range 1000 times, I see the following graph: [latency graph] As you can see, after around 10k requests the latency increases drastically for some requests. However, according to the CloudWatch metrics, it was more or less constant: [CloudWatch graph] I have repeated this benchmark multiple times, and at exactly 10k requests I start to see this clustering of requests into higher-latency buckets, as can be seen in the graph.
Only two things can happen between the start and end times: garbage collection and execution of SDK code. The former does not appear to be the problem, as I have run this benchmark with GOGC=500 and GOMEMLIMIT=4096MiB and got almost the same result. This leads me to believe that there is something happening in the SDK that is causing this behaviour.
Reproduction Steps
I am running the code on a c7gn.2xlarge instance, against a 128MB file in an S3 bucket in the same region as the EC2 instance.
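A minimal sketch of the kind of benchmark described here (illustrative only; the bucket name, key, and fixed byte range are placeholders, not the actual benchmark code):

package main

import (
    "context"
    "fmt"
    "io"
    "log"
    "sync"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    const (
        goroutines = 20
        runs       = 1000
        rangeSize  = 512 * 1024 // 512 KB per request
    )

    cfg, err := config.LoadDefaultConfig(context.Background(), config.WithRegion("eu-west-1"))
    if err != nil {
        log.Fatal(err)
    }
    svc := s3.NewFromConfig(cfg)

    var wg sync.WaitGroup
    for g := 0; g < goroutines; g++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := 0; i < runs; i++ {
                start := time.Now()
                out, err := svc.GetObject(context.Background(), &s3.GetObjectInput{
                    Bucket: aws.String("my-benchmark-bucket"), // placeholder
                    Key:    aws.String("128MB-test-object"),   // placeholder
                    Range:  aws.String(fmt.Sprintf("bytes=0-%d", rangeSize-1)),
                })
                if err != nil {
                    log.Println(err)
                    continue
                }
                // Drain and close the body so the connection can be reused.
                _, _ = io.Copy(io.Discard, out.Body)
                out.Body.Close()
                fmt.Println(time.Since(start))
            }
        }()
    }
    wg.Wait()
}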
Possible Solution
No response
Additional Information/Context
I have already checked the following things:
AWS Go SDK V2 Module Versions Used
Compiler and Version used
go version go1.20.4 linux/amd64
Operating System and version
ubuntu 23.04