boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0

Performance issue due to SSLContext being recreated over and over #3171

Open mm-matthias opened 1 month ago

mm-matthias commented 1 month ago

Describe the bug

botocore request/response times are much slower than expected in certain scenarios due to unexpected SSL overhead.

Expected Behavior

botocore/boto3 is fast for all requests.

Ideally the first request does not take longer than any following requests (e.g. the SSLContext could be initialized during module init or be triggered by the user of the library in some way before any requests happen).
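Since the issue title attributes the overhead to SSLContext creation, a rough way to gauge that cost on a given machine is to time the standard library call directly. This is only a sketch using the stdlib `ssl` module, independent of botocore; it does not show how botocore itself builds its context, and the helper name `time_context_creation` is made up for illustration.

```python
import ssl
import time


def time_context_creation(repeats=5):
    """Return the seconds taken by each ssl.create_default_context() call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Builds a fresh context and loads the default CA certificates.
        ssl.create_default_context()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    for seconds in time_context_creation():
        print(f"{seconds:.4f}")
```

If these numbers are a noticeable fraction of the request times reported below, context creation is a plausible contributor on that machine.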

Current Behavior

Requests and responses are too slow; see the reproduction steps below.

Reproduction Steps

botocore performance is much slower than expected in certain scenarios. This can easily be reproduced with:

import boto3
import time

session = boto3.session.Session()
for i in range(10):
    start = time.time()
    session.resource("s3").Object("mybucket", "mykey").get()["Body"].read()
    print(time.time() - start)

which yields

0.4858591556549072
0.1862480640411377
0.20270681381225586
0.22403812408447266
0.19844722747802734
0.18699288368225098
0.17718195915222168
0.18413686752319336
0.1918637752532959
0.18160700798034668

This can be optimized by sharing the resource between iterations:

import boto3
import time

session = boto3.session.Session()
resource = session.resource("s3")
for i in range(10):
    start = time.time()
    resource.Object("mybucket", "mykey").get()["Body"].read()
    print(time.time() - start)

which yields

0.25886106491088867
0.05055093765258789
0.050694942474365234
0.05214500427246094
0.05020618438720703
0.05117917060852051
0.04602789878845215
0.048911094665527344
0.048056840896606445
0.04799509048461914

One can see that the initial request is always quite slow.
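The sharing approach from the second script can be wrapped in a small cache so that call sites do not need to pass the resource around. The following is a minimal sketch, not part of boto3; `make_cached_factory` is a hypothetical helper, and the boto3 usage in the trailing comment is an assumption about how it would be wired up.

```python
import functools


def make_cached_factory(factory):
    """Wrap factory so each distinct set of arguments is built only once."""
    return functools.lru_cache(maxsize=None)(factory)


# Hypothetical usage with boto3 (not executed here):
#   session = boto3.session.Session()
#   get_resource = make_cached_factory(session.resource)
#   get_resource("s3").Object("mybucket", "mykey").get()["Body"].read()
```

Note that the boto3 documentation states that resource instances are not thread safe, so a cache like this is best kept per thread.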

But the issue goes further. If there is a delay between requests, all requests become slow. This can be reproduced with:

import boto3
import time

session = boto3.session.Session()
resource = session.resource("s3")

for i in range(10):
    start = time.time()
    resource.Object("mybucket", "mykey").get()["Body"].read()
    print(time.time() - start)
    time.sleep(10)

This yields

0.22714900970458984
0.17821383476257324
0.21504521369934082
0.17998909950256348
0.2338550090789795
0.19853711128234863
0.23908185958862305

Switching to sleep(5) yields fast requests again (except for the first one).
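To compare different pauses without editing the script each time, the timing loop can be factored into a helper. This is a sketch only; `time_calls` and the `fetch` callable in the trailing comment are hypothetical names, not part of boto3 or botocore.

```python
import time


def time_calls(fn, count=10, pause=0.0):
    """Call fn() count times, sleeping pause seconds after each call.

    Returns the duration of each call in seconds; the pause is not included.
    """
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
        time.sleep(pause)
    return durations


# Hypothetical usage against the shared resource from the scripts above:
#   fetch = lambda: resource.Object("mybucket", "mykey").get()["Body"].read()
#   print(time_calls(fetch, pause=5))
#   print(time_calls(fetch, pause=10))
```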

The reason for the slowness is that

Possible Solution

Additional Information/Context

No response

SDK version used

1.34.98

Environment details (OS name and version, etc.)

macOS and Linux, Python 3.12.3

tim-finnigan commented 1 month ago

Thanks for reporting this issue. After discussing with the team, we decided to continue tracking this for further review and investigation.