boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Memory Leak in S3's download_fileobj and upload_fileobj Methods #4132

Closed: Gabibing closed this issue 3 weeks ago

Gabibing commented 1 month ago

Describe the bug

I have encountered a memory leak when using the S3 client's download_fileobj and upload_fileobj methods with BytesIO in a TorchServe environment.

Expected Behavior

I expected the memory usage to remain stable when using download_fileobj and upload_fileobj methods for downloading and uploading files to and from an S3 bucket.

Current Behavior

Memory usage climbs steadily and is never released, even after running gc.collect().
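
For context, here is roughly how the growth can be observed (a sketch assuming psutil is installed; any RSS monitor would do):

import gc
import os

import psutil

process = psutil.Process(os.getpid())

def log_rss(label):
    # Force a collection first; RSS still climbs across iterations.
    gc.collect()
    rss_mib = process.memory_info().rss / (1024 * 1024)
    print(f"{label}: rss={rss_mib:.1f} MiB")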

Reproduction Steps

The issue seems to occur when continuously downloading/uploading different files. I downloaded/uploaded different media files within TorchServe (a multi-threaded environment).

import io
import boto3

def reproduce_memory_leak():
    s3_client = boto3.client('s3')
    bucket = 'your-bucket-name'
    for i in range(1000):
        # A different 5-10 MB media file is transferred on each iteration.
        s3key_src = '5-10MB files.wav' + str(i)
        s3key_dst = '5-10MB files.wav' + str(i)

        with io.BytesIO() as buf:
            s3_client.download_fileobj(bucket, s3key_src, buf)

        with io.BytesIO(b'example bytes') as buf:
            s3_client.upload_fileobj(buf, bucket, s3key_dst)

if __name__ == "__main__":
    reproduce_memory_leak()
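
One variable worth isolating is the transfer thread pool: download_fileobj and upload_fileobj route transfers through boto3's managed transfer layer, which uses worker threads by default. A single-threaded variant to narrow things down (an experiment, not a confirmed fix):

from boto3.s3.transfer import TransferConfig

# Run each transfer in the calling thread instead of the s3transfer
# thread pool; assumes s3_client, bucket, s3key_src from the snippet above.
single_threaded = TransferConfig(use_threads=False)

with io.BytesIO() as buf:
    s3_client.download_fileobj(bucket, s3key_src, buf, Config=single_threaded)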

Possible Solution

The memory leak goes away when the managed transfer calls are replaced with plain get_object/put_object requests. (The fileobj methods route transfers through boto3's managed transfer layer and its thread pool, which may be where the memory is retained; the plain calls each issue a single request.)

# s3_client.download_fileobj(bucket, s3key_src, buf)
with io.BytesIO() as buf:
    response = s3_client.get_object(Bucket=bucket, Key=s3key_src)
    buf.write(response['Body'].read())

# s3_client.upload_fileobj(buf, bucket, s3key_dst)
with io.BytesIO(b'example bytes') as buf:
    s3_client.put_object(Bucket=bucket, Key=s3key_dst, Body=buf.getvalue())
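
For larger objects, the download side of this workaround could also stream the body in chunks instead of read()-ing it whole (a sketch; the 1 MiB chunk size is arbitrary):

with io.BytesIO() as buf:
    response = s3_client.get_object(Bucket=bucket, Key=s3key_src)
    # StreamingBody.iter_chunks yields the body incrementally, so only
    # one chunk at a time is buffered during the copy.
    for chunk in response['Body'].iter_chunks(chunk_size=1024 * 1024):
        buf.write(chunk)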

Additional Information/Context

Additional attachment: TorchServe config.properties

default_workers_per_model=2
vmargs=-Xmx8g -XX:+UseContainerSupport -XX:+ExitOnOutOfMemoryError
install_py_dep_per_model=true

inference_address=http://0.0.0.0:8080
cors_allowed_origin='*'
cors_allowed_methods=GET, POST
cors_allowed_headers=X-Custom-Header

max_request_size=655350000
max_response_size=655350000
default_response_timeout=300

SDK version used

1.34.106

Environment details (OS name and version, etc.)

Ubuntu 20.04.6 LTS (Docker)

Gabibing commented 1 month ago

I'm closing this issue as it is likely related to PyTorch rather than boto3.

github-actions[bot] commented 1 month ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

Edwardyan1112 commented 1 month ago

@Gabibing I encountered a similar issue. Using download_fileobj in the dataloader leads to memory leaks. Could you explain why you say this problem is related to PyTorch? Is it related to the versions of PyTorch and boto3?

Gabibing commented 1 month ago

@Edwardyan1112 I was mistaken. I think it's a memory issue with boto3. So I reopened this issue.

Edwardyan1112 commented 1 month ago

> @Edwardyan1112 I was mistaken. I think it's a memory issue with boto3. So I reopened this issue.

What specific problems did you encounter?

Gabibing commented 1 month ago

@Edwardyan1112 When I use download_fileobj and upload_fileobj, the MemoryUtilization.Percent metric keeps climbing and the memory is never freed.

tim-finnigan commented 1 month ago

Thanks for reaching out. Can you please share debug logs (with sensitive info redacted) by adding boto3.set_stream_logger('') to your script, as well as a memory profile report?
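
For example (the empty logger name turns on debug logging for all boto3/botocore loggers; redact credentials before posting):

import boto3

# Stream verbose debug logs to stderr for every logger under the root.
boto3.set_stream_logger('')

s3_client = boto3.client('s3')
# ... reproduce the leak here and capture the output ...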

github-actions[bot] commented 3 weeks ago

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.