box / box-python-sdk-gen

Repository for generated Box Python SDK
Apache License 2.0

Downloading files using ResponseByteStream stores them into memory #292

Closed: JorXi closed this issue 2 months ago

JorXi commented 2 months ago

Description of the Issue

The ResponseByteStream object returned by downloads.download_file does not manage memory properly: it loads the whole file into memory. Downloading a large enough file can exhaust the system's memory.

The cause appears to be that the stream's _bytes attribute keeps growing and is never freed after being read.
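To illustrate the suspected mechanism, here is a minimal toy model. It is not the SDK's actual code: `AccumulatingStream` and `fake_chunks` are hypothetical stand-ins, and the only names taken from this report are the `_bytes` and `_iterator` attributes. The sketch assumes a stream that appends every incoming chunk to an internal buffer and never discards what has already been read.

```python
from typing import Iterator


class AccumulatingStream:
    """Hypothetical stand-in, NOT the SDK's ResponseByteStream implementation."""

    def __init__(self, chunks: Iterator[bytes]) -> None:
        self._iterator = chunks
        self._bytes = bytearray()  # grows with every chunk pulled
        self._position = 0  # read offset into _bytes

    def read(self, size: int) -> bytes:
        # Pull chunks until enough unread data is buffered or the source ends.
        while len(self._bytes) - self._position < size:
            chunk = next(self._iterator, None)
            if chunk is None:
                break
            self._bytes.extend(chunk)
        data = bytes(self._bytes[self._position:self._position + size])
        self._position += len(data)
        return data  # consumed bytes are never dropped from _bytes


def fake_chunks(count: int, chunk_size: int) -> Iterator[bytes]:
    """Simulated network chunks (hypothetical helper for the demo)."""
    for _ in range(count):
        yield b"x" * chunk_size


stream = AccumulatingStream(fake_chunks(count=8, chunk_size=1024))
total_read = 0
while buf := stream.read(1024):
    total_read += len(buf)

# The buffer still holds everything that was read: nothing was released.
retained = len(stream._bytes)
```

In this toy model the retained buffer ends up as large as the whole payload, which matches the growth pattern described above for large downloads.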

As a workaround, I accessed the ResponseByteStream's _iterator attribute directly and used it to copy the stream to a file instead of calling read().
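A rough sketch of that kind of workaround, under stated assumptions: `copy_chunks_to_file` is a hypothetical helper, and the demo feeds it simulated chunks rather than a real download. It only assumes that the source is an iterable yielding `bytes` chunks.

```python
import io
from typing import BinaryIO, Iterable


def copy_chunks_to_file(chunks: Iterable[bytes], dest: BinaryIO) -> int:
    """Write each chunk to dest as it arrives; return total bytes written."""
    written = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            dest.write(chunk)
            written += len(chunk)
    return written


# Demo with simulated data instead of a real download.
simulated = (b"a" * 512 for _ in range(4))
sink = io.BytesIO()
copied = copy_chunks_to_file(simulated, sink)
```

With the SDK, the call would presumably look like `copy_chunks_to_file(file_content_stream._iterator, f)` with `f` opened in `"wb"` mode. Note that `_iterator` is a private attribute, so this relies on internals that may change between releases.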

Steps to Reproduce

  1. Use downloads.download_file to download a large file following the instructions shown in the docs (https://github.com/box/box-python-sdk-gen/blob/main/docs/downloads.md).
  2. Watch the memory used by the process: it keeps increasing for as long as the file is downloading.

The following code shows how the _bytes attribute keeps growing while a file downloads (it needs an authenticated box_sdk_gen.client.BoxClient object named client):

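import sys
from io import BufferedIOBase

file_content_stream: BufferedIOBase = client.downloads.download_file("<<id of large file>>")
with open("test.zip", "wb", 1024 * 1024) as f:
    i = 100  # start at the threshold so the first chunk is reported
    while buf := file_content_stream.read(1024 * 1024):
        f.write(buf)
        i += 1
        if i > 100:
            # Report the size of the internal buffer roughly every 100 chunks
            print("file_content_stream._bytes: {}".format(sys.getsizeof(file_content_stream._bytes)))
            i = 0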

Expected Behavior

Downloading the file without storing it in memory.

Error Message, Including Stack Trace

Memory fills up.

Screenshots

Versions Used

Python SDK:
Python: 3.11

congminh1254 commented 2 months ago

Hi @JorXi

Thanks for your feedback. We created PR #294 to address this issue.

Best, Minh

congminh1254 commented 2 months ago

Hi @JorXi

We have released SDK version 1.4.1, which includes this fix. Please take a look.

Best, Minh