mattBrzezinski opened 1 year ago
Hi @mattBrzezinski - thanks for reaching out. I attempted to replicate this with the code example provided in the docs using ~20 files and wasn't able to reproduce this behavior on my end. A couple of questions for you:

* How many files are you working with, and what's the average size of the files?
* Does it occur intermittently?
* Can you share debug logs by adding `boto3.set_stream_logger('')` to your code?
How many files are you working with and what's the average size of the files?
I've run this code with a variety of files: twenty 256MB files, five 1GB files, and a single 5GB file. It happens across the board.
Does it occur intermittently?
This happens consistently for me.
Can you share debug logs by adding `boto3.set_stream_logger('')` to your code?
I am unable to upload the logs to GitHub as they're too big (83MB); here is a link to them in my Google Drive instead. The delay unfortunately doesn't show up in the log timestamps, but around sections like this:
```
2023-03-14 15:04:50,803 s3transfer.tasks [DEBUG] Executing task IOWriteTask(transfer_id=0, {'offset': 4987551744}) with kwargs {'fileobj': <_io.BufferedWriter name='file-5gb'>, 'offset': 4987551744}
2023-03-14 15:04:50,803 s3transfer.utils [DEBUG] Acquiring 0
2023-03-14 15:04:50,804 s3transfer.futures [DEBUG] Submitting task CompleteDownloadNOOPTask(transfer_id=0, {}) to executor <s3transfer.futures.BoundedExecutor object at 0xffff95e469a0> for transfer request: 0.
2023-03-14 15:04:50,804 s3transfer.utils [DEBUG] Releasing acquire 0/None
2023-03-14 15:04:50,804 s3transfer.utils [DEBUG] Acquiring 0
```
there are long periods of time, specifically after the `Releasing acquire` line, where nothing seems to be happening. That makes me think this is a more generic Python I/O issue rather than a boto3 one.
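If the suspicion is a generic Python I/O effect, it can be probed without boto3 at all. Here is a minimal stdlib sketch (file sizes and paths are made up for illustration) that times the same kind of buffered write against a fresh path versus an already-existing one:

```python
import os
import tempfile
import time


def time_buffered_write(path: str, size: int = 8 * 1024 * 1024) -> float:
    """Time writing `size` bytes to `path` with a plain buffered writer,
    flushing to disk so the OS page cache doesn't hide the cost."""
    chunk = b"\0" * (1024 * 1024)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size // len(chunk)):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "testfile")
        fresh = time_buffered_write(path)     # file does not exist yet
        existing = time_buffered_write(path)  # file already on disk
        print(f"fresh: {fresh:.3f}s  existing: {existing:.3f}s")
```

If the "existing" timing is consistently slower here too, that would point away from boto3/s3transfer and toward the filesystem layer.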
@mattBrzezinski - thanks for sharing the additional info. I have to agree with you that the logs weren't very helpful in this case; I couldn't find anything specific to your situation.
However, I'd suggest giving the code example from the docs a shot, since that's how I attempted to reproduce this and didn't see the slowness on my end. Could you also elaborate on your use case in more detail? I'd like to bring it to our team for discussion. Do you see the slowness only for files that already exist on your instance?
However, I'd suggest giving the code example from the docs a shot, since that's how I attempted to reproduce this and didn't see the slowness on my end.
That's actually the code example I based mine on 😅
Could you also elaborate on your use case in more detail? I'd like to bring it to our team for discussion.
I was just testing how different TransferConfig parameters affect throughput when downloading files from S3. I was able to work around the issue by deleting the files between test runs, but figured I'd log an issue since this could affect others in more realistic scenarios.
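The workaround described above (deleting the file between runs) amounts to a small guard like this hypothetical helper:

```python
import os


def ensure_absent(path: str) -> str:
    """Hypothetical workaround helper: remove any stale local copy
    so the next download writes to a fresh file."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # nothing to clean up
    return path
```

Calling `ensure_absent(dest)` before each `download_fileobj()` run restores the expected timings, per the report above.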
Do you see the slowness only for the files that already existed in your instance?
Correct. If the files do not exist on the instance, download times are as expected. When the file already exists and I attempt to download it again, it is significantly slower. I don't think I mentioned it before, but this was run using Python 3.8.10.
Describe the bug
I was doing some benchmark testing of boto3 and came across what I consider an oddity when calling `download_fileobj()`. When a file with the same name already exists on disk, the call takes ~3x longer to execute.
Expected Behavior
I was expecting the run time to be unaffected by whether the file is already present or not.
Current Behavior
When the file already exists on disk, the call takes significantly longer.
Reproduction Steps
Possible Solution
No response
Additional Information/Context
I'm not 100% familiar with Python's intricacies, so I'm unsure if this is an expected effect; however, I was not able to find this question/concern posted before, so I wanted to reach out about it.
SDK version used
boto3==1.26.68
Environment details (OS name and version, etc.)
The S3 bucket resides in us-east-2; the code is running on an m7g.8xlarge EC2 instance in the same region.