Open andy-slac opened 3 weeks ago
I can attach full logs from actual execution of that script with and without signal handler, each about 10MB in size.
Thanks for reaching out. Can you share the full logs (with any sensitive info redacted) by adding boto3.set_stream_logger('')
to your script? When you say "The trick is to trigger upload cancellation by hitting Ctrl-C at a correct moment", can you elaborate on that a bit more? Is this issue only reproducible when cancelling the upload at a certain point in the process?
I'll link the Boto3 docs on retries and timeout for Config for reference, although I see you already have those applied in your snippet. I'm wondering if specifying a lower timeout and max_attempts
would reduce the delay you're seeing.
Attaching the log with debug output: log-exc5-handler.log.gz
Reducing the delay is not my goal, the retries are configured reasonably in our case and there is no reason to change those. What I want is for transfer to stop instantly without any delay after the signal is received and exception is raised, there is no reason to retry in that case.
Hitting Ctrl-C at a correct moment means that moment should happen when upload has started but before it has finished, i.e. when the threads are actively pushing the data. That window may be short depending on the size of the file. I used ~100MB file in my case which gave me relatively good chance.
Thanks for following up. It looks like https://github.com/boto/s3transfer/issues/249 may be related. We can plan to bring this issue up with the team for further review.
Describe the bug
I'm not exactly sure if this is
botocore
ors3transfer
issue, more probably some inconsistency between the two. What I observe is that when there is an asynchronous exception (Control-C or similar "random" exception that we use to terminate the application) during S3 upload, the upload does not stop immediately, but instead it hangs for longer than a minute. This happens with multithreaded upload, single-threaded upload does not show the same issue.Regression Issue
Expected Behavior
I would expect that Control-C or other random exception should stop upload almost immediately.
Current Behavior
Running with debug logging enabled I think I see what happens:
__exit__
method is called which callsTransferManager.__exit__()
TransferManager._shutdown
with exception typeFatalError
orCancelledError
depending on the type of original exception.cancel()
method passing the exception type to them.Here is an example of traceback dumped by boto from one of the threads when it goes into a retry loop:
Reproduction Steps
Here is the code that I used to debug the issue, the code is actually a trivial call to
file_upload
method. The trick is to trigger upload cancellation by hitting Ctrl-C at a correct moment. In our setup I used ~100MB file which gave me a reasonable time window during the upload. If you are lucky that after hitting Ctrl-C you should see enormous amount of tracebacks printed like the one I included above.Possible Solution
Maybe retry handler should understand that certain kinds of exceptions should result in immediate failure?
Additional Information/Context
Package versions:
SDK version used
1.34.131
Environment details (OS name and version, etc.)
Red Hat Enterprise Linux release 8.6 (Ootpa), kernel 4.18.0-372.32.1.el8_6.x86_64