daveisfera opened this issue 1 month ago
Thanks for reaching out. Unfortunately we cannot guarantee compatibility with third-party libraries like `aws-vault`. To investigate a boto3 issue here we would need a code snippet to reproduce the issue, in addition to debug logs (with sensitive info redacted), which you can get by adding `boto3.set_stream_logger('')` to your script.
`aws-vault` is just a convenient way to generate the necessary environment variables for authorizing with `boto3`, and the same problem can be reproduced with any token that expires. I'll grab logs of it happening.
Here's a minimal reproducer (and, I believe, the simplest possible example of using `BaseSubscriber` and `on_done`), and I've attached the logs from running it with an expired token:
```python
import logging
from io import BytesIO

from boto3 import set_stream_logger
from boto3.s3.transfer import BaseSubscriber
from boto3.session import Session
from s3transfer.manager import TransferConfig, TransferManager

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
set_stream_logger("")


class CheckDone(BaseSubscriber):
    def on_done(self, future, **kwargs) -> None:
        try:
            res = future.result()
            logger.info("Upload worked: %s", res)
        except Exception as e:
            logger.error("Upload failed: %s", e)


def _main() -> None:
    session = Session()
    s3_tm = TransferManager(session.client("s3"), TransferConfig(max_request_concurrency=3))
    s3_tm.upload(
        BytesIO(b"testing"),
        "test_bucket",
        "s3transfer_test.txt",
        subscribers=[CheckDone("s3transfer_test.txt")],
    )
    s3_tm.shutdown()
    logger.info("Done")


if __name__ == "__main__":
    _main()
```
Thanks for following up. We don't recommend using `s3transfer` directly, as noted in the README:

> This project is not currently GA. If you are planning to use this code in production, make sure to lock to a minor version as interfaces may break from minor version to minor version. For a basic, stable interface of s3transfer, try the interfaces exposed in boto3.

Have you tried using the Boto3 upload methods for S3? You can handle any errors as documented here, or handle events like `after-call-error`.
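For illustration, a minimal sketch of that approach (bucket and key names are placeholders) might catch the error from `upload_fileobj` directly and also register a handler for the `after-call-error` event:

```python
import logging
from io import BytesIO

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def log_api_error(exception, **kwargs):
    # Fires whenever an S3 PutObject API call raises, including expired-credential errors.
    logger.error("PutObject failed: %s", exception)


client = boto3.client("s3")
client.meta.events.register("after-call-error.s3.PutObject", log_api_error)

try:
    # upload_fileobj blocks until the transfer finishes, so failures surface here.
    client.upload_fileobj(BytesIO(b"testing"), "test_bucket", "s3transfer_test.txt")
    logger.info("Upload worked")
except ClientError as e:
    logger.error("Upload failed: %s", e)
```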
Sorry, I pulled from the wrong import when putting together that minimal reproducer. Here it is with the import from `boto3`:
```python
import logging
from io import BytesIO

from boto3 import set_stream_logger
from boto3.s3.transfer import BaseSubscriber, TransferConfig, TransferManager
from boto3.session import Session

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
set_stream_logger("")


class CheckDone(BaseSubscriber):
    def on_done(self, future, **kwargs) -> None:
        try:
            res = future.result()
            logger.info("Upload worked: %s", res)
        except Exception as e:
            logger.error("Upload failed: %s", e)


def _main() -> None:
    session = Session()
    s3_tm = TransferManager(session.client("s3"), TransferConfig(max_concurrency=3))
    s3_tm.upload(
        BytesIO(b"testing"),
        "test_bucket",
        "s3transfer_test.txt",
        subscribers=[CheckDone("s3transfer_test.txt")],
    )
    s3_tm.shutdown()
    logger.info("Done")


if __name__ == "__main__":
    _main()
```
That snippet is still using the `TransferManager` directly, which isn't recommended:

> Thanks for following up. We don't recommend using `s3transfer` directly, as noted in the README:
>
> > This project is not currently GA. If you are planning to use this code in production, make sure to lock to a minor version as interfaces may break from minor version to minor version. For a basic, stable interface of s3transfer, try the interfaces exposed in boto3.
>
> Have you tried using the Boto3 upload methods for S3? You can handle any errors as documented here, or handle events like `after-call-error`.
Then what is the recommended way to upload multiple files in parallel in a separate thread using `boto3`? This is the only information I could find about uploading in parallel, and it looks like I'm doing a simpler version of what that code shows, so is there a better way?
Hi @daveisfera, we recommend using `upload_file` or `upload_fileobj`. And here is the documentation on multithreading/multiprocessing. Although the S3Transfer code you shared is public, it's still recommended to use Boto3 directly.

Also, looking back at the logs you shared, it looks like your `on_done` function is being called and the exception is logged (`ERROR:__main__:Upload failed:...`).
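For reference, a rough sketch of that multithreaded approach, assuming a shared client and placeholder file and bucket names; each future's `result()` call surfaces the upload's exception, which plays a role similar to `on_done`:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

import boto3

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

# Boto3 clients are thread-safe, so a single client can be shared by worker threads.
client = boto3.client("s3")
files = ["a.txt", "b.txt", "c.txt"]  # placeholder local file paths

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(client.upload_file, name, "test_bucket", name): name for name in files}
    for future in as_completed(futures):
        name = futures[future]
        try:
            future.result()  # re-raises whatever the upload raised, e.g. an expired-token error
            logger.info("Upload worked: %s", name)
        except Exception as e:
            logger.error("Upload failed for %s: %s", name, e)
```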
> Hi @daveisfera, we recommend using `upload_file` or `upload_fileobj`. And here is the documentation on multithreading/multiprocessing. Although the S3Transfer code you shared is public, it's still recommended to use Boto3 directly.

I'm glad to switch, but that documentation doesn't seem to provide information on how to actually perform the uploads in another thread. It also mentions events, but I don't see how to receive an event notification like `on_done`, so is there detailed information on how to use it?

> Also, looking back at the logs you shared, it looks like your `on_done` function is being called and the exception is logged (`ERROR:__main__:Upload failed:...`).

You are correct. I'll check and see if I can get logs from when the callback doesn't happen.
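One way to approximate an `on_done`-style notification with plain Boto3 is a small wrapper that always invokes a completion callback; the `upload_with_done` helper below is a hypothetical sketch, not a documented API:

```python
import logging
from io import BytesIO

import boto3

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def upload_with_done(client, fileobj, bucket, key, on_done):
    # Hypothetical helper: run the blocking upload, then always report the outcome.
    try:
        client.upload_fileobj(fileobj, bucket, key)
        on_done(key, None)
    except Exception as e:
        on_done(key, e)


def report(key, error):
    # Completion callback: error is None on success, the raised exception otherwise.
    if error is None:
        logger.info("Upload worked: %s", key)
    else:
        logger.error("Upload failed for %s: %s", key, error)


upload_with_done(boto3.client("s3"), BytesIO(b"testing"), "test_bucket", "s3transfer_test.txt", report)
```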
### Describe the bug

If the token has expired, then the file isn't uploaded (expected), but `on_done` is not called and there's no way to get notification of the failure (refiling https://github.com/boto/s3transfer/issues/304 here since I use `s3transfer` through `boto3`).

### Expected Behavior

An error would be reported.

### Current Behavior

The upload silently fails.
### Reproduction Steps

Run the reproducer above with credentials from `aws-vault` after the token has expired; `on_done` is never called.

### Possible Solution

Report an error using `on_done` in the same way as before the expiration.

### Additional Information/Context
No response
### SDK version used

1.34.97

### Environment details (OS name and version, etc.)

Debian Bookworm with Python 3.12.3