Closed dmittendorf closed 3 months ago
Hi @dmittendorf thanks for reaching out. From what you described it looks like this is not directly an issue with Boto3, but rather something involving OpenSSL, CPython, and/or Python 3.12. Considering that you're using very recent versions of everything, there may be some new compatibility issue or edge case here. It might be worth reaching out in repositories like https://github.com/python/cpython/issues or https://github.com/openssl/openssl/issues to try and get more information.
In terms of Boto3 here is documentation on multithreading with clients. If you would like to share a code snippet and debug logs (with sensitive info redacted) by adding boto3.set_stream_logger('')
to your script then we could review Boto3-related behavior further.
Hi @tim-finnigan. Thanks for the response. I agree that the root cause seems to be somewhere down in openssl, but wasn't sure if it was somehow triggered by the way that botocore is using the library.
The code I posted above that reproduces the bug is pretty much the same as the example code from the multithreading docs, except I am using a barrier to try and force as much concurrency between the threads as possible.
I'm attaching the output from a new reproduction with the stream logger enabled.
Thanks for following up here. Have you tried testing on different versions of Python and OpenSSL? Or different Linux/Windows environments? I could not reproduce the issue given the code snippet you had provided. Does reducing the number of threads help here?
It doesn't reproduce every time. Only when I run the script after a period of "quiet time".
Could you share more details here regarding this, like how often you can reproduce the issue and what you mean by quiet time?
There is a similar issue with a reproducer here: https://github.com/openssl/openssl/issues/24480
Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.
Our application shares Client instances across multiple threads.
This seems to be not supported, please see the updated doc at https://github.com/boto/boto3/pull/4157
Describe the bug
Our application shares Client instances across multiple threads. Occasionally, we are seeing segfaults in production when the Client is first used. It appears to always occur during SSL handshaking.
Expected Behavior
It is expected that Client instances are fully thread-safe.
Current Behavior
This is a thread-dump when observed in production:
This is a core dump when reproduced on a Mac with the script below:
Reproduction Steps
I used the following script to reproduce on my Mac. It doesn't reproduce every time. Only when I run the script after a period of "quiet time". Note the
TopicArn
param should be replaced with the ARN of a valid SNS topic that is accessible from the calling context.Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.34.78
Environment details (OS name and version, etc.)
Mac OS 14.0, Python 3.12.0, OpenSSL 3.1.3