MustaphaU opened this issue 8 months ago
I am suddenly facing this maximum recursion depth issue as well when trying to check whether an object exists in the S3 bucket using
s3_client.head_object(Bucket=bucket_name, Key=key)
It used to work before, but I am not sure if something changed. The S3 client is created with:
```python
boto3.client(
    service_name='s3',
    use_ssl=False,
    region_name=region,
    endpoint_url=endpoint_url,
    aws_access_key_id=key_id,
    aws_secret_access_key=access_key,
    config=Config(
        s3={'addressing_style': 'path'},
        signature_version='s3v4'))
```
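For context, a minimal sketch of the existence check described above (the bucket and key names are whatever your application passes in; the 404 handling is the usual pattern, not something from the original report):

```python
from botocore.exceptions import ClientError

def object_exists(s3_client, bucket_name, key):
    # head_object raises ClientError with a "404" error code when the key is absent
    try:
        s3_client.head_object(Bucket=bucket_name, Key=key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return False
        raise
```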
Hi @MustaphaU, thanks for reaching out. If you limit the script to only be initializing a client (no actual operations), do you still have this behavior? In other words, what is the minimum reproducible code snippet that produces this recursion depth error? Thanks!
Hi @RyanFitzSimmonsAK, just initializing the S3 client in my inference script like below is enough to reproduce the error:
s3_client = boto3.client('s3')
Thank you.
Edit: The error persists. Apologies for the back and forth. Yes, s3_client=boto3.client('s3')
should produce the error. I just tested now and got the error.
@RyanFitzSimmonsAK
Please see the attachment below from the CloudWatch logs:
Also, see the relevant part of the inference script:
You can see from the log that execution failed at the point of initializing the S3 client.
Thanks.
I also have this bug. Here is one message I got while running some tests to fix it:
Hope it helps.
As a workaround, I used the AWS CLI already present in the container:
```python
import subprocess
subprocess.run(["/usr/local/bin/aws", "s3", "cp", "s3://bucket/file", "/local/file"], check=True)
```
I am also getting the same error. It was working fine a few weeks ago.
So,
```python
s3.Bucket(settings.S3_BUCKET).put_object(Key=key, Body=file_data)
```
works, but the following code doesn't. This is a nightmare :)
```python
res = self.s3.put_object(Bucket=settings.S3_BUCKET,
                         Key=key,
                         Body=file_data)
```
The same probably goes for get_object.
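If the resource-based path also works for reads, a minimal sketch of the equivalent fallback for get_object (the bucket and key values below are placeholders, not from the original comment):

```python
import boto3

BUCKET = "my-bucket"       # placeholder for settings.S3_BUCKET
KEY = "path/to/object"     # placeholder key

s3 = boto3.resource('s3')

# Resource-based read mirroring the resource-based put_object above;
# obj.get() returns the same response shape as client.get_object.
obj = s3.Object(BUCKET, KEY)
body = obj.get()['Body'].read()
```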
Given that you're only seeing this behavior in SageMaker inference scripts, it's likely not purely a Boto3 problem. I've reached out to the SageMaker team for more information, and will update this issue whenever I learn more.
Ticket # for internal use : P133939124
Neither I nor the service team were able to reproduce this issue. Could you provide the following information? Are you following an example notebook, or deploying in a VPC? Could you share the inference.py that produces this behavior?
@RyanFitzSimmonsAK Thanks. I am not following an example notebook or deploying in a VPC. I have created a repo with instructions to reproduce the issue here: https://github.com/MustaphaU/rerror
Seeing this issue as well, except when creating clients for Secrets Manager with boto3.
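For reference, a minimal sketch of that case (the boto3 service name for Secrets Manager is "secretsmanager"; per the comment above, client creation alone is enough to hit the same error):

```python
import boto3

# Initializing the Secrets Manager client, with no subsequent API call.
sm_client = boto3.client("secretsmanager")
```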
Hi, just an update. The service team was able to reproduce this behavior, and is working on determining the root cause.
This is fantastic news! Thank you, team! :)
Just for external planning and orientation, is there a rough sense of whether this is a high-priority issue or at some other level? The bug is showing up in one of our critical paths. We have a temporary bypass for it, but would really like to get back to using boto3 fully.
Appreciate the help, and very happy you can reproduce the issue :)
I was facing the same issue when trying to build a SageMaker TensorFlow Serving image. Adding the monkey patch at the very top of python_service.py helped me:
```python
import gevent.monkey
gevent.monkey.patch_all()
```
This was suggested in the Stack Overflow thread here: https://stackoverflow.com/questions/45425236/gunicorn-recursionerror-with-gevent-and-requests-in-python-3-6-2
Thanks for the suggestion. I had tried this fix but it didn't resolve the issue. I mentioned it here on stackoverflow
You don't clarify it, but did you add it to your model inference code, or did you build the SageMaker image with it? It didn't work for me when I tried it in the inference code. It has to happen before any other Python import.
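To illustrate the ordering constraint, a sketch of what "the very top" means in practice (the module name and imports below are illustrative; the point is only the ordering):

```python
# These two lines must execute before anything that imports ssl/socket
# (requests, urllib3, boto3, ...), otherwise the patching comes too late.
import gevent.monkey
gevent.monkey.patch_all()

# Network-facing imports happen only after patching.
import boto3  # noqa: E402

s3_client = boto3.client('s3')
```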
@deepblue-phoenix Would you mind sharing your workaround for this issue?
Does anyone have a solution or a timeline on this?
You could try the suggestions by @shresthapradip or this workaround by @pmaoui if it applies to your case.
Hello, I am having the same issue with a SageMaker custom inference.py script (attached).
I tried both gevent.monkey.patch_all() and gevent.monkey.patch_all(ssl=False), but the issue persists. I hope there will be a solution soon.
My inference.py:
```python
import gevent.monkey
gevent.monkey.patch_all(ssl=False)

import json
import numpy as np
from PIL import Image
import io
import logging
import tempfile
import boto3

# Configure logger
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

s3_client = boto3.client('s3')


def open_image(image_data):
    try:
        return Image.open(io.BytesIO(image_data))  # Supports every type of image extension
    except Exception as e:
        logger.error(f"Error opening image: {str(e)}")
        raise


def read_image_from_s3(s3_uri):
    """Load image file from S3.

    Parameters
    ----------
    s3_uri : string
        S3 URI in the form s3://bucket/key

    Returns
    -------
    np.array
        Image array
    """
    try:
        bucket, key = s3_uri.replace("s3://", "").split("/", 1)
        logger.info(f"Parsed bucket: {bucket}, key: {key}")
        logger.info(f"Reading image from bucket: {bucket}, key: {key}")
        s3 = boto3.resource('s3')
        bucket = s3.Bucket(bucket)
        object = bucket.Object(key)
        response = object.get()
        file_stream = response['Body']
        im = Image.open(file_stream)
        image_array = np.array(im)
        logger.info(f"Successfully read image from S3 bucket: {bucket}, key: {key}")
        return image_array
    except Exception as e:
        logger.error(f"Error reading image from S3 bucket: {bucket}, key: {key}, error: {str(e)}")
        raise


def input_handler(data, context):
    """Pre-process request input before it is sent to the TensorFlow Serving REST API.

    Args:
        data (obj): the request data stream if images, dict or string if text.
        context (Context): an object containing request and configuration details

    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    try:
        logger.info(f"Request content type: {context.request_content_type}")
        with tempfile.TemporaryDirectory() as temp_dir:
            logger.info(f"Created temporary directory at {temp_dir}")
            if "image" in context.request_content_type:
                payload = data.read()
                image = open_image(payload)
                image_array = np.array(image)
                image_with_batch_dim = np.expand_dims(image_array, axis=0)  # Add batch dimension
                # Input format is the same as TF Serving API: https://www.tensorflow.org/tfx/serving/api_rest
                response_payload = json.dumps({"instances": image_with_batch_dim.tolist()})  # tolist preserves the shape [1, 224, 224, 3]
                return response_payload
            elif "json" in context.request_content_type:
                payload = data.read().decode('utf-8')
                json_data = json.loads(payload)
                # Assuming the structure of json_data is {"s3_uris": ["s3://bucket/key1", "s3://bucket/key2", ...]}
                s3_uris = json_data.get("s3_uris", [])
                logger.info(f"Received S3 URIs: {s3_uris}")
                images = []
                for s3_uri in s3_uris:
                    try:
                        image_array = read_image_from_s3(s3_uri)
                        images.append(image_array)
                    except Exception as e:
                        logger.error(f"Failed to process image from S3 URI {s3_uri}: {str(e)}")
                if not images:
                    raise ValueError("No valid images found in the provided S3 URIs.\n Please, provide a json stream with key 's3_uris' and a list of uris as value.")
                images_with_batch_dim = np.stack(images, axis=0)  # Stack images to create a batch
                response_payload = json.dumps({"instances": images_with_batch_dim.tolist()})
                return response_payload
            content_type = context.request_content_type or "unknown"
            raise ValueError(f'{{"error": "unsupported content type {content_type}"}}')
    except Exception as e:
        logger.error(f"Error in input_handler: {str(e)}")
        raise


def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.

    Args:
        data (obj): the TensorFlow Serving response as described here: https://www.tensorflow.org/tfx/serving/api_rest#response_format_4
        context (Context): an object containing request and configuration details

    Returns:
        (bytes/json, string): data to return to client, response content type
    """
    try:
        if data.status_code != 200:
            raise ValueError(data.content.decode('utf-8'))
        response_content_type = context.accept_header
        prediction = data.content
        return prediction, response_content_type
    except Exception as e:
        logger.error(f"Error in output_handler: {str(e)}")
        raise
```
For those using gevent, there is an issue here being tracked on their side for that: https://github.com/gevent/gevent/issues/1826. This issue appears to be specific to RHEL-based systems. Please note that we do not provide or officially support gevent with our networking setup. Any issues related to gevent will need to be addressed by the gevent team.
I also faced the same issue, but it can be fixed using:
```python
import gevent.monkey
gevent.monkey.patch_all()
```
Thanks, everyone.
This thread was helpful for debugging this issue, so I'm posting my team's context and solution to this problem.
We encountered this issue after updating packages in a Flask application that uses gunicorn to launch gevent workers on Python 3.10.
The issue appears to have been caused by gevent monkey patching occurring too late after the application's Python process started. Gunicorn itself has a built-in warning log for this that looks like:
/usr/local/lib/python3.10/site-packages/gunicorn/workers/ggevent.py:38: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/usr/local/lib/python3.10/site-packages/urllib3/util/ssl_.py)'
We'd seen this warning in the past without it causing problems, but with the newly updated packages we ran into this issue when downloading files from S3 using boto3.
There are two ways to fix this. One was to follow the advice in this closed gunicorn GitHub issue and NOT use a gunicorn.py config file, instead passing configs as params to the gunicorn process in our entrypoint script. The solution we ended up going with was to monkey patch gevent at the start of our config script, which we hadn't previously realized ran in the same Python process as the workers.
```python
import gevent.monkey
gevent.monkey.patch_all()
# Monkey patching needs to happen here before anything else.
# Gunicorn automatically monkey patches the worker processes when using gevent workers,
# but the way it does this does not strongly guarantee that the monkey patching will
# happen before this file loads, which can cause issues with core libraries like SSL.
import multiprocessing  # noqa: E402
```
Outside of gunicorn, I think there are two paths to try to debug this:
Path 1) You know you're already using gevent to monkey patch
```
$ python
Python 3.10.13 (main, May 16 2024, 15:17:11) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> import multiprocessing
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'multiprocessing']
```
Path 2) You don't think you're using gevent at all.
- Try running the snippet of code below to verify that some 3rd-party application isn't using gevent without your knowledge. If something is, search your libs for whatever is causing the problem and either replace the problematic library, try putting it at the very top of your imports, or run gevent monkey patching yourself before importing *ANYTHING*.
- If you're definitely not using gevent at all, then some other bug entirely is causing this issue with boto3
```python
import logging

from gevent.monkey import is_module_patched

...
# Place this where it makes sense for your application
if is_module_patched("socket"):  # Socket will VERY LIKELY be patched by any lib using gevent
    raise RuntimeError("Gevent was already monkey patched")
else:
    logging.info("Gevent was NOT monkeypatched")
```
Describe the bug
I need help with this recursion error from boto3:
maximum recursion depth exceeded
It occurs when I initialize an S3 client in my inference script to read S3 objects. Your insights will be deeply appreciated! A similar issue was posted on Stack Overflow 2 months ago here: https://stackoverflow.com/questions/77786275/aws-sagemaker-endpoint-maximum-recursion-depth-exceeded-error-when-calling-boto
Here is the relevant code block responsible for the error:
Expected Behavior
The S3 client is created, enabling access to the S3 objects.
Current Behavior
Here is the full error log:
Reproduction Steps
Simply initializing an S3 client within an inference script like so:
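Presumably the same one-liner quoted earlier in the thread:

```python
import boto3

# The bare client initialization that triggers the RecursionError on the endpoint.
s3_client = boto3.client('s3')
```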
Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.34.55
Environment details (OS name and version, etc.)
SageMaker endpoint for TensorFlow Serving