Closed Dhruv-reviv closed 2 months ago
Thanks for reaching out. The `SSL validation failed...` error has been reported several times in the past. I recommend looking through those issues. This troubleshooting section in the AWS CLI User Guide also highlights some possible causes (both the CLI and Boto3 use Botocore under the hood, so the troubleshooting steps apply to both):
- The AWS CLI doesn't trust your proxy's certificate.
- Your configuration isn't pointing to the correct CA root certificate location.
- Your configuration isn't using the correct AWS Region.
- Your TLS version needs to be updated.
If you're still seeing an issue, please share a complete but minimal code snippet to reproduce the problem, along with debug logs (with sensitive info redacted), which you can get by adding `boto3.set_stream_logger('')` to your script.
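If the flood of DEBUG output in the notebook cell is itself a problem, the same logs can be routed to a file instead. A stdlib-only sketch (the log file name is hypothetical):

```python
import logging

# Route DEBUG output from boto3 and botocore (which boto3 uses under
# the hood) to a file instead of the notebook cell, so wire-level logs
# don't flood the notebook output.
handler = logging.FileHandler('boto3_debug.log')  # hypothetical file name
handler.setFormatter(logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))

for name in ('boto3', 'botocore'):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
```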
Hi @tim-finnigan, I had already tried various hacks/tricks from other SSL validation issues before opening a new one. I also tried downgrading boto3 to 1.28.63, which someone suggested works; however, in the SageMaker environment I am not able to activate a new environment I created, and downgrading from the notebook itself does not work either, since it still picks up the original version.
Following your suggestion, I added `boto3.set_stream_logger('')` to my script; however, on executing `response = predictor.predict(serialized_data)`, the kernel died, with the logging output attached in the image.
Here's the code for you to work with:

```python
import logging
import boto3
import pickle
import sagemaker
from sagemaker.pytorch import PyTorchModel

boto3.set_stream_logger()

model_bucket = 'Bucket Name'
model_key = 'Model-Structure/model.tar.gz'

with open("interaction.pkl", 'rb') as f:
    data1 = CPU_Unpickler(f).load()
with open("auxiliary.pkl", 'rb') as f:
    data2 = CPU_Unpickler(f).load()

serialized_data = pickle.dumps({'data1': data1, 'data2': data2})

pytorch_model = PyTorchModel(
    model_data=f's3://{model_bucket}/{model_key}',  # S3 URI for the model artifacts
    role=role,                            # IAM role with necessary permissions
    entry_point='inference.py',           # Path to the inference script
    framework_version='1.13.1',           # PyTorch version
    py_version='py39',                    # Python version
    sagemaker_session=sagemaker_session   # SageMaker session
)

predictor = pytorch_model.deploy(
    instance_type='ml.m5.xlarge',   # Instance type
    initial_instance_count=1        # Number of instances
)

response = predictor.predict(serialized_data)
print(response)
```
Everything executes up to the `predict` call.
I think I roughly know what the error implicitly points at: the test data was too large. However, even with only 1% of the data, I receive a different error. If anyone has an idea on how to approach this, help is welcome. The code and inference.py remain the same:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from primary with message "Your invocation timed out while waiting for a response from container primary. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-inference-2024-06-17-14-40-29-350 in account 953765082453 for more information.
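The 60-second cap in that message matches the documented invocation timeout for real-time endpoints, and request payloads are also capped at roughly 6 MB, so a quick size check before invoking can help tell the two failure modes apart. A minimal sketch with a dummy stand-in for the real payload:

```python
import pickle

# Dummy stand-in for the real {'data1': ..., 'data2': ...} payload.
payload = pickle.dumps({'data1': list(range(1000)), 'data2': list(range(1000))})

size_mb = len(payload) / (1024 * 1024)
# Real-time SageMaker endpoints cap request payloads at roughly 6 MB
# and responses at 60 s; larger or slower workloads need async
# inference or batch transform instead.
too_large = size_mb > 6
print(f"payload: {size_mb:.2f} MB, too large for real-time invoke: {too_large}")
```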
I saw a related issue for the Python Sagemaker SDK: https://github.com/aws/sagemaker-python-sdk/issues/1119 and documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-troubleshooting.html.
You can configure timeouts/retries in Boto3 as documented here, but it looks like this is a limitation imposed by the SageMaker service. Again, if you want us to review further, please share logs, which you can get by adding `boto3.set_stream_logger('')` to your script.
Hey Tim, thanks for your email. I did check out that issue and all the surrounding ideas/implementations, but nothing helped. I adopted async inference to try to get more time and data capacity; however, it ran for an hour (the limit for async inference) and the kernel died. I was testing with very limited data, maybe 50 MB tops, and still couldn't get any inferences out.
Thanks, Dhruv
Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.
Any update on this thread, @Dhruv-reviv?
Hi @RwGrid, I do not have any further updates. I have since stopped working on this and changed my requirements.
Describe the bug
I am trying to get inference from a deployed pretrained model in a SageMaker notebook environment. While executing the line of code below,
response = predictor.predict(serialized_data)
I receive the SSL error below:
SSLError: SSL validation failed for https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/pytorch-inference-2024-06-11-13-39-50-210/invocations EOF occurred in violation of protocol (_ssl.c:2426)
Expected Behavior
I should be receiving the response, as I tested this manually in the notebook environment without deploying the model, just loading the weights from the S3 bucket and using the same piece of code for inference.
Code:
```python
input_data = (interaction_data, mt_data)
results = predict_fn(input_data, model)
```
Result:
Iteration at 0: auc 0.964, map 0.439
Current Behavior
Assuming that the model is defined properly as below,
```python
pytorch_model = PyTorchModel(
    model_data=f's3://{model_bucket}/{model_key}',
    role=role,
    entry_point='inference.py',
    framework_version='1.8.1',  # Specify the PyTorch version
    py_version='py3',
    sagemaker_session=sagemaker_session
)
```

and the deployment takes place properly as

```python
predictor = pytorch_model.deploy(instance_type='ml.m5.large', initial_instance_count=1)
```
while executing

```python
response = predictor.predict(serialized_data)
```

I receive the error below:
SSLError: SSL validation failed for https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/pytorch-inference-2024-06-11-13-39-50-210/invocations EOF occurred in violation of protocol (_ssl.c:2426)
Reproduction Steps
Define your data path:

```python
interaction_data = "s3://path_to_pkl/interaction.pkl"
auxiliary_data = "s3://path_to_pkl/auxiliary.pkl"
```

Define the model bucket:

```python
model_bucket = 'path_to_model_bucket'
model_key = 'Model-Structure/model.tar.gz'
```

Arrange your data properly:

```python
with open("interaction.pkl", 'rb') as f:
    data1 = CPU_Unpickler(f).load()

with open("auxiliary.pkl", 'rb') as f:
    data2 = CPU_Unpickler(f).load()

serialized_data = pickle.dumps({'data1': data1, 'data2': data2})
```

Define your model:

```python
pytorch_model = PyTorchModel(
    model_data=f's3://{model_bucket}/{model_key}',
    role=role,
    entry_point='inference.py',
    framework_version='1.8.1',  # Specify the PyTorch version
    py_version='py3',
    sagemaker_session=sagemaker_session
)
```

Deploy the model:

```python
predictor = pytorch_model.deploy(instance_type='ml.m5.large', initial_instance_count=1)
```

Get the response:

```python
response = predictor.predict(serialized_data)
```
I also have an inference.py which I use for model evaluation and obtaining the response, as is standard practice in the SageMaker environment.
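For reference, the handler contract the SageMaker PyTorch serving container looks for in inference.py has this shape. The bodies below are simplified stand-ins (the artifact file name and pickle serialization are assumptions for illustration), not the author's actual script:

```python
import os
import pickle

def model_fn(model_dir):
    # Load the model artifacts that SageMaker extracts from model.tar.gz.
    with open(os.path.join(model_dir, 'model.pkl'), 'rb') as f:  # hypothetical artifact name
        return pickle.load(f)

def input_fn(request_body, content_type='application/octet-stream'):
    # Deserialize the request payload produced by predictor.predict().
    return pickle.loads(request_body)

def predict_fn(input_data, model):
    # Run inference; must finish within the 60 s invocation timeout.
    return model(input_data)  # placeholder for the real forward pass

def output_fn(prediction, accept='application/octet-stream'):
    # Serialize the prediction for the HTTP response.
    return pickle.dumps(prediction)
```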
Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.34.101
Environment details (OS name and version, etc.)
AWS Sagemaker Jupyter Notebook