UnexpectedStatusException: Error hosting endpoint xxxxx: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.. #3062
I am using the example from the notebooks to create and deploy an endpoint on AWS SageMaker. All checks pass locally, but when I attempt to deploy the endpoint to the cloud I run into the issue below.
Describe the bug and Logs
UnexpectedStatusException: Error hosting endpoint sagemaker-xgboost: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..
Full traceback from the CloudWatch logs:
Traceback (most recent call last):
  File "/miniconda3/bin/serve", line 8, in <module>
    sys.exit(serving_entrypoint())
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/serving.py", line 128, in serving_entrypoint
    server.start(env.ServingEnv().framework_module)
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_containers/_server.py", line 86, in start
    _modules.import_module(env.module_dir, env.module_name)
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_containers/_modules.py", line 253, in import_module
    _files.download_and_extract(uri, _env.code_dir)
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_containers/_files.py", line 129, in download_and_extract
    s3_download(uri, dst)
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_containers/_files.py", line 165, in s3_download
    s3.Bucket(bucket).download_file(key, dst)
  File "/miniconda3/lib/python3.6/site-packages/boto3/s3/inject.py", line 246, in bucket_download_file
    ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
  File "/miniconda3/lib/python3.6/site-packages/boto3/s3/inject.py", line 172, in download_file
    extra_args=ExtraArgs, callback=Callback)
  File "/miniconda3/lib/python3.6/site-packages/boto3/s3/transfer.py", line 307, in download_file
    future.result()
  File "/miniconda3/lib/python3.6/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/miniconda3/lib/python3.6/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/miniconda3/lib/python3.6/site-packages/s3transfer/tasks.py", line 255, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/s3transfer/download.py", line 343, in _submit
    **transfer_future.meta.call_args.extra_args
  File "/miniconda3/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/miniconda3/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
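The 403 on HeadObject happens inside the endpoint container while it downloads from S3 (either the model artifact or the packaged entry-point code), so it is the endpoint's execution role, not my local credentials, that needs read access. A minimal way to check this (a sketch, reusing the placeholder role ARN and bucket/key from below; it assumes the role's trust policy allows your local principal to assume it):

import boto3

# Sketch: assume the SageMaker execution role and repeat the HeadObject
# call the container makes. Role ARN and bucket/key are the placeholders
# used elsewhere in this issue.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111:role/xxx-sagemaker-role",
    RoleSessionName="head-object-check",
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
# Raises ClientError (403) if the role cannot read the model artifact.
s3.head_object(Bucket="ml-model", Key="xgboost_model.tar.gz")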
To reproduce
In my local notebook (on my personal machine, NOT a SageMaker notebook instance):
import pandas as pd
import numpy as np
import xgboost
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.compose import ColumnTransformer  # needed for the encoding step below
from sklearn.preprocessing import OneHotEncoder
print(xgboost.__version__)
1.0.1
# read data
df = pd.read_csv('')
# split df into train and test
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:21], df.iloc[:,-1], test_size=0.1)
# Encode categorical variables
cat_vars = [List of categorical variables]
cat_transform = ColumnTransformer([('cat', OneHotEncoder(handle_unknown='ignore'), cat_vars)], remainder='passthrough')
encoder = cat_transform.fit(X_train)
X_train = encoder.transform(X_train)
X_test = encoder.transform(X_test)
X_train.shape
(2000,100)
# xgboost regression model
model = XGBRegressor(objective = 'reg:squarederror')
# Parameter distributions
params = {
xxxxx: xxx
...
...
}
# Hyperparameter tuning
r = RandomizedSearchCV(model, param_distributions=params, n_iter=10, scoring="neg_mean_absolute_error", cv=3, verbose=1, n_jobs=1, return_train_score=True, error_score='raise')
# Fit model
r.fit(X_train.toarray(), y_train.values)
xgbest = r.best_estimator_
# AWS SageMaker Endpoint code
import boto3
import pickle
import datetime  # used below to timestamp the model name
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from time import gmtime, strftime
region = boto3.Session().region_name
role = 'arn:aws:iam::111:role/xxx-sagemaker-role'
bucket = 'ml-model'
prefix = "sagemaker/xxx-xgboost-byo"
bucket_path = "https://s3-{}.amazonaws.com/{}".format('us-west-1', 'ml-model')
client = boto3.client(
    's3',
    aws_access_key_id='xxx',
    aws_secret_access_key='xxx'
)
client.list_objects(Bucket=bucket)
Save the model
# save the model (the tuned xgbest from the RandomizedSearchCV above)
model_file_name = "xgboost-model"
# alternatively, via the native API: xgb_model.save_model(model_file_name)
with open(model_file_name, 'wb') as f:
    pickle.dump(xgbest, f)
!tar czvf xgboost_model.tar.gz $model_file_name
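One detail worth verifying at this point (an aside, not part of the original notebook): the container expects the model file at the root of the extracted archive, which can be confirmed locally:

import tarfile

# The member list should show 'xgboost-model' at the archive root,
# not nested under a directory.
with tarfile.open("xgboost_model.tar.gz") as tar:
    print(tar.getnames())  # expected: ['xgboost-model']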
Upload to S3
key = 'xgboost_model.tar.gz'
with open('xgboost_model.tar.gz', 'rb') as f:
client.upload_fileobj(f, bucket, key)
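As a quick sanity check after uploading (a sketch reusing the client, bucket, and key above), confirm the object exists and that the bucket lives in the region the endpoint will be created in:

# Confirms the artifact is readable with these credentials and non-empty.
resp = client.head_object(Bucket=bucket, Key=key)
print(resp["ContentLength"])

# A region mismatch between the bucket and the endpoint can also surface
# as a failed download inside the container.
print(client.get_bucket_location(Bucket=bucket))  # expected: us-west-1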
Import model
# Import model into hosting
container = get_image_uri(boto3.Session().region_name, "xgboost", "0.90-2")
print(container)
xxxxxx.dkr.ecr.us-west-1.amazonaws.com/sagemaker-xgboost:0.90-2-cpu-py3
%%time
model_name = model_file_name + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
model_url = "https://s3-{}.amazonaws.com/{}/{}".format(region, bucket, key)
from sagemaker.xgboost import XGBoost, XGBoostModel
from sagemaker.session import Session
from sagemaker.local import LocalSession
sm_client = boto3.client(
"sagemaker",
region_name="us-west-1",
aws_access_key_id='xxxx',
aws_secret_access_key='xxxx'
)
# Define session
sagemaker_session = Session(sagemaker_client = sm_client)
models3_uri = "s3://ml-model/xgboost_model.tar.gz"
xgb_inference_model = XGBoostModel(
    model_data=models3_uri,
    role=role,
    entry_point="inference.py",
    framework_version="0.90-2",
    # Cloud
    sagemaker_session=sagemaker_session,
    # Local
    # sagemaker_session=None,
)
# serializer = StringSerializer(content_type="text/csv")
predictor = xgb_inference_model.deploy(
    initial_instance_count=1,
    # Cloud
    instance_type="ml.t2.large",
    # Local
    # instance_type="local",
    serializer="text/csv"
)
if xgb_inference_model.sagemaker_session.local_mode:
print('Deployed endpoint in local mode')
else:
print('Deployed endpoint to SageMaker AWS Cloud')
/Applications/Anaconda/anaconda3/lib/python3.9/site-packages/sagemaker/session.py in wait_for_endpoint(self, endpoint, poll)
3354 if status != "InService":
3355 reason = desc.get("FailureReason", None)
-> 3356 raise exceptions.UnexpectedStatusException(
3357 message="Error hosting endpoint {endpoint}: {status}. Reason: {reason}.".format(
3358 endpoint=endpoint, status=status, reason=reason
UnexpectedStatusException: Error hosting endpoint sagemaker-xgboost-xxxx: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..
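The error message points at CloudWatch; the container logs for the failed endpoint can also be pulled programmatically (a sketch; endpoint log groups follow the standard /aws/sagemaker/Endpoints/<endpoint-name> naming, and the endpoint name below is the redacted one from the error):

import boto3

logs = boto3.client("logs", region_name="us-west-1")
group = "/aws/sagemaker/Endpoints/sagemaker-xgboost-xxxx"  # redacted endpoint name

# Print every event from every stream in the endpoint's log group.
for stream in logs.describe_log_streams(logGroupName=group)["logStreams"]:
    events = logs.get_log_events(
        logGroupName=group, logStreamName=stream["logStreamName"]
    )
    for event in events["events"]:
        print(event["message"])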
If applicable, add logs to help explain your problem.
You may also attach an .ipynb file to this issue if it includes relevant logs or output.
Link to the notebook https://github.com/aws/amazon-sagemaker-examples/blob/master/advanced_functionality/xgboost_bring_your_own_model/xgboost_bring_your_own_model.ipynb
https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/xgboost_script_mode_local_training_and_serving/code/inference.py
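For reference, the linked inference.py provides the handler functions the SageMaker XGBoost serving container imports. A minimal sketch along those lines (assuming the model was pickled as shown above; illustrative, not the exact contents of the linked script):

import os
import pickle

import numpy as np


def model_fn(model_dir):
    # The container extracts xgboost_model.tar.gz into model_dir;
    # "xgboost-model" is the file name pickled earlier in this issue.
    with open(os.path.join(model_dir, "xgboost-model"), "rb") as f:
        return pickle.load(f)


def input_fn(request_body, request_content_type):
    # Minimal CSV handling, matching the text/csv content type used above.
    if request_content_type == "text/csv":
        return np.array(
            [[float(x) for x in row.split(",")]
             for row in request_body.strip().split("\n")]
        )
    raise ValueError("Unsupported content type: {}".format(request_content_type))


def predict_fn(input_data, model):
    # model is the unpickled XGBRegressor (a sklearn-API estimator).
    return model.predict(input_data)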