flacout opened this issue 3 years ago
Hey,
Could you try to use a model dir like this and see if it works:
|--[model1]
   |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
   |--code
      |--inference.py
      |--requirements.txt
Thanks
Changing the model directory from the OP's structure:
|--[model_version_number]
   |--assets
   |--variables
   |--saved_model.pb
|--code
   |--lib
      |--external_module
   |--inference.py
to the suggested structure:
|--[model1]
   |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
   |--code
      |--inference.py
      |--requirements.txt
Results in: No versions of servable {id} found under base path /opt/ml/models/{id}/model
I confirm @richardimms's comment. Deploying a model with the structure:
|--[model1]
   |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
   |--code
      |--inference.py
      |--requirements.txt
leads to an error after invoking the endpoint:
response = runtime_sm_client.invoke_endpoint(
    EndpointName='multi-model-endpoint',
    ContentType='application/json',
    TargetModel='model1.tar.gz',
    Body=json.dumps(data))
Error:
Could not find base path /opt/ml/models/55dfd27a9d81013319c5f5d01600e748/model for servable 55dfd27a9d81013319c5f5d01600e748
Any updates on this issue? This is a blocker for using multi-model containers in production.
Seconded, any update on this @jinpengqi?
Sorry for the late reply. Let me ping the related on-call team to dive deeper here.
Any updates? Would be good to get an ETA so we can plan accordingly. @jinpengqi
Hey,
I have cut a ticket to the related team for this issue; they will take a look and give updates here.
@jinpengqi any further update on this?
@flacout and @richardimms can you post the code you used to create the model?
Hello @hsl89, do you mean the actual saved model from TensorFlow, or the code to construct the model package we upload to S3?
If it's the saved model from TensorFlow, I just use the export_saved_model function provided by the estimator here: https://github.com/tensorflow/estimator/blob/master/tensorflow_estimator/python/estimator/estimator.py#L659-L736
The serving function is based on TensorFlow Ranking and is: https://github.com/tensorflow/ranking/blob/e0c009e378ac3b3a79fe260e90b20a93aca6901f/tensorflow_ranking/python/data.py#L1091-L1135
Sorry I can't share exact snippets, but it follows those two patterns.
It's probably worth mentioning that I can use a SageMaker single-model endpoint with a primary container, using the same model.
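For contrast, the single-model path that works looks roughly like the sketch below (a minimal illustrative sketch, not the exact code; the bucket, archive name and framework version are placeholders):
import sagemaker
from sagemaker.tensorflow import TensorFlowModel

role = sagemaker.get_execution_role()

# Single-model endpoint with a primary container: the same archive and its
# inference.py are picked up correctly in this mode.
model = TensorFlowModel(
    model_data="s3://<bucket>/single-model/model_1.tar.gz",  # placeholder path
    role=role,
    entry_point="inference.py",
    framework_version="2.1",  # placeholder version
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
)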
> @flacout and @richardimms can you post the code you used to create the model?
@hsl89 Here is the code to create the multi-model endpoint on SageMaker:
import sagemaker
import boto3
import json
import numpy as np

# create the model
sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()
image_name = "<image built from the sagemaker-tensorflow-serving-container repository>"

container = {'Image': image_name,
             'ModelDataUrl': 's3://somebucket/multi-models/',
             'Mode': 'MultiModel'}

response = sm_client.create_model(
    ModelName='multi-model',
    ExecutionRoleArn=role,
    Containers=[container])

# create endpoint config
response = sm_client.create_endpoint_config(
    EndpointConfigName="multi-model-config",
    ProductionVariants=[{
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName': 'multi-model',
        'VariantName': 'AllTraffic'}])

# create endpoint
response = sm_client.create_endpoint(
    EndpointName='multi-model-endpoint',
    EndpointConfigName="multi-model-config")
Here is the code to package and deploy one TensorFlow model:
import os
import tarfile
import tensorflow as tf

# train a tensorflow model
model = ...  # some keras model

# save model
model_version = '1'
export_dir = 'export/Servo/' + model_version
tf.saved_model.save(model, export_dir)

# create archive
model_name = "model_1.tar.gz"
with tarfile.open(model_name, mode='w:gz') as archive:
    archive.add(f'export/Servo/{model_version}', arcname=model_version, recursive=True)
    archive.add('inference.py', arcname="code/inference.py")
    archive.add('postprocessing.py', arcname="code/lib/local_module.py")

# upload model archive to s3
s3 = boto3.client('s3')
s3.upload_file(Filename=model_name,
               Bucket='bucket_name',
               Key=os.path.join("multi_model_folder", model_name))

# test the model is deployed
runtime_sm_client = boto3.client('runtime.sagemaker')
endpoint_name = 'tf-multi-model-endpoint'

signal = np.array([1, 2, 3, 4, 4])
data = {"instances": signal.tolist()}
response = runtime_sm_client.invoke_endpoint(EndpointName=endpoint_name,
                                             ContentType='application/json',
                                             TargetModel=model_name,
                                             Body=json.dumps(data))
Here is the inference.py code:
import json

import numpy as np
import requests


def handler(data, context):
    if context.request_content_type == 'application/json':
        input_data = data.read().decode('utf-8')
        input_data = json.loads(input_data)
        processed_input, signal = _process_input(input_data)
        response = requests.post(context.rest_uri, data=processed_input)
    else:
        raise ValueError('{{"error": "unsupported content type {}"}}'.format(
            context.request_content_type or "unknown"))
    return _process_output(input_data, signal, response, context)


def _process_input(input_data):
    # do preprocessing
    signal = input_data['signal']
    data = {"instances": signal}
    return json.dumps(data), signal


def _process_output(input_data, signal, response, context):
    if response.status_code != 200:
        raise ValueError(response.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = np.array(json.loads(response.content.decode('utf-8'))["predictions"])
    # do postprocessing
    output = prediction.tolist()  # postprocessing elided; pass the raw predictions through
    return json.dumps({"predictions": output}), response_content_type
Let me know if you need some clarification about the code. I hope the problem can be resolved promptly. Thanks
Hey @richardimms, I was looking for the code you used to create the multi-model endpoint. The code provided by @flacout is what I am looking for. I will post updates once I find the problem or a workaround.
Hey @flacout, the issue you are facing is not an endpoint creation failure; it is that the inference.py customized for each model is not being used by tensorflow_model_server, right?
@hsl89 yes, the problem is with inference.py. The endpoint is created fine.
If I deploy a model without inference.py: everything is fine, the model responds normally to prediction calls.
If I deploy a model with inference.py: there is an error saying the model cannot be found (when trying a prediction). I assume the inference code is not picked up correctly, resulting in an exception and the model not being deployed.
I am using a later version of the inference container and I can get a multi-model endpoint; however, it does not look like the preprocessing code is being used.
import sagemaker

sess = sagemaker.Session()
region = sess.boto_region_name

image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.6.0-cpu-py38-ubuntu20.04"
multi_model_prefix = "s3://<my bucket name>/multi_model_folder/"

aws s3 ls s3://<my bucket name>/multi_model_folder/
2021-11-03 00:15:20    2295181 model_1.tar.gz
2021-11-03 00:15:20    2294161 model_2.tar.gz

container = {
    'Image': image,
    'ModelDataUrl': multi_model_prefix,
    'Mode': 'MultiModel'
}
import boto3

sm_client = boto3.client('sagemaker')

res = sm_client.create_model(
    ModelName='example-mm',
    ExecutionRoleArn=<a sagemaker role>,
    Containers=[container]
)

res = sm_client.create_endpoint_config(
    EndpointConfigName='example-mm-config',
    ProductionVariants=[
        {
            'InstanceType': 'ml.m4.xlarge',
            'InitialInstanceCount': 1,
            'InitialVariantWeight': 1,
            'ModelName': 'example-mm',
            'VariantName': 'AllTraffic'
        }]
)

res = sm_client.create_endpoint(
    EndpointName='example-mm-endpoint',
    EndpointConfigName='example-mm-config'
)
Trigger model1
sm_runtime = boto3.client('sagemaker-runtime')

res = sm_runtime.invoke_endpoint(
    EndpointName="example-mm-endpoint",
    ContentType='application/json',
    TargetModel='model_1.tar.gz',
    Body=body
)
Trigger model2
res = sm_runtime.invoke_endpoint(
    EndpointName="example-mm-endpoint",
    ContentType='application/json',
    TargetModel='model_2.tar.gz',
    Body=body
)
The structure of model_1.tar.gz is:
├── 1
│   ├── assets
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── code
    └── inference.py
The code inside code/inference.py is:
import json


def input_handler(data, context):
    print('========= preprocessing input for model 1 =============')
    if context.request_content_type == 'application/json':
        d = data.read().decode('utf-8')
        return d if len(d) else ''
    # raise if input is not json
    raise ValueError("unsupported data type: {}".format(
        context.request_content_type))


def output_handler(data, context):
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
The endpoint can be triggered and I can confirm the outputs are different for the different models. But I don't see input_handler being used, i.e. its print statement does not appear in CloudWatch.
@flacout just saw your response. On my end, the endpoint can find the model, but inference.py is not being used.
I'm facing the same problem. It seems deploying a multi-model endpoint with a preprocessing script is not supported with TensorFlow Serving models as yet. Any update on this issue?
Thanks
Same problem, buddy!
My workaround was to use the multi-container endpoint feature of SageMaker; it is not meant to be used for that, but it does the job so far.
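A minimal sketch of what that multi-container setup can look like (illustrative only; the image URIs, bucket paths, role ARN and names below are placeholders, not my actual configuration):
import json
import boto3

sm_client = boto3.client('sagemaker')
sm_runtime = boto3.client('sagemaker-runtime')

# One container per model; each model archive carries its own inference.py.
containers = [
    {'ContainerHostname': 'model-1',
     'Image': '<tensorflow inference image uri>',            # placeholder
     'ModelDataUrl': 's3://<bucket>/models/model_1.tar.gz'},  # placeholder
    {'ContainerHostname': 'model-2',
     'Image': '<tensorflow inference image uri>',
     'ModelDataUrl': 's3://<bucket>/models/model_2.tar.gz'},
]

sm_client.create_model(
    ModelName='multi-container-model',
    ExecutionRoleArn='<sagemaker execution role arn>',  # placeholder
    Containers=containers,
    InferenceExecutionConfig={'Mode': 'Direct'},  # invoke each container directly
)

# create_endpoint_config / create_endpoint as usual, then target one container per request:
response = sm_runtime.invoke_endpoint(
    EndpointName='multi-container-endpoint',
    ContentType='application/json',
    TargetContainerHostname='model-1',
    Body=json.dumps({"instances": [[1, 2, 3, 4, 4]]}),
)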
Hi @flacout and @richardimms
Please try the workaround below to use a universal inference.py:
IMAGE_URI = "<docker_image_uri>"
custom_env = {
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET": "mybucket",
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX": "cv-models/mme/code/"  # the prefix should end with a "/" delimiter
}
Please try the suggested workaround to unblock your workflows.
@satishpasumarthi thank you for looking into this issue. I have some doubts about your solution; I think it refers to the creation of a model in SageMaker? Could you check, and correct if necessary, my code example below to make sure this is what you intended:
import sagemaker
import boto3

# create the model
sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

IMAGE_URI = "<docker_image_uri>"
custom_env = {
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET": "mybucket",
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX": "cv-models/mme/code/"  # the prefix should end with a "/" delimiter
}

container = {'Image': IMAGE_URI,
             'ModelDataUrl': 's3://mybucket/multi-models/',
             'Mode': 'MultiModel',
             'Environment': custom_env}

response = sm_client.create_model(
    ModelName='multi-model',
    ExecutionRoleArn=role,
    Containers=[container])
Thanks
Hi @flacout, your example looks good. Please give it a try.
Hi @satishpasumarthi, I tried your fix; unfortunately the SageMaker model endpoint failed to be created, with the error below:
Let me know if you have a resolution for this issue. Thanks
Hi @flacout, thanks for your feedback. My changes have nothing to do with the nginx server, and I haven't encountered that error on my end; it worked fine when I tested by deploying 2 models to an endpoint. Can you provide a sample reproducible test case and the base container you are using?
Let me know if I got it right:
I used the code in your branch mme-fix: https://github.com/satishpasumarthi/sagemaker-tensorflow-serving-container/tree/mme-fix
This is how I build the docker image:
cp -r docker/build_artifacts/* docker/2.1/
docker build \
--cache-from 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.1-cpu \
--build-arg TFS_VERSION=2.1 \
--build-arg TFS_SHORT_VERSION=2.1 \
-f docker/2.1/Dockerfile.cpu \
-t tensorflow-inference:2.1.3-cpu \
docker/2.1/
After that I push it to our ECR registry. Here is how I deployed the endpoint:
import sagemaker
import boto3

sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

image_prepro = "258317088977.dkr.ecr.us-east-1.amazonaws.com/tensorflow-serving-luna:tfs-2.1.3-cpu"

custom_env = {
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET": 'cmdsk-dvc',
    "SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX": "cv-models/mme/code/"
}  # the prefix should end with a "/" delimiter

container = {'Image': image_prepro,
             'ModelDataUrl': 's3://cmdsk-dvc/multi-models-prepro/',
             'Mode': 'MultiModel',
             'Environment': custom_env}

response = sm_client.create_model(
    ModelName='peak-detection-multi-model-prepro',
    ExecutionRoleArn=role,
    Containers=[container])

response = sm_client.create_endpoint_config(
    EndpointConfigName="peak-detection-multi-model-prepro-config",
    ProductionVariants=[{
        'InstanceType': 'ml.t2.large',
        'InitialInstanceCount': 1,
        # 'InitialVariantWeight': 1,
        'ModelName': 'peak-detection-multi-model-prepro',
        'VariantName': 'AllTraffic'}])

response = sm_client.create_endpoint(
    EndpointName='peak-detection-multi-model-prepro-endpoint',
    EndpointConfigName="peak-detection-multi-model-prepro-config")
The inference.py at s3://cmdsk-dvc/cv-models/mme/code/inference.py is just a dummy file to try it out, returning an empty list:
import os
import sys
import json
import time

import boto3
import requests


def handler(data, context):
    """Handle request.

    Args:
        data (obj): the request data
        context (Context): an object containing request and configuration details

    Returns:
        (bytes, string): data to return to client, (optional) response content type
    """
    response_content_type = context.accept_header
    return json.dumps({"predictions": []}), response_content_type
How can I share the base container? Let me know if you need anything else.
Thanks @flacout , Let me try and get back to you.
The thing I'd just like to confirm is the expected structure of the buckets now. Previously it was:
|--[model1]
   |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
   |--code
      |--inference.py
      |--requirements.txt
This was all .tar.gz and then unpacked, but it seems like @flacout is adding the code to a separate "folder" within the S3 bucket. Is the expectation then to not have the code inside the .tar.gz file anymore?
@satishpasumarthi I found the origin of the nginx bug. A new version of nginx-module-njs was released on 28 December 2021 (njs 0.7.1); it doesn't support the js_include directive anymore. See these references: https://github.com/nginxinc/docker-nginx/issues/615 http://nginx.org/en/docs/njs/changes.html#njs0.7.1
The fix that I found is to force the install of the previous njs version (0.7.0) in the Dockerfile, for example docker/2.1/Dockerfile.cpu:
# nginx + njs
RUN apt-get update \
&& apt-get -y install --no-install-recommends \
curl \
gnupg2 \
ca-certificates \
git \
wget \
vim \
build-essential \
zlib1g-dev \
&& curl -s http://nginx.org/keys/nginx_signing.key | apt-key add - \
&& echo 'deb http://nginx.org/packages/ubuntu/ bionic nginx' >> /etc/apt/sources.list \
&& apt-get update \
&& apt-get -y install --no-install-recommends \
nginx=1.20.2-1~bionic \
nginx-module-njs=1.20.2+0.7.0-1~bionic \
python3 \
python3-pip \
python3-setuptools \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
Next week I will look at your proposed solution for the inference.py.
Hi @flacout, we have just merged a PR to fix the nginx.conf.template to accommodate the changes from the njs upgrade:
https://github.com/aws/sagemaker-tensorflow-serving-container/pull/217
Just wanted to check with you if you had gotten a chance to try the workaround suggested earlier? Thanks !
Hi @satishpasumarthi, sorry for the late reply, yes the workaround is working! One limitation that I see is that if the user wants to apply changes to the inference.py file, they have to start a new endpoint for the changes to be considered. Besides that comment, the response time is good, multiple models can be deployed, and the pre/post-processing code is applied to the request. Thanks
NB: To make it work with my configuration I had to change the function serve._download_scripts() so that it checks the size of the files listed in the S3 "folder".
If I don't do that, the first item in the list is the name of the "folder" itself (which has a size of zero) and that raises an exception later.
def _download_scripts(self, bucket, prefix):
    log.info("checking boto session region ...")
    boto_session = boto3.session.Session()
    boto_region = boto_session.region_name
    if boto_region in ("us-iso-east-1", "us-gov-west-1"):
        raise ValueError("Universal scripts is not supported in us-iso-east-1 or us-gov-west-1")
    log.info("downloading universal scripts ...")
    client = boto3.client("s3")
    resource = boto3.resource("s3")
    # download files
    paginator = client.get_paginator("list_objects")
    for result in paginator.paginate(Bucket=bucket, Delimiter="/", Prefix=prefix):
        for file in result.get("Contents", []):
            if file.get("Size") > 0:  # this is the fix here
                destination = os.path.join(CODE_DIR, file.get("Key"))
                if not os.path.exists(os.path.dirname(destination)):
                    os.makedirs(os.path.dirname(destination))
                resource.meta.client.download_file(bucket, file.get("Key"), destination)
Hi @flacout, thanks for your reply.
Isn't redeploying always necessary if the inference.py changed?
@flacout Gentle reminder !
Hi @satishpasumarthi, in the original documentation (the readme of the repo) it seems that in a multi-model deployment each model could have its own inference.py file associated with it. But this might not be the best design actually; I guess it depends on the situation.
> The thing I'd just like to confirm is the expected structure of the buckets now. Previously it was:
>
> |--[model1]
>    |--[model_version_number]
>       |--assets
>       |--variables
>       |--saved_model.pb
>    |--code
>       |--inference.py
>       |--requirements.txt
>
> This was all .tar.gz and then unpacked, but it seems like @flacout is adding the code to a separate "folder" within the S3 bucket. Is the expectation then to not have the code inside the .tar.gz file anymore?
I noticed this issue is still open. I am experiencing the same issue: my deployed multi-model doesn't invoke inference.py.
I couldn't invoke my deployed multi-model at all using the directory structure above for my two models. The only way I could find to successfully invoke a model was to use the following structure for my .tar.gz file:
|--[model_version_number]
   |--assets
   |--variables
   |--saved_model.pb
|--code
   |--inference.py
I can successfully invoke a model via the multi-model endpoint that way, but it doesn't execute the code in inference.py.
Also, I create the main model:
sm_model = sagemaker.tensorflow.model.TensorFlowModel(
    entry_point="inference.py",
    model_data=f"s3://{bucket}/{prefix}/{ranking_archive}",
    role=role,
    framework_version=tf_framework_version,
    source_dir="export/ranking/code",  # SageMaker is looking for this directory in my local notebook, not in the tar.gz file.
    sagemaker_session=sess)
I get an error if I specify source_dir = "code" (reflecting the structure of the .tar.gz file), because SageMaker is looking for the source_dir directory in my local notebook, not in the tar.gz file.
BTW, is anyone actively working this issue?
I'll also note that the documentation contradicts itself.
Just under the Pre/Post-Processing heading, it states:
NOTE: There is currently no support for pre-/post-processing with multi-model containers.
But then later, toward the bottom of the same documentation, it states:
Using Multi-Model Endpoint with Pre/Post-Processing: Multi-Model Endpoint can be used together with Pre/Post-Processing. Each model will need its own inference.py, otherwise default handlers will be used.
Any update on this? I am able to deploy the endpoint but inference.py is not called.
Can someone please confirm whether pre/post-processing works with the SageMaker multi-model container image for TensorFlow 2.5 (CPU)?
Docker image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.5-cpu
Bug Description
Hello,
I built the image from the latest commit: 6a51a604618e613d7c6d04d520c8264fe4e69d88. I pushed it to ECR and deployed a multi-model SageMaker endpoint. The endpoint works well for a simple model, but when I try to upload a model that contains code/inference.py, the new model cannot load the input and output handlers and therefore returns an input error when I call the model.
I tried building a previous version, commit 3bab56e5c5104cee7703d06565ac4d93dcab3ea8 "change: update MME Pre/Post-Processing model and script paths (#153)", when the pre/post-processing feature was first introduced to the container, and this time it works perfectly. I can deploy models with code/inference.py, install new pip packages with code/requirements.txt, and add local Python modules in the code/lib folder.
It seems to me that this feature of having models with independent pre/post-processing capability was lost somehow in the latest commits of this repository. Is this intentional, was it an error of manipulation, or am I missing something?
To reproduce
The model directory structure I used looks like this, following the readme guidelines:
For the inference.py I used the handler implementation instead of the input_handler and output_handler pair.
Thanks
Fabrice