aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.
Apache License 2.0

Pre-Postprocessing feature seems to not work anymore. #212

Open flacout opened 3 years ago

flacout commented 3 years ago

Bug Description: Hello, I built the image from the latest commit (6a51a604618e613d7c6d04d520c8264fe4e69d88), pushed it to ECR, and deployed a multi-model SageMaker endpoint. The endpoint works well for a simple model, but when I try to upload a model that contains code/inference.py, the new model cannot load the input and output handlers and therefore returns an input error when I call the model.

I tried to build a previous version, commit 3bab56e5c5104cee7703d06565ac4d93dcab3ea8 ("change: update MME Pre/Post-Processing model and script paths (#153)"), from when the pre/post-processing feature was first introduced to the container, and this time it works perfectly. I can deploy models with code/inference.py, install new pip packages with code/requirements.txt, and add local Python modules in the code/lib folder.

It seems to me that this feature of giving each model its own pre/post-processing capability was lost somewhere in the latest commits of this repository. Is this intentional, was it an oversight, or am I missing something?

To reproduce: the model directory structure I used looks like this, following the README guidelines:

|--[model_version_number]
    |--assets
    |--variables
    |--saved_model.pb
|--code
    |--lib
        |--external_module
    |--inference.py

For inference.py, I implemented the single handler function instead of the input_handler and output_handler pair.

Thanks Fabrice

jinpengqi commented 3 years ago

Hey,

Could you try using a model directory laid out like this and see if it works:

|--[model1]
  |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
|--code
    |--inference.py
    |--requirements.txt
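
Roughly, you could package that layout like this (a sketch; the local paths below are placeholders, adjust them to your export directory):

# Sketch: package the suggested layout into model1.tar.gz
# (local paths are placeholders).
import tarfile

with tarfile.open("model1.tar.gz", mode="w:gz") as archive:
    # SavedModel goes under [model1]/[model_version_number]
    archive.add("export/Servo/1", arcname="model1/1", recursive=True)
    archive.add("inference.py", arcname="code/inference.py")
    archive.add("requirements.txt", arcname="code/requirements.txt")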

Thanks

richardimms commented 3 years ago

Changing the model directory from the OP's structure:

|--[model_version_number]
    |--assets
    |--variables
    |--saved_model.pb
|--code
    |--lib
        |--external_module
    |--inference.py

to this:

|--[model1]
  |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
|--code
    |--inference.py
    |--requirements.txt

Results in: No versions of servable {id} found under base path /opt/ml/models/{id}/model

flacout commented 3 years ago

I confirm @richardimms's comment. Deploying a model with the structure:

|--[model1]
  |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
|--code
    |--inference.py
    |--requirements.txt

Leads to an error when invoking the endpoint:

response = runtime_sm_client.invoke_endpoint(EndpointName='multi-model-endpoint',
                                    ContentType='application/json',
                                    TargetModel='model1.tar.gz',
                                    Body=json.dumps(data))

Error: Could not find base path /opt/ml/models/55dfd27a9d81013319c5f5d01600e748/model for servable 55dfd27a9d81013319c5f5d01600e748

maheshpatil2021 commented 3 years ago

Any updates on this issue? This is a blocker for using multi-model containers in production.

richardimms commented 3 years ago

Seconded, any update on this @jinpengqi?

jinpengqi commented 3 years ago

Sorry for the delay. Let me ping the relevant on-call to dive deeper here.

maheshpatil2021 commented 3 years ago

Any updates? Would be good to get an ETA so we can plan accordingly. @jinpengqi

jinpengqi commented 3 years ago

Hey,

I have cut a ticket to the relevant team for this issue; they will take a look and post updates here.

richardimms commented 2 years ago

@jinpengqi any further update on this?

hongshanli23 commented 2 years ago

@flacout and @richardimms, can you post the code you used to create the model?

richardimms commented 2 years ago

Hello @hsl89, do you mean the actual saved model from TensorFlow, or the code that constructs the model package we upload to S3?

If it's the saved model from TensorFlow, then I just use the export_saved_model function provided by the estimator here: https://github.com/tensorflow/estimator/blob/master/tensorflow_estimator/python/estimator/estimator.py#L659-L736

The serving function is based on TensorFlow Ranking and is: https://github.com/tensorflow/ranking/blob/e0c009e378ac3b3a79fe260e90b20a93aca6901f/tensorflow_ranking/python/data.py#L1091-L1135

Sorry I can't share exact snippets, but it follows those two patterns.

It's probably worth mentioning that I can use a SageMaker single-model endpoint with a primary container using the same model.
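
A rough sketch of that single-model setup (role, image URI, and bucket names below are placeholders, not my exact values):

import boto3

sm_client = boto3.client('sagemaker')
role = 'arn:aws:iam::<account-id>:role/<sagemaker-execution-role>'  # placeholder
image_name = '<same TFS serving image as above>'                    # placeholder

# single model served by a primary container
sm_client.create_model(
    ModelName='ranking-single-model',
    ExecutionRoleArn=role,
    PrimaryContainer={
        'Image':        image_name,
        'ModelDataUrl': 's3://<bucket>/single-model/model.tar.gz',  # placeholder
        'Mode':         'SingleModel',
    })

sm_client.create_endpoint_config(
    EndpointConfigName='ranking-single-model-config',
    ProductionVariants=[{
        'InstanceType':         'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName':            'ranking-single-model',
        'VariantName':          'AllTraffic'}])

sm_client.create_endpoint(
    EndpointName='ranking-single-model-endpoint',
    EndpointConfigName='ranking-single-model-config')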

flacout commented 2 years ago

@flacout and @richardimms, can you post the code you used to create the model?

@hsl89 Here is the code to create the multi-model endpoint on sagemaker:

import sagemaker
import boto3
import json
import numpy as np

#create the model
sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

image_name = # image created from "sagemaker-tensorflow-serving-container" repository
container = { 'Image':        image_name,
              'ModelDataUrl': 's3://somebucket/multi-models/',
              'Mode':         'MultiModel'
            }
response = sm_client.create_model(
              ModelName        = 'multi-model',
              ExecutionRoleArn = role,
              Containers       = [container])

# create endpoint config
response = sm_client.create_endpoint_config(
    EndpointConfigName = "multi-model-config",
    ProductionVariants=[{
        'InstanceType':        'ml.m4.xlarge',
        'InitialInstanceCount': 1,
        'ModelName':            'multi-model',
        'VariantName':          'AllTraffic'}])

# create endpoint
response = sm_client.create_endpoint(
              EndpointName       = 'multi-model-endpoint',
              EndpointConfigName = "multi-model-config")

Here is the code to package and deploy one tensorflow model:

import os
import tarfile

import boto3
import tensorflow as tf

# train a tensorflow model
model = # some keras model

# save model
model_version = '1'
export_dir = 'export/Servo/' + model_version
tf.saved_model.save(model, export_dir)

# Create archive
model_name = "model_1.tar.gz"
with tarfile.open(model_name, mode='w:gz') as archive:
    archive.add(f'export/Servo/{model_version}', arcname=model_version, recursive=True)
    archive.add(f'inference.py', arcname="code/inference.py")
    archive.add(f'postprocessing.py', arcname="code/lib/local_module.py")

# upload model-archive to s3
s3 = boto3.client('s3')
s3.upload_file(Filename=model_name,
               Bucket='bucket_name',
               Key=os.path.join("multi_model_folder", model_name))

# test the model is deployed
runtime_sm_client = boto3.client('runtime.sagemaker')
endpoint_name = 'tf-multi-model-endpoint'
signal = [1, 2, 3, 4, 4]
data = {"instances": signal}  # signal is already a plain list, no .tolist() needed
response = runtime_sm_client.invoke_endpoint(EndpointName=endpoint_name,
                                    ContentType='application/json',
                                    TargetModel=model_name,
                                    Body=json.dumps(data))

Here is the inference.py code:

import json

import numpy as np
import requests

def handler(data, context):
    if context.request_content_type == 'application/json':
        input_data = data.read().decode('utf-8')
        input_data = json.loads(input_data)

        processed_input, signal = _process_input(input_data)
        response = requests.post(context.rest_uri, data=processed_input)

    else:
        raise ValueError('{{"error": "unsupported content type {}"}}'.format(
            context.request_content_type or "unknown"))

    return _process_output(input_data, signal, response, context)

def _process_input(input_data):
    #do preprocessing
    signal = input_data['signal']
    data = {"instances": signal}

    return json.dumps(data), signal 

def _process_output(input_data, signal, response, context):
    if response.status_code != 200:
        raise ValueError(response.content.decode('utf-8'))

    response_content_type = context.accept_header
    prediction = response.content.decode('utf-8')
    prediction = np.array(json.loads(prediction)["predictions"])
    # do postprocessing
    output = prediction  # placeholder: actual postprocessing elided
    return json.dumps({"predictions": output.tolist()}), response_content_type

Let me know if you need some clarification about the code. I hope the problem can be resolved promptly. Thanks

hongshanli23 commented 2 years ago

Hey @richardimms, I was looking for the code you used to create the multi-model endpoint. The code provided by @flacout is what I am looking for. I will post updates once I find the problem or a workaround.

hongshanli23 commented 2 years ago

Hey @flacout, the issue you are facing is not an endpoint creation failure; it is that the inference.py customized for each model is not being used by tensorflow_model_server, right?

flacout commented 2 years ago

@hsl89 yes, the problem is with inference.py. The endpoint is created fine.

If I deploy a model without inference.py, everything is fine and the model responds normally to prediction calls. If I deploy a model with inference.py, there is an error saying the model cannot be found (when trying a prediction). I assume the inference code is not picked up correctly, resulting in an exception and the model not being loaded.

hongshanli23 commented 2 years ago

I am using a later version of the inference container and I can get a multi-model endpoint; however, it does not look like the preprocessing code is being used.

import sagemaker 
sess = sagemaker.Session()
region = sess.boto_region_name
image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.6.0-cpu-py38-ubuntu20.04"

multi_model_prefix = "s3://<my bucket name>/multi_model_folder/"
# aws s3 ls s3://<my bucket name>/multi_model_folder/
# 2021-11-03 00:15:20    2295181 model_1.tar.gz
# 2021-11-03 00:15:20    2294161 model_2.tar.gz
container = {
    'Image': image,
    'ModelDataUrl': multi_model_prefix,
    'Mode': 'MultiModel'
}

import boto3
sm_client = boto3.client('sagemaker')
res = sm_client.create_model(
    ModelName='example-mm',
    ExecutionRoleArn = <a sagemaker role>,
    Containers = [container]
)

res = sm_client.create_endpoint_config(
    EndpointConfigName='example-mm-config',
    ProductionVariants = [
        {
            'InstanceType': 'ml.m4.xlarge',
             'InitialInstanceCount': 1,
             'InitialVariantWeight': 1,
             'ModelName' : 'example-mm',
             'VariantName': 'AllTraffic'
        }]
)

res = sm_client.create_endpoint(
    EndpointName='example-mm-endpoint',
    EndpointConfigName='example-mm-config'
)

Trigger model1

sm_runtime = boto3.client('sagemaker-runtime')
res = sm_runtime.invoke_endpoint(
    EndpointName="example-mm-endpoint",
    ContentType='application/json',
    TargetModel = 'model_1.tar.gz',
    Body = body
)

Trigger model2

res = sm_runtime.invoke_endpoint(
    EndpointName="example-mm-endpoint",
    ContentType='application/json',
    TargetModel = 'model_2.tar.gz',
    Body = body
)

The structure of model_1.tar.gz is

├── 1
│   ├── assets
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
└── code
    ├── inference.py

The code inside code/inference.py is

import json

def input_handler(data, context):
    print('========= preprocessing input for model 1=============')
    if context.request_content_type=='application/json':
        d = data.read().decode('utf-8')
        return d if len(d) else ''

    # raise if input is not json
    raise ValueError("unsupported data type: {}".format(
        context.request_content_type))

def output_handler(data, context):
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction=data.content
    return prediction, response_content_type

The endpoint can be triggered and I can confirm the outputs are different for the different models. But I don't see input_handler being used, i.e. its print statement does not appear in CloudWatch.
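
One way to double-check is to pull the endpoint's log events directly instead of browsing the console (a sketch; the log group name assumes SageMaker's usual /aws/sagemaker/Endpoints/<endpoint-name> convention):

import boto3

# Look for the "preprocessing input" print from inference.py in the
# endpoint's CloudWatch logs.
logs = boto3.client('logs')
log_group = '/aws/sagemaker/Endpoints/example-mm-endpoint'

streams = logs.describe_log_streams(
    logGroupName=log_group,
    orderBy='LastEventTime',
    descending=True,
    limit=5)['logStreams']

for stream in streams:
    events = logs.get_log_events(
        logGroupName=log_group,
        logStreamName=stream['logStreamName'],
        limit=200)['events']
    for event in events:
        if 'preprocessing input' in event['message']:
            print(event['message'])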

hongshanli23 commented 2 years ago

@flacout, just saw your response. On my end, the endpoint can find the model, but inference.py is not being used.

asoltanipanah commented 2 years ago

I'm facing the same problem. It seems that deploying a multi-model endpoint with a pre-processing script is not yet supported with TensorFlow Serving models. Any update on this issue?

Thanks

wissambenhaddad commented 2 years ago

I'm facing the same problem. It seems that deploying a multi-model endpoint with a pre-processing script is not yet supported with TensorFlow Serving models. Any update on this issue?

Thanks

Same problem, buddy!

wissambenhaddad commented 2 years ago

My workaround was to use the multi-container endpoint feature of SageMaker; it is not meant for this, but it does the job so far.
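
Roughly, the multi-container setup looks like this (a sketch with placeholder names, images, and buckets; each model runs in its own container behind one endpoint and is targeted directly at invoke time):

import json
import boto3

sm_client = boto3.client('sagemaker')
role = 'arn:aws:iam::<account-id>:role/<sagemaker-execution-role>'  # placeholder
image_name = '<tensorflow serving image>'                           # placeholder

sm_client.create_model(
    ModelName='tf-multi-container',
    ExecutionRoleArn=role,
    InferenceExecutionConfig={'Mode': 'Direct'},  # invoke containers directly
    Containers=[
        {'ContainerHostname': 'model-1',
         'Image': image_name,
         'ModelDataUrl': 's3://<bucket>/model_1.tar.gz'},  # placeholder
        {'ContainerHostname': 'model-2',
         'Image': image_name,
         'ModelDataUrl': 's3://<bucket>/model_2.tar.gz'},  # placeholder
    ])

# create_endpoint_config / create_endpoint as usual, then route per container:
runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName='tf-multi-container-endpoint',
    ContentType='application/json',
    TargetContainerHostname='model-1',
    Body=json.dumps({'instances': [[1, 2, 3, 4, 4]]}))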

satishpasumarthi commented 2 years ago

Hi @flacout and @richardimms

Please try the workaround below to use a universal inference.py: set the SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET and SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX environment variables on the container so that a shared inference.py is downloaded from S3 at startup. This should unblock your workflows.

flacout commented 2 years ago

@satishpasumarthi thank you for looking into this issue. I have a doubt about your solution: I think it refers to the creation of a model in SageMaker? Could you check, and correct if necessary, my code example below to make sure this is what you intended:

import sagemaker
import boto3

#create the model
sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

IMAGE_URI = '<docker_image_uri>'
custom_env = {
    'SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET': 'mybucket',
    'SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX': 'cv-models/mme/code/'
}  # The prefix should end with a "/" delimiter

container = { 'Image':        IMAGE_URI,
              'ModelDataUrl': 's3://mybucket/multi-models/',
              'Mode':         'MultiModel',
              'Environment':  custom_env,
            }

response = sm_client.create_model(
              ModelName        = 'multi-model',
              ExecutionRoleArn = role,
              Containers       = [container])

Thanks

satishpasumarthi commented 2 years ago

@satishpasumarthi thank you for looking into this issue. I have a doubt about your solution: I think it refers to the creation of a model in SageMaker? Could you check, and correct if necessary, my code example below to make sure this is what you intended:

import sagemaker
import boto3

#create the model
sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()

IMAGE_URI = '<docker_image_uri>'
custom_env = {
    'SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET': 'mybucket',
    'SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX': 'cv-models/mme/code/'
}  # The prefix should end with a "/" delimiter

container = { 'Image':        IMAGE_URI,
              'ModelDataUrl': 's3://mybucket/multi-models/',
              'Mode':         'MultiModel',
              'Environment':  custom_env,
            }

response = sm_client.create_model(
              ModelName        = 'multi-model',
              ExecutionRoleArn = role,
              Containers       = [container])

Thanks

Hi @flacout, your example looks good. Please give it a try.

flacout commented 2 years ago

Hi @satishpasumarthi, I tried your fix; unfortunately the SageMaker model endpoint failed to be created, with the error shown in the attached screenshot.

Let me know if you have a resolution for this issue. Thanks

satishpasumarthi commented 2 years ago

Hi @flacout, thanks for your feedback. My changes have nothing to do with the nginx server, and I haven't encountered this error on my end. It worked fine when I tested it by deploying two models at an endpoint. Can you provide a sample reproducible test case and the base container you are using?

flacout commented 2 years ago

Let me know if I got it straight:

I used the code in your branch mme-fix: https://github.com/satishpasumarthi/sagemaker-tensorflow-serving-container/tree/mme-fix

This is how I build the docker image:

cp -r docker/build_artifacts/* docker/2.1/

docker build \
    --cache-from 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.1-cpu \
    --build-arg TFS_VERSION=2.1 \
    --build-arg TFS_SHORT_VERSION=2.1 \
    -f docker/2.1/Dockerfile.cpu \
    -t tensorflow-inference:2.1.3-cpu \
    docker/2.1/

After that I pushed it to our ECR registry. This is how I deployed the endpoint:

import sagemaker
import boto3

sm_client = boto3.client('sagemaker')
role = sagemaker.get_execution_role()
image_prepro="258317088977.dkr.ecr.us-east-1.amazonaws.com/tensorflow-serving-luna:tfs-2.1.3-cpu"

custom_env = {
                "SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET" : 'cmdsk-dvc',
                "SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX" : "cv-models/mme/code/"
            } #The Prefix should end with a "/" delimiter

container = { 'Image':        image_prepro,
              'ModelDataUrl': 's3://cmdsk-dvc/multi-models-prepro/',
              'Mode':         'MultiModel',
              'Environment': custom_env
            }

response = sm_client.create_model(
              ModelName        = 'peak-detection-multi-model-prepro',
              ExecutionRoleArn = role,
              Containers       = [container])

response = sm_client.create_endpoint_config(
    EndpointConfigName = "peak-detection-multi-model-prepro-config",
    ProductionVariants=[{
        'InstanceType':        'ml.t2.large',
        'InitialInstanceCount': 1,
        #'InitialVariantWeight': 1,
        'ModelName':            'peak-detection-multi-model-prepro',
        'VariantName':          'AllTraffic'}])

response = sm_client.create_endpoint(
              EndpointName       = 'peak-detection-multi-model-prepro-endpoint',
              EndpointConfigName = "peak-detection-multi-model-prepro-config")

The inference.py at this location, s3://cmdsk-dvc/cv-models/mme/code/inference.py, is just a dummy file for testing, returning an empty list:

import os
import sys
import json
import time
import boto3
import requests

def handler(data, context):
    """Handle request.
    Args:
        data (obj): the request data
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to client, (optional) response content type
    """
    response_content_type = context.accept_header
    return json.dumps({"predictions": []}), response_content_type

How can I share the base container? Let me know if you need anything else.

satishpasumarthi commented 2 years ago

Thanks @flacout, let me try and get back to you.

richardimms commented 2 years ago

The thing I'd just like to confirm is the expected structure of the buckets now. Previously it was:

|--[model1]
  |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
|--code
    |--inference.py
    |--requirements.txt

This was all .tar.gz and then unpacked, but it seems like @flacout is adding the code to a separate "folder" within the S3 bucket. Is the expectation then to not have the code inside the .tar.gz file anymore?
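
If I've understood the workaround correctly (please correct me if not), the packaging would change to something like this sketch, with illustrative paths: the per-model archive keeps only the SavedModel, and one shared inference.py is uploaded to the S3 prefix pointed at by SAGEMAKER_MULTI_MODEL_UNIVERSAL_BUCKET / SAGEMAKER_MULTI_MODEL_UNIVERSAL_PREFIX.

import tarfile
import boto3

# per-model archive: only the SavedModel version directory, no code/ folder
with tarfile.open('model_1.tar.gz', mode='w:gz') as archive:
    archive.add('export/Servo/1', arcname='1', recursive=True)

s3 = boto3.client('s3')
s3.upload_file('model_1.tar.gz', '<bucket>', 'multi-models/model_1.tar.gz')
# shared handler uploaded once, under the universal prefix
s3.upload_file('inference.py', '<bucket>', 'cv-models/mme/code/inference.py')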

flacout commented 2 years ago

@satishpasumarthi I found the origin of the nginx bug. A new version of nginx-module-njs was released on 28 December 2021 (njs 0.7.1); it no longer supports the js_include directive. See these references: https://github.com/nginxinc/docker-nginx/issues/615 http://nginx.org/en/docs/njs/changes.html#njs0.7.1

The fix I found is to pin the previous njs version (0.7.0) in the Dockerfile, for example docker/2.1/Dockerfile.cpu:

# nginx + njs
RUN apt-get update \
 && apt-get -y install --no-install-recommends \
    curl \
    gnupg2 \
    ca-certificates \
    git \
    wget \
    vim \
    build-essential \
    zlib1g-dev \
 && curl -s http://nginx.org/keys/nginx_signing.key | apt-key add - \
 && echo 'deb http://nginx.org/packages/ubuntu/ bionic nginx' >> /etc/apt/sources.list \
 && apt-get update \
 && apt-get -y install --no-install-recommends \
    nginx=1.20.2-1~bionic \
    nginx-module-njs=1.20.2+0.7.0-1~bionic \
    python3 \
    python3-pip \
    python3-setuptools \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

Next week I will look at your proposed solution for the inference.py

satishpasumarthi commented 2 years ago

Hi @flacout, we have just merged a PR to fix the nginx.conf.template to accommodate the njs upgrade: https://github.com/aws/sagemaker-tensorflow-serving-container/pull/217. Just wanted to check whether you had gotten a chance to try the workaround suggested earlier? Thanks!

flacout commented 2 years ago

Hi @satishpasumarthi, sorry for the late reply. Yes, the workaround is working! One limitation I see is that if the user wants to apply changes to the inference.py file, they have to start a new endpoint for the changes to be taken into account. Apart from that, the response time is good, multiple models can be deployed, and the pre/post-processing code is applied to the request. Thanks
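
A possible mitigation I have not tried: instead of creating a brand-new endpoint, point the existing endpoint at a new endpoint config with update_endpoint, which replaces the underlying instances and should make them re-download the universal scripts at startup (sketch; the config name is hypothetical):

import boto3

sm_client = boto3.client('sagemaker')
sm_client.update_endpoint(
    EndpointName='peak-detection-multi-model-prepro-endpoint',
    EndpointConfigName='peak-detection-multi-model-prepro-config-v2')  # hypothetical new config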

NB: To make this work with my configuration I had to change the function serve._download_scripts(): I check the size of the files listed in the S3 "folder". If I don't do that, the first item in the list is the name of the "folder" itself (which has zero size), and that raises an exception later.

    def _download_scripts(self, bucket, prefix):
        log.info("checking boto session region ...")
        boto_session = boto3.session.Session()
        boto_region = boto_session.region_name
        if boto_region in ("us-iso-east-1", "us-gov-west-1"):
            raise ValueError("Universal scripts is not supported in us-iso-east-1 or us-gov-west-1")

        log.info("downloading universal scripts ...")
        client = boto3.client("s3")
        resource = boto3.resource("s3")
        # download files
        paginator = client.get_paginator("list_objects")
        for result in paginator.paginate(Bucket=bucket, Delimiter="/", Prefix=prefix):
            for file in result.get("Contents", []):
                if file.get("Size") > 0:  # this is the fix here
                    destination = os.path.join(CODE_DIR, file.get("Key"))
                    if not os.path.exists(os.path.dirname(destination)):
                        os.makedirs(os.path.dirname(destination))
                    resource.meta.client.download_file(bucket, file.get("Key"), destination)

satishpasumarthi commented 2 years ago

Hi @flacout, thanks for your reply. Isn't it always the case that you have to redeploy if inference.py changes?

satishpasumarthi commented 2 years ago

@flacout Gentle reminder !

flacout commented 2 years ago

Hi @satishpasumarthi, in the original documentation (the README of the repo) it seems that in a multi-model deployment each model could have its own inference.py file associated with it. But this might not be the best design, actually; I guess it depends on the situation.

rlcauvin commented 2 years ago

The thing I'd just like to confirm is the expected structure of the buckets now. Previously it was:

|--[model1]
  |--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
|--code
    |--inference.py
    |--requirements.txt

This was all .tar.gz and then unpacked, but it seems like @flacout is adding the code to a separate "folder" within the S3 bucket. Is the expectation then to not have the code inside the .tar.gz file anymore?

I noticed this issue is still open. I am experiencing the same issue: my deployed multi-model endpoint doesn't invoke inference.py.

I couldn't invoke my deployed multi-model endpoint at all using the directory structure above for my two models. The only way I could find to successfully invoke a model was to use the following structure for my .tar.gz file:

|--[model_version_number]
      |--assets
      |--variables
      |--saved_model.pb
|--code
    |--inference.py

I can successfully invoke a model via the multi-model endpoint that way, but it doesn't execute the code in inference.py.

Also, I create the main model:

sm_model = sagemaker.tensorflow.model.TensorFlowModel(
  entry_point = "inference.py",
  model_data = f"s3://{bucket}/{prefix}/{ranking_archive}",
  role = role,
  framework_version = tf_framework_version,
  source_dir = "export/ranking/code", # SageMaker is looking for this directory in my local notebook, not in the tar.gz file. 
  sagemaker_session = sess)

I get an error if I specify source_dir = "code" (reflecting the structure of the .tar.gz file), because SageMaker is looking for the source_dir directory in my local notebook, not in the tar.gz file.

BTW, is anyone actively working on this issue?

rlcauvin commented 2 years ago

I'll also note that the documentation contradicts itself.

Just under the Pre/Post-Processing heading, it states:

NOTE: There is currently no support for pre-/post-processing with multi-model containers.

But then later, toward the bottom of the same documentation, it states:

Using Multi-Model Endpoint with Pre/Post-Processing: Multi-Model Endpoint can be used together with Pre/Post-Processing. Each model will need its own inference.py, otherwise default handlers will be used.

deepatabc commented 1 year ago

Any update on this? I am able to deploy the endpoint, but inference.py is not called.

philipakash commented 1 year ago

Can someone please confirm whether pre/post-processing works with the SageMaker multi-model container image for TensorFlow 2.5 (CPU)?

Docker Image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.5-cpu