aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

Trying to deploy pretrained MXNet model for inference only #216

Closed hubenjm closed 6 years ago

hubenjm commented 6 years ago

I have a model that I've trained in MXNet to classify images, and I already have the model artifacts saved as model.tar.gz in an S3 bucket.

from sagemaker.mxnet.model import MXNetModel
from sagemaker import get_execution_role

role = get_execution_role()
sagemaker_model = MXNetModel(model_data='s3://bucket-name/model.tar.gz',
                             # entry_point.py is an empty .py file, since we aren't using it for training
                             entry_point='entry_point.py',
                             role=role)
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

I just want to be able to deploy this within a SageMaker notebook to a host and then call the predictor.predict function on an input image. However, the above sagemaker_model.deploy call fails and yields the following error message:

ValueErrorTraceback (most recent call last)
<ipython-input> in <module>()
----> 1 predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

/home/ec2-user/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/sagemaker/model.pyc in deploy(self, initial_instance_count, instance_type, endpoint_name)
     90         production_variant = sagemaker.production_variant(model_name, instance_type, initial_instance_count)
     91         self.endpoint_name = endpoint_name or model_name
---> 92         self.sagemaker_session.endpoint_from_production_variants(self.endpoint_name, [production_variant])
     93         if self.predictor_cls:
     94             return self.predictor_cls(self.endpoint_name, self.sagemaker_session)

/home/ec2-user/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/sagemaker/session.pyc in endpoint_from_production_variants(self, name, production_variants, wait)
    512         self.sagemaker_client.create_endpoint_config(
    513             EndpointConfigName=name, ProductionVariants=production_variants)
--> 514         return self.create_endpoint(endpoint_name=name, config_name=name, wait=wait)
    515
    516     def expand_role(self, role):

/home/ec2-user/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/sagemaker/session.pyc in create_endpoint(self, endpoint_name, config_name, wait)
    344         self.sagemaker_client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    345         if wait:
--> 346             self.wait_for_endpoint(endpoint_name)
    347         return endpoint_name
    348

/home/ec2-user/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/sagemaker/session.pyc in wait_for_endpoint(self, endpoint, poll)
    405         if status != 'InService':
    406             reason = desc.get('FailureReason', None)
--> 407             raise ValueError('Error hosting endpoint {}: {} Reason: {}'.format(endpoint, status, reason))
    408         return desc
    409

ValueError: Error hosting endpoint sagemaker-mxnet-py2-cpu-2018-03-22-20-10-57-938: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check.

I believe my attempt at using an empty file for the entry_point.py script is the reason this happened. But the problem in that case is that nowhere in the documentation was it clear to me what exactly should go in the entry_point.py script when I just want to perform inference, and no training, with this model.

My other question is about what the predictor.predict function actually expects. Do I need to pass it a numpy array? Is there a way to pass a string for the image_url instead, and then write some simple image preprocessing script that loads the image, resizes it, etc. on the host before calling the MXNet model.predict function? I'm concerned that opencv is not part of the endpoint environment by default.

Any help with this would be much appreciated.
djarpin commented 6 years ago

Thanks @hubenjm .

I'm guessing the empty entry_point.py file is the problem here. The Python SDK README has more documentation about which functions are expected in order to serve an MXNet model. In general, these are model_fn, input_fn, predict_fn, and output_fn. However, there are default implementations that kick in if you don't specify them, and I'm guessing the defaults aren't suited to your particular use case. You may need to check your CloudWatch logs to get a finer-grained understanding of the high-level "ping health check" error.
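
For illustration, a minimal skeleton of such a serving script might look like this (a sketch only: the signatures follow the SDK README of that era, and the checkpoint prefix, epoch, and input shape are placeholders for your model):

import json
import mxnet as mx

def model_fn(model_dir):
    # Load the checkpoint files that were extracted from model.tar.gz into model_dir.
    sym, arg_params, aux_params = mx.model.load_checkpoint('%s/model' % model_dir, 0)
    mod = mx.mod.Module(symbol=sym, label_names=None)
    mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
    mod.set_params(arg_params, aux_params, allow_missing=True)
    return mod

def input_fn(request_body, content_type):
    # Deserialize the request payload into something predict_fn can consume.
    return mx.nd.array(json.loads(request_body))

def predict_fn(input_object, model):
    # Run a forward pass with the loaded Module.
    model.forward(mx.io.DataBatch([input_object]))
    return model.get_outputs()[0].asnumpy().tolist()

def output_fn(prediction, accept):
    # Serialize the prediction for the response.
    return json.dumps(prediction)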

Would it be helpful to have links to the Python SDK documentation in this repository? Or had you already found that, but just couldn't find the serving information you needed in there? In which case, we'd be open to feedback on how we make that clearer.

The predictor.predict function is quite flexible. You can set predictor.serializer to a function which will serialize your data before passing to the endpoint. And then you can use input_fn within the entry_point file to further manipulate the data before passing to your network for inference. This could include resizing or loading from an image_url (although there may be latency implications for that).
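
For instance, something along these lines (a sketch; it assumes the predictor object exposes the serializer and content_type hooks that the SDK's RealTimePredictor had at the time):

import json

# Serialize a numpy array to JSON on the client side; the endpoint's input_fn
# (or the default deserializer) then turns it back into an array for inference.
predictor.serializer = lambda data: json.dumps(data.tolist())
predictor.content_type = 'application/json'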

You can pip install from your entry_point script to add libraries that are not installed in the container by default, e.g.:

import pip

# Install the dependency when the container starts up, then import it.
pip.main(['install', '<my_package_name_here>'])
import <my_package_name_here>

Thanks, and hope this helps.

hubenjm commented 6 years ago

Thanks for that information. I was essentially trying to follow along with this example: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/mxnet_mnist_byom/mxnet_mnist.ipynb, but I wasn't sure it would work, since the provided mnist.py file doesn't include anything about how inference is handled; hence my confusion.

Anyway, looking at the Python SDK README, it seems that https://github.com/aws/sagemaker-python-sdk/blob/master/README.rst#deploying-endpoints-from-model-data is the most relevant to my intended use case. No sample transform_script.py is provided there, but I'm guessing it would have to include input_fn, predict_fn, output_fn, and model_fn, as you mentioned?

djarpin commented 6 years ago

Thanks, @hubenjm . You raise a good point that we should be more explicit about defining the functions and discussing what's going on in the MXNet BYOM example. I've added that to our backlog.

There are two parts to BYOM hosting using MXNet and the Python SDK:

1. The model artifacts themselves, packaged as a model.tar.gz in S3 (which you already have).
2. An entry point script that tells the hosting container how to load and serve the model: model_fn, plus optionally input_fn, predict_fn, and output_fn, as mentioned above (see the sketch below).
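
Putting the two together, the deploy call looks like the code at the top of this issue, but with a non-trivial entry point (a sketch; transform_script.py here is a stand-in for a script defining the functions above):

from sagemaker.mxnet.model import MXNetModel

sagemaker_model = MXNetModel(model_data='s3://bucket-name/model.tar.gz',
                             entry_point='transform_script.py',  # defines model_fn and friends
                             role=role)
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')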

hubenjm commented 6 years ago

Just as an update, I figured out that the problem was actually how I packaged the model.tar.gz file. Apparently the script in the example doesn't pack it correctly, or at least I wasn't able to unpack the resulting .tar.gz on my local machine. That code, from the example notebook, goes as follows:

import os
import json
os.mkdir('model')

model.save_checkpoint('model/model', 0000)
with open('model/model-shapes.json', 'w') as shapes:
    json.dump([{"shape": model.data_shapes[0][1], "name": "data"}], shapes)

import tarfile
def flatten(tarinfo):
    tarinfo.name = os.path.basename(tarinfo.name)
    return tarinfo

tar = tarfile.open("model.tar.gz", "w:gz")
tar.add("model", filter=flatten)
tar.close()

I ended up just manually packing the files myself from the command line.

The other problem I found was that the model files must be named precisely as "model-0000.params", "model-shapes.json", and "model-symbol.json", respectively. Any other names for these files will lead to the ping health check error from what I can tell.
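
For anyone else hitting this, here is a sketch of packing the archive flat from Python, with the exact names required above (it assumes the checkpoint files live in a local model/ directory):

import tarfile

# Add each file at the root of the archive (no leading 'model/' directory),
# keeping the exact names the default model loading expects.
with tarfile.open('model.tar.gz', 'w:gz') as tar:
    for name in ['model-0000.params', 'model-shapes.json', 'model-symbol.json']:
        tar.add('model/%s' % name, arcname=name)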

Once this was figured out, my original approach of just passing an empty entry_point.py file when calling MXNetModel worked fine. I obtained an MXNetPredictor object named predictor either by calling the deploy function on the MXNetModel object or, for an existing endpoint, by calling predictor = MXNetPredictor(endpoint_name=endpoint_name, sagemaker_session=sagemaker_session). Either way, I could then perform inference in one of two ways:

Method 1:

import boto3
import json
import numpy as np

endpoint_name = predictor.endpoint  # predictor is an MXNetPredictor object
runtime = boto3.Session().client(service_name='runtime.sagemaker', region_name='us-east-1')
j = json.dumps(img.tolist())  # img is a np.ndarray with shape (1, 3, 224, 224)
response = runtime.invoke_endpoint(EndpointName=endpoint_name, ContentType='application/json', Body=j, Accept='application/json')
result = response['Body'].read()
result = np.array(json.loads(result)).ravel()

Method 2: (much simpler)

result = np.array(predictor.predict(img.tolist())).ravel()  # img is a np.ndarray with shape (1, 3, 224, 224)

With regard to Method 1 above, it wasn't obvious to me where to find out that I had to use the 'application/json' content type when invoking the endpoint, nor was it obvious where to find how the image should be serialized properly.

yangaws commented 6 years ago

@hubenjm

Hi, thanks for using SageMaker and sharing your knowledge with the community!

I want to add one more comment to your solution. As @djarpin pointed out, the SageMaker Python SDK allows users to define some specific functions in the entry point script. Specifically for this model-loading problem, model_fn() is what you need.

If model_fn() is not provided and the default model_fn() is invoked, then, as you have already found, you need to package the model according to the required file structure and naming.

But if you provide your own model_fn() that defines how your model should be loaded, then you don't need to worry much about the format. Models will be loaded your way.
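
For example, a model_fn along these lines (a sketch; the checkpoint prefix, epoch, and input shape are placeholders for whatever you actually saved):

import mxnet as mx

def model_fn(model_dir):
    # Load a checkpoint saved under custom names, e.g. net-symbol.json / net-0010.params,
    # instead of the default model-symbol.json / model-0000.params.
    sym, arg_params, aux_params = mx.model.load_checkpoint('%s/net' % model_dir, 10)
    mod = mx.mod.Module(symbol=sym, label_names=None)
    mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
    mod.set_params(arg_params, aux_params, allow_missing=True)
    return mod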

djarpin commented 6 years ago

Closing for now. Feel free to re-open if you continue to experience problems. Thanks.