aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

MultiDataModel error during prediction: Please provide a model_fn implementation. #92

Open pashasensi opened 3 years ago

pashasensi commented 3 years ago

**Describe the bug**
When deploying a packaged PyTorch model using the `PyTorchModel` class, I can successfully deploy and call the predict function. But as soon as I pass the same model to a `MultiDataModel` class, the deployment goes through, yet when I call `predictor.predict(data=data, target_model='model.tar.gz')` I get the following error:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.".

I'm not sure if the error is related to the 'Please provide a model_fn implementation.' error I see in CloudWatch, but the `model_fn` function is actually implemented, and `MultiDataModel` somehow doesn't load it.
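The entry point referenced below (`predictor.py`) is not attached to this issue; for context, a `model_fn`-style handler script for the PyTorch serving container generally looks something like the following minimal sketch (the `model.pt` filename and TorchScript loading are illustrative assumptions, not taken from the report):

```python
# predictor.py -- minimal illustrative sketch of the handler contract; the
# actual entry point from this report is not shown in the issue.
import json
import os

import torch


def model_fn(model_dir):
    # Load the serialized model packaged inside model.tar.gz
    # (assumes a TorchScript artifact named model.pt -- illustrative only)
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    model.eval()
    return model


def input_fn(request_body, content_type="application/json"):
    if content_type == "application/json":
        return json.loads(request_body)
    raise ValueError("Unsupported content type: {}".format(content_type))


def predict_fn(input_data, model):
    with torch.no_grad():
        return model(torch.tensor(input_data["inputs"]))


def output_fn(prediction, accept="application/json"):
    return json.dumps({"outputs": prediction.tolist()})
```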

**To reproduce**

  1. Create a sample PyTorch model, train it, and package it.
  2. Deploy the model using `PyTorchModel`. (This deploys successfully, and calling `predictor.predict()` returns the inference results.)
```python
# SageMaker Python SDK v1.x imports
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer
from sagemaker.pytorch import PyTorchModel
from sagemaker.utils import name_from_base

class InvoiceExtraction(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super().__init__(endpoint_name, sagemaker_session=sagemaker_session, serializer=json_serializer,
                         deserializer=json_deserializer, content_type='application/json')

model = PyTorchModel(model_data='/home/ec2-user/SageMaker/model.tar.gz',
                     name=name_from_base(MODEL_NAME),
                     role=role,
                     entry_point='predictor.py',
                     framework_version='1.5.0',  # Breaks for 1.6.0
                     py_version='py3',
                     predictor_cls=InvoiceExtraction)

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge', endpoint_name=ENDPOINT_NAME)
predicted_value = predictor.predict(data=data)
```
  3. If you deploy the model using a `MultiDataModel` instead, it gets deployed, but the predict call returns the error mentioned above. (A quick check of the archive contents follows the snippet below.)
```python
from sagemaker.multidatamodel import MultiDataModel

model_data_prefix = 's3://multi-model-endpoint-models/'
model.sagemaker_session = sagemaker_session  # not setting this results in the model's session not being initialized
mme = MultiDataModel(name=MODEL_NAME,
                     model_data_prefix=model_data_prefix,
                     model=model,  # passing our PyTorch model
                     sagemaker_session=sagemaker_session)

ENDPOINT_INSTANCE_TYPE = 'ml.m4.xlarge'
ENDPOINT_NAME = 'test-endpoint'

predictor = mme.deploy(initial_instance_count=1,
                       instance_type=ENDPOINT_INSTANCE_TYPE,
                       endpoint_name=ENDPOINT_NAME)

mme.add_model(model_data_source='/home/ec2-user/SageMaker/model.tar.gz', model_data_path='model.tar.gz')
list(mme.list_models())

predicted_value = predictor.predict(data=data, target_model='model.tar.gz')
```
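A quick way to check whether the archive passed to `add_model` above actually bundles any inference code (which, as the comment below notes, each model served from a multi-model endpoint needs to carry) is to list its contents; a minimal sketch using the same local path:

```python
import tarfile

# List what is inside the archive registered with the multi-model endpoint.
# For model_fn to be found, the inference script (e.g. code/predictor.py)
# has to be present in the archive itself.
with tarfile.open('/home/ec2-user/SageMaker/model.tar.gz', 'r:gz') as tar:
    for member in tar.getmembers():
        print(member.name)
```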

**Expected behavior**
`MultiDataModel` should deploy and work without any errors.

**Screenshots or logs**
This is what's included in the CloudWatch logs:

```
2021-02-02 20:24:54,652 [INFO ] W-9000-2093075ac497ff81bd6238817 com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1
2021-02-02 20:24:54,653 [WARN ] W-9000-2093075ac497ff81bd6238817 com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n:
Please provide a model_fn implementation.
See documentation for model_fn at https://github.com/aws/sagemaker-python-sdk
    at io.netty.handler.codec.http.HttpResponseStatus.<init>(HttpResponseStatus.java:555)
    at io.netty.handler.codec.http.HttpResponseStatus.<init>(HttpResponseStatus.java:537)
    at io.netty.handler.codec.http.HttpResponseStatus.valueOf(HttpResponseStatus.java:465)
    at com.amazonaws.ml.mms.wlm.Job.response(Job.java:85)
    at com.amazonaws.ml.mms.wlm.BatchAggregator.sendResponse(BatchAggregator.java:85)
    at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:146)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2021-02-02 20:24:54,655 [ERROR] W-9000-2093075ac497ff81bd6238817 com.amazonaws.ml.mms.wlm.BatchAggregator - Unexpected job: 99ef86e2-aedf-47d7-8f6c-950fde1bec88
```

**System information**
A description of your system. Please provide:

**Additional context**
Add any other context about the problem here.

mostafabayat commented 1 year ago

Your model needs to be able to access your inference code during invocations!

For example, take a look at the pytorch multi-model example:

**Note:** To directly use training job `model.tar.gz` outputs as we do here, you'll need to make sure your training job produces results that:

  - Already include any required inference code in a `code/` subfolder, and
  - (If you're using SageMaker PyTorch containers v1.6+) have been packaged to be compatible with TorchServe.

See the `enable_sm_oneclick_deploy()` and `enable_torchserve_multi_model()` functions in [src/train.py](src/train.py) for notes on this. Alternatively, you can perform the same steps after the fact - to produce a new, serving-ready `model.tar.gz` from your raw training job result.

or the sklearn multi-model example:

```python
from sagemaker.sklearn.estimator import SKLearn

# pay attention to the code_location argument!!
estimator = SKLearn(
    entry_point=TRAINING_FILE,  # script to use for training job
    role=role,
    source_dir=SOURCE_DIR,  # Location of scripts
    instance_count=1,
    instance_type=TRAIN_INSTANCE_TYPE,
    framework_version="1.2-1",  # 1.2-1 is the latest version
    output_path=s3_output_path,  # Where to store model artifacts
    base_job_name=_job,
    code_location=code_location,  # This is where the .tar.gz of the source_dir will be stored
    metric_definitions=[{"Name": "median-AE", "Regex": "AE-at-50th-percentile: ([0-9.]+).*$"}],
    hyperparameters={"n-estimators": 100, "min-samples-leaf": 3, "model-name": location},
)
```

There are many ways to make your code accessible; I've shown two of them here :) I hope it is useful.
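Building on the **Note** above about bundling inference code in a `code/` subfolder, here is a rough sketch of repackaging an existing training-job `model.tar.gz` into a serving-ready archive. The helper name, the `model-serving.tar.gz` output name, and the `predictor.py` default are illustrative assumptions, not part of the SageMaker SDK:

```python
import os
import shutil
import tarfile
import tempfile


def repackage_for_mme(model_tar, entry_point='predictor.py',
                      output_tar='model-serving.tar.gz'):
    """Hypothetical helper: bundle the inference script into a code/ subfolder
    alongside the trained weights, producing a new serving-ready archive."""
    workdir = tempfile.mkdtemp()
    try:
        # Unpack the original training artifact
        with tarfile.open(model_tar, 'r:gz') as tar:
            tar.extractall(workdir)
        # Copy the inference script into a code/ subfolder
        code_dir = os.path.join(workdir, 'code')
        os.makedirs(code_dir, exist_ok=True)
        shutil.copy(entry_point, code_dir)
        # Repack everything into the new archive
        with tarfile.open(output_tar, 'w:gz') as tar:
            for name in os.listdir(workdir):
                tar.add(os.path.join(workdir, name), arcname=name)
        return output_tar
    finally:
        shutil.rmtree(workdir)
```

The resulting archive could then be registered and invoked the same way as in the original report, e.g. `mme.add_model(model_data_source='model-serving.tar.gz', model_data_path='model-serving.tar.gz')` followed by `predictor.predict(data=data, target_model='model-serving.tar.gz')`.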