Inference Failure for MXNet MME Deployment in SageMaker

swlkn commented 3 years ago

Describe the bug When deploying a multi model endpoint using the SageMaker MXNet container (which uses this library) I receive the following error when I try to invoke the endpoint: "AttributeError: 'NoneType' object has no attribute 'transform'.

To reproduce Train a simple MXNet model in SageMaker using the example here.

At the end of the notebook, you verify that you can receive real time predictions from a SageMaker Endpoint using the model/code.

Next, add the following code to the notebook to deploy the same model as a multi model endpoint:

mme = MultiDataModel(name='mxnet-mme-final', model_data_prefix='<enter your own bucket + prefix>',
                                              model=model, sagemaker_session=sagemaker_session)

mme.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

Copy the model artifact to the model_data_prefix so it's accessible by MME.

Once it finishes deploying, try to invoke it using sagemaker_runtime in boto3, you will get the same error.

Expected behavior I'd expect to be able to deploy the same model to MME as I can as a single endpoint using the SageMaker MXNet container. The SageMaker documentation says that MXNet container supports MME.

CloudWatch Endpoint Logs at time of failure

2021-02-11 19:25:28,443 [INFO ] pool-1-thread-1 ACCESS_LOG - /10.32.0.2:49282 "GET /ping HTTP/1.1" 200 1 2021-02-11 19:25:29,687 [WARN ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-29028cdaefc80ac4662997191 2021-02-11 19:25:29,807 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_mxnet_serving_container.handler_service --model-path /opt/ml/models/29028cdaefc80ac4662997191ec53602/model --model-name 29028cdaefc80ac4662997191ec53602 --preload-model false --tmp-dir /home/model-server/tmp 2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000 2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 69 2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started. 2021-02-11 19:25:29,809 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.10 2021-02-11 19:25:29,809 [INFO ] epollEventLoopGroup-3-1 com.amazonaws.ml.mms.wlm.ModelManager - Model 29028cdaefc80ac4662997191ec53602 loaded. 2021-02-11 19:25:29,818 [INFO ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000 2021-02-11 19:25:29,833 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000. 2021-02-11 19:25:31,055 [INFO ] W-9000-29028cdaefc80ac4662997191-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Model 29028cdaefc80ac4662997191ec53602 loaded io_fd=86fe85fffe5a7fee-00000026-00000002-7cbbb181288e375a-efae222a 2021-02-11 19:25:31,057 [INFO ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1195 2021-02-11 19:25:31,058 [INFO ] W-9000-29028cdaefc80ac4662997191 ACCESS_LOG - /10.32.0.2:49282 "POST /models HTTP/1.1" 200 1400 2021-02-11 19:25:31,061 [WARN ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-29028cdaefc80ac4662997191-1 2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Invoking custom service failed. 2021-02-11 19:25:31,106 [INFO ] W-9000-29028cdaefc80ac4662997191 com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 2 2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last): 2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/usr/local/lib/python3.6/site-packages/mms/service.py", line 108, in predict 2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ret = self._entry_point(input_batch, self.context) 2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/usr/local/lib/python3.6/site-packages/sagemaker_inference/default_handler_service.py", line 50, in handle 2021-02-11 19:25:31,106 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - return self._service.transform(data, context) 2021-02-11 19:25:31,107 [INFO ] W-29028cdaefc80ac4662997191-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - AttributeError: 'NoneType' object has no attribute 'transform' 2021-02-11 19:25:31,109 [INFO ] W-9000-29028cdaefc80ac4662997191 ACCESS_LOG - /10.32.0.2:49282 "POST /models/29028cdaefc80ac4662997191ec53602/invoke HTTP/1.1" 503 14

System information A description of your system. Please provide:

Toolkit version: Latest? The version that is installed in the SageMaker container.
Framework version: 1.6.0
Python version: 3.6
CPU or GPU: CPU
Custom Docker image (Y/N): N

Additional context I think I may know the cause of the bug.

Looking in to the handler_service.py code, the initialize() method retrieves the appropriate Transformer object and sets it to self._service. It then calls the parent class constructor, but it doesn't pass in the transformer object. The parent constructor accepts a transformer as a parameter (see here), but the parent class defaults the value of self._service to None, so when handle() is called by MMS, it uses the self._service set by the parent class, resulting in the error shown above.

If you look at SageMaker containers that successfully deploy MME, they pass the Transformer() in to the default handler constructor.

https://github.com/aws/sagemaker-mxnet-inference-toolkit/blob/d180ab69c809498cfcc8bc11e06c2e55a650999e/src/sagemaker_mxnet_serving_container/handler_service.py#L64-L79

sontran commented 3 years ago

Is this fixed? I ran in to the same problem

SJWilkinson commented 3 years ago

Is this fixed? I ran in to the same problem

I don't think it's fixed yet. There has been some work done on the problem, see here: https://github.com/aws/sagemaker-mxnet-inference-toolkit/pull/143

davidthomas426 commented 1 year ago

The problem is here: https://github.com/aws/sagemaker-mxnet-inference-toolkit/blob/048d7a785a5e7fb15e69cc3744ede0a3f61406aa/src/sagemaker_mxnet_serving_container/handler_service.py#L80

That line tries to set PYTHONPATH to prepend the model's code_dir, so its files can be imported by the default handler. However, this code is run as part of the handler inside the model server's worker process. Setting os.environ['PYTHONPATH'] from python at runtime does not change the module search path for that python program.

This is only a problem for MME because for Single Model Endpoints, the toolkit sets up the PYTHONPATH before running the model server in a subprocess. That does have the intended effect, since the environment variable is set when the model server worker process's python interpreter starts up. See:

https://github.com/aws/sagemaker-inference-toolkit/blob/fff41909455ce79adc9e57a36f7f52ef143a8bf6/src/sagemaker_inference/model_server.py#L137-L147
https://github.com/aws/sagemaker-mxnet-inference-toolkit/blob/addf27925e3f7d51c05be05a31d16a64d88839f9/src/sagemaker_mxnet_serving_container/serving.py#L49
- MXNet toolkit delegates to base toolkit's start_model_server function, which calls the base toolkit's _set_python_path function linked above

The PyTorch and HuggingFace toolkits have separately fixed this:

However, the PyTorch toolkit appends it to the end of sys.path rather than prepending. I've opened a separate issue to track that: https://github.com/aws/sagemaker-pytorch-inference-toolkit/issues/137

The base inference toolkit also has this issue: https://github.com/aws/sagemaker-inference-toolkit/blob/fff41909455ce79adc9e57a36f7f52ef143a8bf6/src/sagemaker_inference/default_handler_service.py#L59-L64

Also by the way, gotta love this unit test: https://github.com/aws/sagemaker-inference-toolkit/blob/3774c1a0fb4408cfa95333b75d6e30a376bffa52/test/unit/test_model_server.py#L138-L144. It tests that the environment variable is set (and it is), but setting the environment variable doesn't have the effect the code actually needs it to have.

aws / sagemaker-mxnet-inference-toolkit

Inference Failure for MXNet MME Deployment in SageMaker #135