aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.
Apache License 2.0

Default handler behaves differently in inference.py and python_service.py #207

Closed kylepula-aws closed 3 years ago

kylepula-aws commented 3 years ago

Describe the bug

Default handler behaves differently in inference.py and python_service.py

To reproduce

I have a model and endpoint that works correctly using the default handler provided in python_service.py.

When I copy the default handler into an inference.py file and change its name to handler, the endpoint behaves differently.

In particular, the response Body becomes

JSON Value: [ [ ... data removed  ... ] ] Is not object

When the inference.py script is not provided in the model artifacts, the default handler runs and returns the model's expected output.

Expected behavior

I expected the default handler to perform the same even if it was provided in the inference.py script.

Additional context

This is the contents of my inference.py file. My intention was to reproduce the default behavior from the default handler in python_service.py. This is the version copied from inside the actual container, though I believe it matches what's currently in the repo as well.

import json
import requests

def handler(data, context):
    data = data.read().decode("utf-8")
    if not isinstance(data, str):
        data = json.loads(data)
    response = requests.post(context.rest_uri, data=data)
    return response.content, context.accept_header
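
One quirk worth noting in the handler above: data.read().decode("utf-8") always yields a str, so the isinstance check that follows can never be false and the json.loads branch is dead code in practice; the request body is forwarded to TFS verbatim. A minimal sketch isolating just that pre-processing step (handler_body is a hypothetical helper, not part of the container):

```python
import io
import json

def handler_body(raw: bytes) -> str:
    """Mirrors the pre-processing in the default handler."""
    data = io.BytesIO(raw).read().decode("utf-8")  # always a str from here on
    if not isinstance(data, str):  # never true after .decode()
        data = json.loads(data)   # unreachable in practice
    return data

payload = json.dumps({"instances": [[1.0, 2.0]]}).encode("utf-8")
body = handler_body(payload)
assert isinstance(body, str)  # the json.loads branch never ran
```

Whether the body actually parses as JSON is left entirely to the TFS REST endpoint.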

The default handler as implemented in python_service.py is:

def default_handler(data, context):
    """A default inference request handler that directly send post request to TFS rest port with
    un-processed data and return un-processed response

    :param data: input data
    :param context: context instance that contains tfs_rest_uri
    :return: inference response from TFS model server
    """
    data = data.read().decode("utf-8")
    if not isinstance(data, str):
        data = json.loads(data)
    response = requests.post(context.rest_uri, data=data)
    return response.content, context.accept_header
kylepula-aws commented 3 years ago

It turns out that there's no issue. I had assumed that the above default_handler was used if no inference.py script was provided. In fact, python_service.py is only used when inference.py is provided (or when running in multi-model mode). So, in my case, that handler was never being used.
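
For anyone hitting the same confusion: the Python service (and any handler) only runs when a code/inference.py is packaged inside the model artifact; otherwise requests go straight to the TFS REST endpoint. A sketch of a model.tar.gz layout that triggers the Python service, assuming the layout described in this repo's README (mymodel and the touched files are placeholders, not a real SavedModel):

```shell
# Package a model archive that includes code/inference.py.
# Without code/inference.py, python_service.py is bypassed entirely.
mkdir -p mymodel/1/variables code
touch mymodel/1/saved_model.pb              # placeholder for a real SavedModel
touch code/inference.py code/requirements.txt
tar -czf model.tar.gz mymodel code
tar -tzf model.tar.gz                       # verify code/inference.py is listed
```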