aws / sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Apache License 2.0

Pass context from handler service to handler function #109

Closed waytrue17 closed 1 year ago

waytrue17 commented 1 year ago

Issue #, if available: MMS and TorchServe store request information in a context object and pass it to the handler service. This PR further passes the context to the handler functions so that users can use that information in their custom handler functions.

One particular use case is multi-GPU inference for PyTorch and MXNet. Currently, handler functions statically select the same GPU device and assign all data and models to that single device, even on a multi-GPU host. MMS and TorchServe dynamically select a GPU device for each worker and store that information in the context. By passing the context to the handler functions, users can assign data and models to different GPUs and utilize all GPU resources.

Below is an example of an input_fn that processes the input data and assigns it to a device:

## imports used by both versions of the handler function
import torch

from sagemaker_inference import content_types, decoder


## current handler function
def input_fn(self, input_data, content_type):
    ## data processing
    np_array = decoder.decode(input_data, content_type)
    tensor = (torch.FloatTensor(np_array) if content_type in content_types.UTF8_TYPES
              else torch.from_numpy(np_array))

    ## select device statically
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    ## copy data to device
    return tensor.to(device)


## new handler function with context
def input_fn(self, input_data, content_type, context):
    ## data processing
    np_array = decoder.decode(input_data, content_type)
    tensor = (torch.FloatTensor(np_array) if content_type in content_types.UTF8_TYPES
              else torch.from_numpy(np_array))

    ## select device dynamically from the gpu_id assigned by the model server
    device = torch.device("cuda:" + str(context.system_properties.get("gpu_id"))
                          if torch.cuda.is_available() else "cpu")

    ## copy data to device
    return tensor.to(device)
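
The same idea applies to model loading. Below is a minimal, illustrative sketch (not part of this PR) of a model_fn that uses the context to place the model on the worker's assigned GPU; the model file name and the use of torch.jit.load are assumptions for illustration only:

## hypothetical model_fn with context (illustrative sketch, not part of this PR)
import os

import torch

def model_fn(self, model_dir, context):
    ## select device dynamically from the gpu_id assigned by the model server
    device = torch.device("cuda:" + str(context.system_properties.get("gpu_id"))
                          if torch.cuda.is_available() else "cpu")

    ## load the serialized model; "model.pt" and torch.jit.load are assumptions for this example
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location=device)
    return model.to(device)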

After this PR, handler functions both with and without the context argument will be supported, so existing use cases won't break.

input_fn(input_data, content_type)  # should still work
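
One plausible way for the handler service to support both signatures (shown only as a sketch; the actual implementation in this PR may differ) is to inspect the number of parameters the user-provided function accepts and pass the context only when it is expected:

## hypothetical dispatch helper (illustrative sketch; run_handler_function is not a real toolkit API)
import inspect

def run_handler_function(func, *argv, context=None):
    ## count the parameters the user-provided handler function accepts
    ## (inspect.signature excludes "self" for bound methods)
    num_func_input = len(inspect.signature(func).parameters)

    if num_func_input == len(argv):
        ## existing signature, e.g. input_fn(input_data, content_type)
        return func(*argv)
    elif num_func_input == len(argv) + 1:
        ## new signature that also accepts the context object
        return func(*argv, context)
    else:
        raise TypeError("{} takes an unexpected number of arguments".format(func.__name__))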

Description of changes:

Testing done:

With the context passed, workers were assigned to different GPUs as expected:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    396466      C   /opt/conda/bin/python3.8         1659MiB |
|    1   N/A  N/A    396465      C   /opt/conda/bin/python3.8         1659MiB |
|    1   N/A  N/A    396467      C   /opt/conda/bin/python3.8         1659MiB |
|    2   N/A  N/A    396470      C   /opt/conda/bin/python3.8         1659MiB |
|    3   N/A  N/A    396462      C   /opt/conda/bin/python3.8         1659MiB |
|    4   N/A  N/A    396469      C   /opt/conda/bin/python3.8         1659MiB |
|    5   N/A  N/A    396463      C   /opt/conda/bin/python3.8         1659MiB |
|    6   N/A  N/A    396468      C   /opt/conda/bin/python3.8         1659MiB |
|    7   N/A  N/A    396464      C   /opt/conda/bin/python3.8         1659MiB |
+-----------------------------------------------------------------------------+

Without the context, all workers loaded the model onto GPU 0, exhausting its dedicated memory (14936MiB / 16384MiB):

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    220119      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220120      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220121      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220122      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220123      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220124      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220125      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220126      C   /opt/conda/bin/python3.8         1659MiB |
|    0   N/A  N/A    220127      C   /opt/conda/bin/python3.8         1659MiB |
+-----------------------------------------------------------------------------+

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

Tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot commented 1 year ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository


ashishgupta023 commented 1 year ago

LGTM
