aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker Pytorch Containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

Use an overriden transform function to support batch inference in pytorch #122

Closed nikhil-sk closed 1 year ago

nikhil-sk commented 2 years ago

Issue #, if available: #121

Description of changes:

  1. This PR fixes the issue where the transform() function drops all but one request when running prediction on a batch.
  2. This PR adds a transform() function that overrides the transform() from sagemaker-inference-toolkit. It loops through the input data, runs _transform_fn() on each request, and appends each response to a list. When all inputs are processed, the list is returned.
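A minimal, self-contained sketch of that loop (hypothetical names; the real implementation works against the TorchServe context object, and `transform_fn` here stands in for the toolkit's per-request `_transform_fn` pipeline):

```python
def batch_transform(requests, transform_fn):
    """Run transform_fn on each request in a TorchServe batch and return
    one response per request, in order, so no request is dropped."""
    responses = []
    for request in requests:
        responses.append(transform_fn(request))
    return responses
```

Because TorchServe matches responses to requests by position, returning one response per input is what lets a single worker answer all requests in a batch.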

Config used:

env_variables_dict = {
    "SAGEMAKER_TS_BATCH_SIZE": "3",
    "SAGEMAKER_TS_MAX_BATCH_DELAY": "10000",
    "SAGEMAKER_TS_MIN_WORKERS": "1",
    "SAGEMAKER_TS_MAX_WORKERS": "1",
}
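For context, these TorchServe batching knobs reach the container as environment variables on the model. A sketch of wiring them in with the SageMaker Python SDK (the model data, role, and instance type below are placeholders, and the deploy step needs AWS credentials, so it is left in comments):

```python
# TorchServe batching configuration; values must be strings.
env_variables_dict = {
    "SAGEMAKER_TS_BATCH_SIZE": "3",           # max requests per backend batch
    "SAGEMAKER_TS_MAX_BATCH_DELAY": "10000",  # ms to wait for a full batch
    "SAGEMAKER_TS_MIN_WORKERS": "1",
    "SAGEMAKER_TS_MAX_WORKERS": "1",
}

# Sketch, assuming the SageMaker Python SDK (PyTorchModel accepts `env`):
# from sagemaker.pytorch import PyTorchModel
# model = PyTorchModel(
#     model_data="s3://my-bucket/model.tar.gz",   # placeholder
#     role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
#     entry_point="inference.py",
#     framework_version="1.11",
#     py_version="py38",
#     env=env_variables_dict,
# )
# predictor = model.deploy(initial_instance_count=1,
#                          instance_type="ml.m5.xlarge")
```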

Requests sent to SM endpoint as:

import multiprocessing

def invoke(endpoint_name):
    # predictor is already bound to the endpoint; the argument just gives
    # pool.map something to iterate over
    return predictor.predict(
        "{Bloomberg has decided to publish a new report on global economic situation.}"
    )

endpoint_name = predictor.endpoint_name
pool = multiprocessing.Pool(3)
results = pool.map(invoke, 5 * [endpoint_name])
pool.close()
pool.join()
print(results)

Logs:

o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:25,915 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1658337985915
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:25,917 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Backend received inference at: 1658337985
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,065 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Processing Data: {'body': bytearray(b'"{Bloomberg has decided to publish a new report on global economic situation.}"')}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,065 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 149
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,065 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT1
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT2
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Got input Data: {Bloomberg has decided to publish a new report on global economic situation.}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PRED SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1999, -0.2964]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0 ACCESS_LOG - /172.18.0.1:60322 "POST /invocations HTTP/1.1" 200 161
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PREDICTION ['Not Accepted']
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337917
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Processing Data: {'body': bytearray(b'"{Bloomberg has decided to publish a new report on global economic situation.}"')}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,066 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT1
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0 TS_METRICS - QueueTime.ms:10|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337986
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT2
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Got input Data: {Bloomberg has decided to publish a new report on global economic situation.}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0 ACCESS_LOG - /172.18.0.1:60328 "POST /invocations HTTP/1.1" 200 152
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PRED SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1999, -0.2964]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PREDICTION ['Not Accepted']
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337917
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Processing Data: {'body': bytearray(b'"{Bloomberg has decided to publish a new report on global economic situation.}"')}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT1
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,067 [INFO ] W-9000-model_1.0 TS_METRICS - QueueTime.ms:0|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337986
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,068 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT2
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,068 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Got input Data: {Bloomberg has decided to publish a new report on global economic situation.}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,068 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PRED SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1999, -0.2964]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,068 [INFO ] W-9000-model_1.0 ACCESS_LOG - /172.18.0.1:60330 "POST /invocations HTTP/1.1" 200 153
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,068 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PREDICTION ['Not Accepted']
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,068 [INFO ] W-9000-model_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337917
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,068 [INFO ] W-9000-model_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:148.06|#ModelName:model,Level:Model|#hostname:6db08d9230d1,requestID:47a54de3-7276-49fe-8ddd-73c8c5f7bbe6,5172d460-3352-4fae-9d64-77535d91c5c7,d1ffb3a9-ffde-4f50-bb02-2a560e5c20f6,timestamp:1658337986
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,069 [INFO ] W-9000-model_1.0 TS_METRICS - QueueTime.ms:0|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337986
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:26,069 [INFO ] W-9000-model_1.0 TS_METRICS - WorkerThreadTime.ms:5|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337986
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,117 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1658337996117
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,119 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Backend received inference at: 1658337996
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Processing Data: {'body': bytearray(b'"{Bloomberg has decided to publish a new report on global economic situation.}"')}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 140
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT1
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT2
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Got input Data: {Bloomberg has decided to publish a new report on global economic situation.}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0 ACCESS_LOG - /172.18.0.1:60322 "POST /invocations HTTP/1.1" 200 10141
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PRED SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1999, -0.2964]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337917
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,258 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PREDICTION ['Not Accepted']
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Processing Data: {'body': bytearray(b'"{Bloomberg has decided to publish a new report on global economic situation.}"')}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0 TS_METRICS - QueueTime.ms:10000|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337996
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT1
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT2
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0 ACCESS_LOG - /172.18.0.1:60328 "POST /invocations HTTP/1.1" 200 10142
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Got input Data: {Bloomberg has decided to publish a new report on global economic situation.}
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337917
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PRED SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1999, -0.2964]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PREDICTION ['Not Accepted']
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,259 [INFO ] W-9000-model_1.0 TS_METRICS - QueueTime.ms:10000|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337996
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,260 [INFO ] W-9000-model_1.0-stdout MODEL_METRICS - PredictionTime.Milliseconds:138.38|#ModelName:model,Level:Model|#hostname:6db08d9230d1,requestID:360b3ac1-8d7b-42ab-be3d-e72557233818,391e5d7e-addf-4dff-b81c-a1a17a439c16,timestamp:1658337996
o3eh413g78-algo-1-fi315  | 2022-07-20T17:26:36,260 [INFO ] W-9000-model_1.0 TS_METRICS - WorkerThreadTime.ms:3|#Level:Host|#hostname:6db08d9230d1,timestamp:1658337996
[b'["Not Accepted"]', b'["Not Accepted"]', b'["Not Accepted"]', b'["Not Accepted"]', b'["Not Accepted"]']
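The logs are consistent with the configured batching: five concurrent requests with SAGEMAKER_TS_BATCH_SIZE=3 split into a full batch of 3 that is flushed immediately, plus a partial batch of 2 that waits out SAGEMAKER_TS_MAX_BATCH_DELAY=10000 ms before flushing (hence the QueueTime.ms:10000 entries and the second backend flush about 10 s later). A sketch of that grouping:

```python
def expected_batches(n_requests, batch_size):
    """Split n_requests into TorchServe-style batches: a batch is flushed
    as soon as it fills; a trailing partial batch is flushed only after
    max_batch_delay expires."""
    batches = []
    remaining = n_requests
    while remaining > 0:
        batches.append(min(batch_size, remaining))
        remaining -= batch_size
    return batches
```

Here `expected_batches(5, 3)` yields `[3, 2]`, matching the two flushes seen in the logs.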

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sagemaker-bot commented 2 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

nikhil-sk commented 1 year ago

Closing this, as the equivalent change was merged upstream in sagemaker-inference-toolkit: https://github.com/aws/sagemaker-inference-toolkit/pull/108