aws / sagemaker-pytorch-inference-toolkit

Toolkit for PyTorch inference and serving on SageMaker. Dockerfiles used to build SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

Env variable support for batch inference #106

Closed nikhil-sk closed 2 years ago

nikhil-sk commented 2 years ago

Issue #, if available: NA

Description of changes:

  1. Add a feature to specify the following per-model properties, which control batching behavior for inference:

    batchSize
    maxBatchDelay
    minWorkers
    maxWorkers
    responseTimeout
  2. These model properties are exposed as the following environment variables in the toolkit:

    SAGEMAKER_TS_BATCH_SIZE
    SAGEMAKER_TS_MAX_BATCH_DELAY
    SAGEMAKER_TS_MIN_WORKERS
    SAGEMAKER_TS_MAX_WORKERS
    SAGEMAKER_TS_RESPONSE_TIMEOUT

    The variables should be supplied as a dictionary to the 'env' option when configuring a model with the SageMaker Python SDK (see the Input example below).

  3. Note: These properties apply only to single-model inference on SageMaker. For a multi-model endpoint, a user still needs to bake the config.properties file into the container and list the models there, as in the sketch below.
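
For the multi-model case, a rough sketch of what such a baked-in config.properties could look like, following TorchServe's documented models JSON syntax (the model name, .mar file, and values below are hypothetical, not from this PR):

# Hypothetical config.properties for a multi-model endpoint.
# Model names, .mar files, and property values are illustrative only.
model_store=/opt/ml/model
load_models=all
models={\
  "modelA": {\
    "1.0": {\
        "defaultVersion": true,\
        "marName": "modelA.mar",\
        "batchSize": 3,\
        "maxBatchDelay": 100000,\
        "minWorkers": 1,\
        "maxWorkers": 4,\
        "responseTimeout": 120\
    }\
  }\
}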

Logs

When run on SageMaker, the model config is correctly picked up from the environment variables when specified as follows:

Input

from sagemaker.pytorch.model import PyTorchModel

# Batch-inference settings passed to the toolkit via environment variables.
env_variables_dict = {
    "SAGEMAKER_TS_BATCH_SIZE": "3",
    "SAGEMAKER_TS_MAX_BATCH_DELAY": "100000",
}

# model_artifact, role, and image_uri are defined earlier in the session.
pytorch_model = PyTorchModel(
    model_data=model_artifact,
    role=role,
    image_uri=image_uri,
    source_dir="code",
    framework_version="1.9",
    entry_point="inference.py",
    env=env_variables_dict,
)
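
Deployment then follows the standard SageMaker Python SDK flow; a minimal sketch (the instance type and count here are illustrative assumptions, not taken from the PR):

# Deploy the configured model to a real-time endpoint.
# Instance settings are illustrative assumptions.
predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)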

Output:

2n5r6rur8a-algo-1-33bni | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
2n5r6rur8a-algo-1-33bni | ['torchserve', '--start', '--model-store', '/.sagemaker/ts/models', '--ts-config', '/etc/sagemaker-ts.properties', '--log-config', '/sagemaker-pytorch-inference-toolkit/src/sagemaker_pytorch_serving_container/etc/log4j.properties', '--models', 'model.mar']
2n5r6rur8a-algo-1-33bni | 2021-09-27 19:06:42,737 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2n5r6rur8a-algo-1-33bni | 2021-09-27 19:06:42,927 [INFO ] main org.pytorch.serve.ModelServer - 
2n5r6rur8a-algo-1-33bni | Torchserve version: 0.4.2
2n5r6rur8a-algo-1-33bni | TS Home: /usr/local/lib/python3.6/dist-packages
2n5r6rur8a-algo-1-33bni | Current directory: /
2n5r6rur8a-algo-1-33bni | Temp directory: /tmp
2n5r6rur8a-algo-1-33bni | Number of GPUs: 0
2n5r6rur8a-algo-1-33bni | Number of CPUs: 32
2n5r6rur8a-algo-1-33bni | Max heap size: 30688 M
2n5r6rur8a-algo-1-33bni | Python executable: /usr/bin/python3
2n5r6rur8a-algo-1-33bni | Config file: /etc/sagemaker-ts.properties
2n5r6rur8a-algo-1-33bni | Inference address: http://0.0.0.0:8080
2n5r6rur8a-algo-1-33bni | Management address: http://0.0.0.0:8080
2n5r6rur8a-algo-1-33bni | Metrics address: http://127.0.0.1:8082
2n5r6rur8a-algo-1-33bni | Model Store: /.sagemaker/ts/models
2n5r6rur8a-algo-1-33bni | Initial Models: model.mar
2n5r6rur8a-algo-1-33bni | Log dir: /logs
2n5r6rur8a-algo-1-33bni | Metrics dir: /logs
2n5r6rur8a-algo-1-33bni | Netty threads: 0
2n5r6rur8a-algo-1-33bni | Netty client threads: 0
2n5r6rur8a-algo-1-33bni | Default workers per model: 32
2n5r6rur8a-algo-1-33bni | Blacklist Regex: N/A
2n5r6rur8a-algo-1-33bni | Maximum Response Size: 6553500
2n5r6rur8a-algo-1-33bni | Maximum Request Size: 6553500
2n5r6rur8a-algo-1-33bni | Prefer direct buffer: false
2n5r6rur8a-algo-1-33bni | Allowed Urls: [file://.*|http(s)?://.*]
2n5r6rur8a-algo-1-33bni | Custom python dependency for model allowed: false
2n5r6rur8a-algo-1-33bni | Metrics report format: prometheus
2n5r6rur8a-algo-1-33bni | Enable metrics API: true
2n5r6rur8a-algo-1-33bni | Workflow Store: /.sagemaker/ts/models
2n5r6rur8a-algo-1-33bni | Model config: {"model": {"1.0": {"defaultVersion": true, "marName": "model.mar", "minWorkers": 1, "maxWorkers": 4, "batchSize": 3, "maxBatchDelay": 100000, "responseTimeout": 120}}}
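
The final log line above shows the assembled model config. As an unofficial sketch of the translation implied by that output (the helper below is hypothetical, not the toolkit's actual code), the SAGEMAKER_TS_* variables map onto TorchServe's per-model config keys roughly like so:

import json
import os

# Hypothetical mapping from the toolkit's env vars to TorchServe's
# per-model config keys.
_ENV_TO_TS_KEY = {
    "SAGEMAKER_TS_BATCH_SIZE": "batchSize",
    "SAGEMAKER_TS_MAX_BATCH_DELAY": "maxBatchDelay",
    "SAGEMAKER_TS_MIN_WORKERS": "minWorkers",
    "SAGEMAKER_TS_MAX_WORKERS": "maxWorkers",
    "SAGEMAKER_TS_RESPONSE_TIMEOUT": "responseTimeout",
}

def build_model_config(mar_name="model.mar"):
    # Start from the defaults and override with any env vars that are set.
    config = {"defaultVersion": True, "marName": mar_name}
    for env_var, ts_key in _ENV_TO_TS_KEY.items():
        value = os.environ.get(env_var)
        if value is not None:
            config[ts_key] = int(value)
    return {"model": {"1.0": config}}

# With SAGEMAKER_TS_BATCH_SIZE=3 and SAGEMAKER_TS_MAX_BATCH_DELAY=100000 set,
# this prints a config like the "Model config" line in the log above.
print(json.dumps(build_model_config()))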

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sagemaker-bot commented 2 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
