Implement a new container environment that exposes additional parameters to help users optimize for throughput and fine-tune container performance.
This release adds support for running multiple TensorFlow model servers that handle parallel requests coming from different gunicorn workers. Throughput can then be improved by tuning these parameters.
The tunable parameters include SAGEMAKER_GUNICORN_WORKERS, SAGEMAKER_TFS_INSTANCE_COUNT, SAGEMAKER_TFS_INTER_OP_PARALLELISM, and SAGEMAKER_TFS_INTRA_OP_PARALLELISM, among others.
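For reviewers, here is a rough sketch of how the four environment variables named above might be read and how gunicorn workers could be spread across model server instances. Only the variable names come from this description. The default values, the helper names `read_tunables` and `assign_workers`, and the round-robin mapping are assumptions made for illustration and may not match the actual implementation in this change.

```python
import os

# Variable names come from the PR description; the default values are
# placeholders chosen for this sketch, not the container's real defaults.
TUNABLES = {
    "SAGEMAKER_GUNICORN_WORKERS": 1,
    "SAGEMAKER_TFS_INSTANCE_COUNT": 1,
    "SAGEMAKER_TFS_INTER_OP_PARALLELISM": 0,
    "SAGEMAKER_TFS_INTRA_OP_PARALLELISM": 0,
}


def read_tunables(environ=None):
    """Return the tunables as integers, falling back to the placeholder defaults."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in TUNABLES.items():
        raw = environ.get(name)
        # Treat an unset or empty variable as "use the default".
        settings[name] = default if raw in (None, "") else int(raw)
    return settings


def assign_workers(workers, instances):
    """Hypothetical round-robin map: gunicorn worker index -> model server index."""
    if workers < 1 or instances < 1:
        raise ValueError("workers and instances must both be at least 1")
    return {w: w % instances for w in range(workers)}


if __name__ == "__main__":
    cfg = read_tunables()
    mapping = assign_workers(
        cfg["SAGEMAKER_GUNICORN_WORKERS"],
        cfg["SAGEMAKER_TFS_INSTANCE_COUNT"],
    )
    print(cfg)
    print(mapping)
```

With four workers and two instances, this mapping sends workers 0 and 2 to the first model server and workers 1 and 3 to the second, which is one simple way to keep the instances evenly loaded.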
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.