aws / sagemaker-pytorch-inference-toolkit

Toolkit for allowing inference and serving with PyTorch on SageMaker. Dockerfiles used for building SageMaker PyTorch containers are at https://github.com/aws/deep-learning-containers.
Apache License 2.0

add vmargs=-XX:-UseContainerSupport in config #136

Closed lxning closed 1 year ago

lxning commented 1 year ago

Issue #, if available: #99

Description of changes: Apply the same fix in the PyTorch inference toolkit.
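For reference, the change in the PR title amounts to adding a `vmargs` line to the model server configuration. A minimal sketch of such a fragment, assuming TorchServe-style `config.properties` keys, might look like:

```properties
# Sketch only: disable JVM container support so the frontend JVM
# is not constrained by cgroup detection (see discussion below for
# why this can be risky inside memory-limited containers).
vmargs=-XX:-UseContainerSupport
```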

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

davidthomas426 commented 1 year ago

I'll just note that this issue, along with workarounds and fixes, has shown up across different inference toolkits.

Here is a list of links showing that it's been a recurring problem.

Also, this fix may create other problems, as turning off container support means that the JVM does not respect Docker container memory limits.
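To make the risk concrete, here is a small illustrative sketch (hypothetical numbers, not from the source) of how the JVM's default max heap sizing interacts with container support. By default the JVM sizes its max heap to roughly 25% of visible memory; with `-XX:-UseContainerSupport` it sees the host's RAM rather than the container's cgroup limit:

```python
# Sketch: why disabling UseContainerSupport can cause OOM-kills.
# With container support ON, the JVM derives its default max heap from
# the container's memory limit; with it OFF, from the host's total RAM.

def default_max_heap_gb(visible_memory_gb: float, fraction: float = 0.25) -> float:
    """Approximate JVM default: MaxRAMPercentage=25, i.e. ~1/4 of visible memory."""
    return visible_memory_gb * fraction

# Hypothetical container limited to 8 GB running on a 256 GB host:
heap_with_support = default_max_heap_gb(8)       # sized from the 8 GB limit
heap_without_support = default_max_heap_gb(256)  # sized from host RAM, far over the limit
```

With the flag disabled, the JVM may happily try to grow well past the container's limit, at which point the kernel OOM-killer terminates the process.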

We should make sure to address this uniformly across the inference toolkits and deep-learning-containers, while allowing users to customize easily without onerous workarounds such as deriving from the deep-learning-container images or even forking the toolkit.

Links:

I still don't think this is an exhaustive list.

chen3933 commented 1 year ago

The PR will be updated to allow customization of vmargs. Example: https://github.com/aws/sagemaker-inference-toolkit/pull/118/commits/c01dde749687f86e7891ec403eed3f98d4fcfb50
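One way such customization could look, sketched in Python: read the vmargs from an environment variable with the current flag as the default, and emit the corresponding `config.properties` line. The environment variable name here is illustrative, not confirmed from the linked commit:

```python
import os

# Hypothetical sketch of making vmargs user-configurable instead of
# hard-coding -XX:-UseContainerSupport. The env var name is an
# assumption for illustration only.
DEFAULT_VMARGS = "-XX:-UseContainerSupport"

def build_config_lines(env=None):
    """Return config.properties lines, honoring a user override if set."""
    if env is None:
        env = os.environ
    vmargs = env.get("SAGEMAKER_MODEL_SERVER_VMARGS", DEFAULT_VMARGS)
    return ["vmargs={}".format(vmargs)]
```

This keeps the fix as the default behavior while letting users override it per endpoint without deriving a custom container image.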

rohithkrn commented 1 year ago

Python 3.7 tests are failing at the coverage report step (failing to invoke the coverage command); the same step works fine for Python 3.6:

ERROR: InvocationError for command /codebuild/output/src309395522/src/github.com/aws/sagemaker-pytorch-inference-toolkit/.tox/py37/bin/coverage report --fail-under=90 --include '*sagemaker_pytorch_serving_container*' (exited with code 1)