Description of changes:
This notebook uses SageMaker notebook instance conda_python3 kernel, demonstrates how to use TorchServe to deploy Llama-2-13 on SageMaker inf2.24xlarge. There are multiple advanced features in this example.
Neuronx AOT precompile model
TorchServe microbatching
TorchServe LLM batching streaming respone on SageMaker
SageMaker uncompressed model artifacts
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Issue #, if available:
Description of changes: This notebook uses SageMaker notebook instance conda_python3 kernel, demonstrates how to use TorchServe to deploy Llama-2-13 on SageMaker inf2.24xlarge. There are multiple advanced features in this example.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.