aws / deep-learning-containers

AWS Deep Learning Containers are pre-built Docker images that make it easier to run popular deep learning frameworks and tools on AWS.
https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html

[feature-request] TensorRT support for PyTorch Serve #2599

Open emilwallner opened 1 year ago

emilwallner commented 1 year ago


Concise Description: I'm looking to deploy a TensorRT-compiled PyTorch model, but the current PyTorch image does not include TensorRT.

DLC image/dockerfile: https://github.com/aws/deep-learning-containers/tree/master/pytorch/inference/docker/1.13/py3/cu117

Describe the solution you'd like Add support for TensorRT compiled PyTorch models: https://pytorch.org/TensorRT/getting_started/installation.html#installation

Describe alternatives you've considered I can use the NVIDIA Triton Inference Server, but I'd prefer to use TorchServe.

joaopcm1996 commented 1 year ago

You can also extend the PyTorch DLC image by installing TensorRT and its Python package in the Dockerfile.
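For reference, a minimal Dockerfile sketch of that approach (the base image tag and the version pin are assumptions; match them to the PyTorch/CUDA build of the DLC image you actually deploy):

```dockerfile
# Hypothetical example: extend a PyTorch inference DLC with torch-tensorrt.
# The tag below is an assumption -- substitute the DLC image you actually use.
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker

# torch-tensorrt pulls in TensorRT's Python bindings; pin a release that
# matches the base image's PyTorch and CUDA versions (1.3.x targets PyTorch 1.13).
RUN pip install --no-cache-dir torch-tensorrt
```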

geraldstanje commented 6 months ago

Hi @emilwallner, could you please share how you extended the DLC image with torch-tensorrt? Do you also use torch.compile with backend="torch_tensorrt", as shown here: https://pytorch.org/TensorRT/user_guide/torch_compile.html?
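For reference, a minimal sketch of the JIT path that guide describes, assuming torch-tensorrt is installed alongside a CUDA build of PyTorch (the torchvision model and input shape are placeholders):

```python
import torch
import torchvision.models as models
import torch_tensorrt  # noqa: F401 -- importing registers the "torch_tensorrt" backend

# Placeholder model; substitute your own module.
model = models.resnet18(weights=None).eval().cuda()
example = torch.randn(1, 3, 224, 224, device="cuda")

# torch.compile is lazy: the TensorRT engines are built on the first call.
compiled = torch.compile(model, backend="torch_tensorrt")
with torch.no_grad():
    out = compiled(example)
```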

emilwallner commented 5 months ago

@geraldstanje I used a Docker container with TensorRT pre-installed and compiled the model through the command-line interface; compilation takes around 30 minutes. Installing TensorRT manually and linking all the required libraries was too tedious. However, that was a few years ago, and the method you suggest may work better now.

At inference time I load the compiled artifact directly. Make sure to compile on the same GPU model you'll use for inference. For deployment I ended up using this image: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-22-12.html
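A hedged sketch of that ahead-of-time flow, assuming torch-tensorrt is installed and using a placeholder torchvision model (exact behavior varies by torch-tensorrt version; the 1.x releases return a TorchScript module that torch.jit.save can serialize):

```python
import torch
import torchvision.models as models
import torch_tensorrt

model = models.resnet18(weights=None).eval().cuda()

# Compile once, on the same GPU model used for inference: the resulting
# TensorRT engines are specific to that architecture.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.float, torch.half},  # allow FP16 kernels
)
torch.jit.save(trt_model, "resnet18_trt.ts")

# At serving time, plain TorchScript loading suffices; importing
# torch_tensorrt makes the embedded TensorRT runtime ops available.
loaded = torch.jit.load("resnet18_trt.ts").cuda()
with torch.no_grad():
    out = loaded(torch.randn(1, 3, 224, 224, device="cuda"))
```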

I found the SageMaker environment too complicated to customize, and its deployment options were too restrictive for my use case. I ended up with an EC2 Auto Scaling group, an ECS service backed by that Auto Scaling group, and an AWS load balancer.