aws / deep-learning-containers

AWS Deep Learning Containers are pre-built Docker images that make it easier to run popular deep learning frameworks and tools on AWS.
https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html

[feature-request] TensorRT support for PyTorch Serve #2599

Open emilwallner opened 1 year ago

emilwallner commented 1 year ago


Concise Description: I'm looking to deploy a TensorRT-compiled PyTorch model, but the current PyTorch image does not include TensorRT.

DLC image/dockerfile: https://github.com/aws/deep-learning-containers/tree/master/pytorch/inference/docker/1.13/py3/cu117

Describe the solution you'd like Add support for TensorRT compiled PyTorch models: https://pytorch.org/TensorRT/getting_started/installation.html#installation

Describe alternatives you've considered I can use the NVIDIA Triton Inference Server, but I'd prefer to use TorchServe.

joaopcm1996 commented 1 year ago

You can also extend the PyTorch DLC image by installing TensorRT and its Python package in the Dockerfile.
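For reference, a minimal Dockerfile sketch of that approach (the base image tag and the version pin are assumptions; match them to the PyTorch/CUDA build of the DLC image you actually deploy):

```dockerfile
# Hypothetical example: extend a PyTorch inference DLC with torch-tensorrt.
# The tag below is an assumption -- substitute the DLC image you actually use.
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker

# torch-tensorrt pulls in TensorRT's Python bindings; pin a release that
# matches the base image's PyTorch and CUDA versions (1.3.x targets PyTorch 1.13).
RUN pip install --no-cache-dir torch-tensorrt
```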

geraldstanje commented 6 months ago

Hi @emilwallner, could you please share how you extended the DLC image with torch-tensorrt? Do you also use torch.compile with backend="torch_tensorrt", as shown here: https://pytorch.org/TensorRT/user_guide/torch_compile.html?
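For reference, a minimal sketch of the JIT path that guide describes, assuming torch-tensorrt is installed alongside a CUDA build of PyTorch (the torchvision model and input shape are placeholders):

```python
import torch
import torchvision.models as models
import torch_tensorrt  # noqa: F401 -- importing registers the "torch_tensorrt" backend

# Placeholder model; substitute your own module.
model = models.resnet18(weights=None).eval().cuda()
example = torch.randn(1, 3, 224, 224, device="cuda")

# torch.compile is lazy: the TensorRT engines are built on the first call.
compiled = torch.compile(model, backend="torch_tensorrt")
with torch.no_grad():
    out = compiled(example)
```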

emilwallner commented 5 months ago

@geraldstanje I used a Docker container with TensorRT pre-installed and compiled the model through the command-line interface; compilation takes around 30 minutes. Installing TensorRT manually and linking all the required libraries was too tedious. However, that was a few years ago, and the method you suggest may work better now.

At inference time I load the compiled artifact directly. Make sure to compile on the same GPU model you'll use for inference. For deployment I ended up using this image: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-22-12.html
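A hedged sketch of that ahead-of-time flow, assuming torch-tensorrt is installed and using a placeholder torchvision model (exact behavior varies by torch-tensorrt version; the 1.x releases return a TorchScript module that torch.jit.save can serialize):

```python
import torch
import torchvision.models as models
import torch_tensorrt

model = models.resnet18(weights=None).eval().cuda()

# Compile once, on the same GPU model used for inference: the resulting
# TensorRT engines are specific to that architecture.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.float, torch.half},  # allow FP16 kernels
)
torch.jit.save(trt_model, "resnet18_trt.ts")

# At serving time, plain TorchScript loading suffices; importing
# torch_tensorrt makes the embedded TensorRT runtime ops available.
loaded = torch.jit.load("resnet18_trt.ts").cuda()
with torch.no_grad():
    out = loaded(torch.randn(1, 3, 224, 224, device="cuda"))
```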

I found the SageMaker environment too complicated to customize, and its deployment options were too restrictive for my use case. I ended up with an EC2 Auto Scaling group, an ECS service backed by that Auto Scaling group, and an AWS load balancer.