geraldstanje opened this issue 6 months ago
Hi @geraldstanje, we have recently updated the TorchServe version to 0.11.0. Please pull the latest images to use it.
For TensorRT, we'll need repro steps to investigate. In the meantime, we suggest taking a look at the DJL TensorRT containers if you are interested in that.
For extending DLCs, you can do so as you outlined. Model artifacts are copied into the container at runtime by the Python SDK (which I am assuming is what you're using) through a docker run.
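To make that concrete, here is a minimal sketch using the SageMaker Python SDK; the ECR URI, S3 path, and role ARN below are hypothetical placeholders, not values from this thread:

```python
# A minimal sketch, assuming the SageMaker Python SDK is installed and the
# extended image has been pushed to your own ECR repository.
# All URIs/ARNs below are hypothetical placeholders.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-extended-pytorch-inference:latest",
    model_data="s3://my-bucket/models/model.tar.gz",  # archive with weights plus code/
    role="arn:aws:iam::<account>:role/MySageMakerExecutionRole",
    entry_point="inference.py",  # resolved from the code/ directory in model.tar.gz
)

# SageMaker pulls the image, starts the container via docker run, and extracts
# the model artifact into /opt/ml/model at endpoint startup.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
```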
For the image tag, the two images you outlined are the same image even though the tags are different. I do want to note, though, that
`FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:2.1-gpu-py310`
is in us-west-2, while
`FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker`
is in us-east-1.
If you want to look at all available tags, you can find them in the GitHub release tags and available_images.md.
> we have recently updated the TorchServe version to 0.11.0. Please pull the latest images to use it.
What's the name of that PyTorch image? E.g. 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.2.0-gpu-py310-cu118-ubuntu20.04-sagemaker
Is that what you are referring to? https://github.com/aws/deep-learning-containers/tree/master/pytorch/inference/docker/2.2/py3
> For TensorRT, we'll need repro steps to investigate. In the meantime, we suggest taking a look at the DJL TensorRT containers if you are interested in that.
Why switch to a different image? torch-tensorrt and tensorrt can be used with TorchServe...
Any supported PyTorch (PT 1.13, 2.1, 2.2) inference image would work. They all have TorchServe 0.11.0. Generally, you can pull images with the following tags:
2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker
2.1-gpu-py310
2.1.0-gpu-py310
These tags always pull our latest release; they are re-pointed to the newest image every time we release a patch.
However, you may see some tags such as
2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker-v1.8
2.1-gpu-py310-cu118-ubuntu20.04-sagemaker-v1
2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker-v1.8-2024-05-22-19-30-53
These tags represent specific patch releases, so using them will pull the exact image that was released on a given date.
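As an illustration of the difference, a floating tag versus a dated immutable tag might be constructed like this (the values mirror the tags above; this is a sketch, not an official naming API):

```python
# Illustrative sketch only; account/region/repo mirror the URIs discussed above.
account, region, repo = "763104351884", "us-west-2", "pytorch-inference"

# Floating tag: re-resolves to the newest patch every time you pull.
floating = f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:2.1.0-gpu-py310"

# Dated tag: pins the exact image published at that timestamp.
pinned = (
    f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}"
    ":2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker-v1.8-2024-05-22-19-30-53"
)
print(floating)
print(pinned)
```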
Can you also confirm that the CUDA version for the PyTorch 2.2 image is >= 11.8, which is also required by pytorch/TensorRT: https://github.com/pytorch/TensorRT/releases
Yes, our GPU inference image uses CUDA 11.8.
So I can extend the image and install torch-tensorrt 2.2 on top of this new image?
We don't expect any installation errors with TensorRT, but you're welcome to outline repro steps if you encounter issues and we'll be happy to reproduce and assist.
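For reference, using torch-tensorrt inside an extended image typically looks something like the sketch below; the toy model and input shape are hypothetical, and this is not an officially tested recipe:

```python
# A minimal sketch, assuming torch and torch_tensorrt are installed in the
# extended image and a GPU is available. The model and shapes are hypothetical.
import torch
import torch_tensorrt

# Toy model standing in for whatever you would hand to TorchServe.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()

# Ahead-of-time compile with Torch-TensorRT before packaging for serving.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},  # allow FP16 kernels where supported
)

with torch.no_grad():
    out = trt_model(torch.randn(1, 3, 224, 224, device="cuda"))
```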
> Why switch to a different image? torch-tensorrt and tensorrt can be used with TorchServe...
DJL containers offer TensorRT out of the box while our regular DLCs do not. DJL containers can also be extended into your own custom containers in the same way. For more information about DJL containers, see the DJL documentation.
@sirutBuasai are you also going to release a new pytorch-inference image with CUDA 12.x?
Not for PyTorch 2.1 and 2.2 Inference.
However, we are working on PyTorch 2.3 Inference with CUDA 12.1. Feel free to track this PR for when it will be released.
@sirutBuasai any timeline for when PyTorch 2.3 Inference with CUDA 12.1 will be available?
Will you also update the Triton inference image to CUDA 12.x soon?
We are aiming for 6/7 for PyTorch 2.3 Inference with CUDA 12.1.
Which triton image are you referring to?
@sirutBuasai I mean the NVIDIA Triton Inference Server: https://github.com/aws/deep-learning-containers/blob/master/available_images.md#nvidia-triton-inference-containers-sm-support-only - can someone build Triton Inference Server Release 24.05?
I don't see the nvidia-triton-inference-containers image in this GitHub repo... can you send me the link?
cc @nskool
@nskool Could you assist with triton image questions?
@sirutBuasai - if you go to the following link, it says:
> Dependencies
>
> These are the following dependencies used to verify the testcases. Torch-TensorRT can work with other versions, but the tests are not guaranteed to pass.
>
> - Bazel 5.2.0
> - Libtorch 2.4.0.dev (latest nightly) (built with CUDA 12.1)
> - CUDA 12.1
> - TensorRT 10.0.1.6

https://github.com/pytorch/TensorRT
I use torch-tensorrt 2.2.0 with the DLC 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:2.2.0-gpu-py310-cu118-ubuntu20.04-sagemaker-v1.10 and get this error:

```
predict_fn error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr
```

But when I run it on EC2 with CUDA it works fine - it seems I cannot use CUDA 11 and need CUDA 12.x for torch-tensorrt 2.2.0...
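One quick way to check whether the container's CUDA toolkit matches what the torch-tensorrt wheel expects (a plausible cause of the pybind11 nullptr error, though that interpretation is my assumption):

```python
# Diagnostic sketch: print the CUDA toolkit PyTorch was built against and the
# TensorRT runtime version. A mismatch between these and what the
# torch-tensorrt wheel was built for can fail at import/compile time.
import torch

print(torch.__version__)    # e.g. 2.2.0
print(torch.version.cuda)   # e.g. 11.8 in this DLC

# tensorrt is pulled in as a dependency when torch-tensorrt is installed
import tensorrt
print(tensorrt.__version__)
```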
Regarding the NVIDIA Triton Inference Server:

```
NOTE: CUDA Forward Compatibility mode ENABLED.
Using CUDA 12.3 driver version 545.23.08 with kernel driver version 470.182.03.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
```
cc @nskool @sirutBuasai
For the TensorRT installation error, could you provide the following:

Checklist
- Error message:
- Entire log:
- code/requirements.txt:
- DLC image/dockerfile: 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:2.1-gpu-py310
- Current behavior: error during installing torch-tensorrt
- Expected behavior: no error
- Additional context:
Can I extend the deep learning image for SageMaker as follows, push this image to AWS ECR, and use that image to deploy my SageMaker inference endpoint? How does the model artifact (code/inference.py, code/requirements.txt, the model, etc.) get copied into the Docker container?
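For context, the artifact I mean is the usual model.tar.gz; a minimal packaging sketch (file names are hypothetical) looks like this:

```python
# A minimal packaging sketch; file names are hypothetical. SageMaker extracts
# this archive into /opt/ml/model inside the container when the endpoint starts.
import tarfile

with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.pth", arcname="model.pth")                          # model weights
    tar.add("code/inference.py", arcname="code/inference.py")          # custom handler
    tar.add("code/requirements.txt", arcname="code/requirements.txt")  # extra pip deps
```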
I see there are 2 images - can I use both for SageMaker, or only the second one?
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:2.1-gpu-py310
vs.
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker
Also, the torch-tensorrt 2.2.0 whl file is available here: https://pypi.org/project/torch-tensorrt/2.2.0/ - why can't it find it?
cc @tejaschumbalkar @joaopcm1996
Also, TorchServe is already at version 0.10 - how can I use that version with 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:2.1-gpu-py310 or 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker? cc @sirutBuasai