aws / deep-learning-containers

AWS Deep Learning Containers are pre-built Docker images that make it easier to run popular deep learning frameworks and tools on AWS.
https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html

[question] Where is the Dockerfile for the SageMaker Triton Inference Server? #3267

Closed: david-waterworth closed this issue 1 year ago

david-waterworth commented 1 year ago

I cannot find the source for the Triton container. I'm trying to diagnose an issue that only occurs when I run two Python models (each with a conda-packed environment) and one ONNX model in the SageMaker build of Triton (it works with just one Python model and one ONNX model). The exact same model repository works with the latest (slightly newer) official Triton image.

Is it in this repo or elsewhere?

I'd also like to build the container myself so I can use it to build Python backend stubs. There's a related issue where stubs built with the NVIDIA toolchain won't load in the SageMaker image, and I suspect that's because the container uses Amazon Linux rather than Ubuntu?

nikhil-sk commented 1 year ago

@david-waterworth Thank you for creating the issue.

The SageMaker image is essentially the same image as the NGC container, just built with --endpoint=sagemaker and started with --allow-sagemaker=true. This enables the SageMaker API workflow implemented in https://github.com/triton-inference-server/server/blob/main/src/sagemaker_server.cc.
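
To illustrate, here is a minimal sketch of starting tritonserver in SageMaker mode; the model repository path and the --allow-http/--allow-grpc settings are assumptions for illustration, not taken from the container's actual entrypoint script:

import subprocess

# Illustrative sketch only: launch tritonserver with the SageMaker frontend enabled.
# /opt/ml/model and the --allow-http/--allow-grpc values are assumptions.
subprocess.run(
    [
        "tritonserver",
        "--model-repository=/opt/ml/model",  # where SageMaker extracts the model artifact
        "--allow-sagemaker=true",            # expose the SageMaker /ping and /invocations API
        "--allow-http=false",
        "--allow-grpc=false",
    ],
    check=True,
)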

Can you please list the reproduction steps here, including the model (or a toy model) and the config.pbtxt? And, assuming you're running on SageMaker itself, how do you create the endpoint, what is the payload, and how do you invoke it?

david-waterworth commented 1 year ago

Thanks for the information @nskool

My first issue was that I wasn't aware there are two notebook environments available in SageMaker ("SageMaker Studio" and "Notebook Instances"). I'd been using SageMaker Studio exclusively, so I was confused when the examples said "tested with the conda_python3 kernel" or used Docker, etc. Now that I've discovered and created a Notebook Instance, that issue is resolved.

The second problem was that it wasn't clear to me which Python version Triton uses. I initially thought the AWS SageMaker and official Triton images used different versions (3.8 vs 3.10), but I eventually figured out that NVIDIA moved from Ubuntu 20.04 with Python 3.8 to Ubuntu 22.04 with Python 3.10 between container versions 23.02 and 23.07.

So now I don't really need to build a stub; I just needed to create a Python 3.8 environment. And when I update to the latest version of Triton available on SageMaker (23.06 seems to be the latest), I'll need to switch to Python 3.10.
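
For reference, here is a minimal sketch of how such an environment could be created and packed; the environment name, the transformers dependency, and the archive name are illustrative, and it assumes conda plus the conda-pack tool are already installed:

import subprocess

ENV_NAME = "triton-py38"              # illustrative environment name
ARCHIVE = "triton_python_env.tar.gz"  # illustrative archive name, referenced from config.pbtxt

# Create a Python 3.8 environment to match the container's interpreter,
# install the model's dependencies, then pack it with conda-pack.
subprocess.run(["conda", "create", "-y", "-n", ENV_NAME, "python=3.8"], check=True)
subprocess.run(["conda", "run", "-n", ENV_NAME, "pip", "install", "transformers"], check=True)
subprocess.run(["conda-pack", "-n", ENV_NAME, "-o", ARCHIVE], check=True)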

Finally, I had to modify the model slightly. I haven't gone back and verified that this is still an issue, but it appeared that 23.02 imported model.py twice: the first time into the default environment and the second time into the proper (conda-packed) environment. I followed the sample code, moved the imports into the initialize method, and it works.

I don't know if this is a bug that has been fixed since 23.02; it seemed odd that the sample would import transformers inside the initialize method unless it was a workaround. My code is based on an NVIDIA example that runs on a more recent release.

import os
from typing import Dict

# from transformers import AutoTokenizer, PreTrainedTokenizer, TensorType  # <- top-level import removed

class TritonPythonModel:
    # tokenizer: PreTrainedTokenizer

    def initialize(self, args: Dict[str, str]) -> None:
        """
        Initialize the tokenization process.
        :param args: arguments from the Triton config file
        """
        # More variables are listed in https://github.com/triton-inference-server/python_backend/blob/main/src/python.cc
        path: str = os.path.join(args["model_repository"], args["model_version"])

        from transformers import AutoTokenizer  # <- moved the import here
        self.tokenizer = AutoTokenizer.from_pretrained(path)

nikhil-sk commented 1 year ago

@david-waterworth Thank you for the update. Could you please let us know if you're still running into any issues with imports or with using the Python model?

As for the 'twice' import, I'm unsure why that seems to have happened. It would be great if you could share a step-by-step repro of this; I can try it out and get back to you. If later versions of the container have fixed this issue, then I'd chalk it up to a one-off issue in 23.02.

david-waterworth commented 1 year ago

Thanks @nskool. No, I'll close this now; the issues I observed seem to have gone away now that I have a robust method of generating the Python environment.

ohadkatz commented 1 year ago

Closing this issue in accordance with the communication above. Please reopen this issue if you require additional help or guidance.

geraldstanje commented 4 months ago

@ohadkatz @nskool Where can I find instructions for building the Triton Inference Server TRT-LLM 24.06 container myself and running it on SageMaker?