awslabs / data-on-eks

DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
https://awslabs.github.io/data-on-eks/
Apache License 2.0

nvidia-triton-server-triton-inference-server pod is crashing #605

Closed · purnasanyal closed this issue 3 months ago

purnasanyal commented 3 months ago

Description

Hi, I am creating a demo for "Deploying Multiple Large Language Models with NVIDIA Triton Server and vLLM" from my Isengard account using Cloud9. However, the nvidia-triton-server-triton-inference-server-54546fdb86-wh7tb pod is crashing. The pod log is attached.

As far as I know, I have access on Hugging Face to the Llama and Mistral models.

Terraform v1.9.3

pod.log (attached as an image)

askulkarni2 commented 3 months ago

As the error in the log says, you have to accept the terms and conditions on Hugging Face for the gated models before your token can download them.

```
I0806 21:52:45.863589 1 pb_stub.cc:366] "Failed to initialize Python stub: OSError: You are trying to access a gated repo.\nMake sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.\n403 Client Error. (Request ID: Root=1-66b29b2d-360bea0d76518b604809e08c;c38d10d6-674e-466b-9a56-a7b3f7c07a4e)\n\nCannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.\nAccess to model meta-llama/Meta-Llama-3-8B-Instruct is restricted and you are not in the authorized list. Visit https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct to ask for access.
```
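A quick way to confirm whether a token actually has access to a gated repo, before restarting the pod, is to issue the same `config.json` request that the server makes at startup. Below is a minimal stdlib-only sketch (the Mistral repo id and the `HF_TOKEN` environment variable name are assumptions for illustration; substitute whatever your deployment uses):

```python
import os
import urllib.error
import urllib.request

def config_url(repo_id: str) -> str:
    # The URL the model loader resolves when fetching the model config
    # (this is the exact URL named in the 403 error above).
    return f"https://huggingface.co/{repo_id}/resolve/main/config.json"

def check_access(repo_id: str, token: str) -> int:
    # HEAD request with the bearer token: 200 means access is granted,
    # 401/403 means the token is missing, invalid, or the gated-repo
    # access request has not been approved yet.
    req = urllib.request.Request(config_url(repo_id), method="HEAD")
    req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN", "")  # assumed env var name
    for repo in ("meta-llama/Meta-Llama-3-8B-Instruct",
                 "mistralai/Mistral-7B-Instruct-v0.2"):  # hypothetical repo id
        print(repo, "->", check_access(repo, token))
```

If this prints 403 for a repo, accept the license on its Hugging Face page with the same account that issued the token, wait for approval, and then restart the Triton pod.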