awslabs / data-on-eks

DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
https://awslabs.github.io/data-on-eks/
Apache License 2.0

nvidia-triton-server-triton-inference-server pod is crashing #605

Closed · purnasanyal closed this issue 3 months ago

purnasanyal commented 3 months ago

Description

Hi, I am creating a demo for "Deploying Multiple Large Language Models with NVIDIA Triton Server and vLLM" from my Isengard account using Cloud9. However, the nvidia-triton-server-triton-inference-server-54546fdb86-wh7tb pod is crashing. The pod log is attached.

As far as I know, I have access on Hugging Face to the Llama and Mistral models.

Terraform v1.9.3

pod.log (attached as an image)

askulkarni2 commented 3 months ago

As the error in the log says, you have to accept the terms and conditions on Hugging Face for the gated models before your token can download them.

```
I0806 21:52:45.863589 1 pb_stub.cc:366] "Failed to initialize Python stub: OSError: You are trying to access a gated repo.\nMake sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.\n403 Client Error. (Request ID: Root=1-66b29b2d-360bea0d76518b604809e08c;c38d10d6-674e-466b-9a56-a7b3f7c07a4e)\n\nCannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.\nAccess to model meta-llama/Meta-Llama-3-8B-Instruct is restricted and you are not in the authorized list. Visit https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct to ask for access.
```
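A quick way to confirm whether a token actually has access to a gated repo, before restarting the pod, is to issue the same `config.json` request that the server makes at startup. Below is a minimal stdlib-only sketch (the Mistral repo id and the `HF_TOKEN` environment variable name are assumptions for illustration; substitute whatever your deployment uses):

```python
import os
import urllib.error
import urllib.request

def config_url(repo_id: str) -> str:
    # The URL the model loader resolves when fetching the model config
    # (this is the exact URL named in the 403 error above).
    return f"https://huggingface.co/{repo_id}/resolve/main/config.json"

def check_access(repo_id: str, token: str) -> int:
    # HEAD request with the bearer token: 200 means access is granted,
    # 401/403 means the token is missing, invalid, or the gated-repo
    # access request has not been approved yet.
    req = urllib.request.Request(config_url(repo_id), method="HEAD")
    req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN", "")  # assumed env var name
    for repo in ("meta-llama/Meta-Llama-3-8B-Instruct",
                 "mistralai/Mistral-7B-Instruct-v0.2"):  # hypothetical repo id
        print(repo, "->", check_access(repo, token))
```

If this prints 403 for a repo, accept the license on its Hugging Face page with the same account that issued the token, wait for approval, and then restart the Triton pod.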