[Open] xieshenzh opened this issue 3 months ago
@supertetelman can you take a look at this issue?
@xieshenzh thanks for reporting this, I'm trying to do the exact same thing. I followed your procedure and got the same results with the nvidia-nim-llama-3.1-8b-instruct-1.1.2 image.

My overall thought is to pre-cache new NIM modelcar images on each of my OpenShift nodes with an image puller and let KServe do its thing for faster scale-up when necessary.
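A minimal sketch of that pre-pull idea, assuming a plain DaemonSet as the image puller. The DaemonSet name, namespace-less metadata, the modelcar image reference, and the pause image are all placeholders, and this assumes the modelcar image contains a shell for the no-op command (a scratch-based image would need a different no-op entrypoint):

```yaml
# Hypothetical pre-puller: runs on every node so kubelet caches the modelcar image.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nim-modelcar-prepuller   # placeholder name
spec:
  selector:
    matchLabels:
      app: nim-modelcar-prepuller
  template:
    metadata:
      labels:
        app: nim-modelcar-prepuller
    spec:
      initContainers:
        # Pulling the image is the only goal; the command just exits immediately.
        - name: pull-modelcar
          image: quay.io/example/llama-3.1-8b-instruct-modelcar:1.1.1  # placeholder
          command: ["sh", "-c", "exit 0"]
      containers:
        # Tiny long-running container so the pod stays Ready without doing any work.
        - name: pause
          image: registry.k8s.io/pause:3.9
```

Once the image is in each node's local cache, KServe's modelcar sidecar should start without waiting on a multi-gigabyte pull.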
I tried to deploy llama-3.1-8b-instruct:1.1.1 with KServe and a modelcar on OpenShift AI.

What I have done:

1. Downloaded the model files for a profile into a local model store (see the profile-listing sketch after this list):

   ```
   podman run --rm -e NGC_API_KEY=<API_KEY> -v /models:/opt/nim/.cache nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.1 create-model-store --profile <PROFILE> --model-store /opt/nim/.cache
   ```

2. Built a modelcar container image containing the downloaded model files (a Containerfile sketch is below).
3. Created a ServingRuntime CR (sketched below) and set the NIM_MODEL_NAME environment variable to /mnt/models/, which is the path where the model files are mounted from the modelcar container.
4. Created an InferenceService CR (sketched below) and set its storageUri to the modelcar image built in step 2.
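For step 1, the <PROFILE> ID can be discovered with the NIM container's list-model-profiles utility. This is a sketch based on NVIDIA's NIM documentation; the exact invocation and output format can vary by image version:

```
# List the model profiles baked into this NIM image; pick a profile ID
# compatible with the target GPUs, then pass it to create-model-store.
podman run --rm -e NGC_API_KEY=<API_KEY> nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.1 list-model-profiles
```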
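For step 2, a minimal Containerfile sketch, assuming KServe's modelcar convention of shipping the model files under /models inside the image. The base image and the build-context path are placeholders:

```
# Minimal modelcar image: only the model files, no serving runtime.
FROM registry.access.redhat.com/ubi9/ubi-micro:latest
# Copy the model store produced by create-model-store in step 1
# (here assumed to be in a "models/" directory in the build context).
COPY models/ /models/
USER 1001
```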
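For step 3, a sketch of what such a ServingRuntime CR could look like, assuming KServe's serving.kserve.io/v1alpha1 API. The resource name, model-format name, port, and secret reference are placeholders, not the exact CR from the report:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: nim-llama-3.1-8b-instruct   # placeholder
spec:
  supportedModelFormats:
    - name: nim-llama-3.1-8b-instruct   # must match the InferenceService's modelFormat
      autoSelect: true
  containers:
    - name: kserve-container
      image: nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.1
      env:
        # Point NIM at the modelcar mount instead of downloading from NGC.
        - name: NIM_MODEL_NAME
          value: /mnt/models/
        - name: NGC_API_KEY
          valueFrom:
            secretKeyRef:
              name: ngc-api-key   # placeholder secret
              key: NGC_API_KEY
      ports:
        - containerPort: 8000
          protocol: TCP
```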
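And for step 4, an InferenceService sketch using KServe's oci:// storageUri scheme for modelcars. The image reference and GPU count are placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-1-8b-instruct   # placeholder
spec:
  predictor:
    model:
      modelFormat:
        name: nim-llama-3.1-8b-instruct   # matches the ServingRuntime above
      runtime: nim-llama-3.1-8b-instruct
      # oci:// tells KServe to mount the modelcar image's /models at /mnt/models.
      storageUri: oci://quay.io/example/llama-3.1-8b-instruct-modelcar:1.1.1
      resources:
        limits:
          nvidia.com/gpu: "1"
```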
Scripts executed in the terminal of the NIM container:

```
$ ls -al /mnt/models
lrwxrwxrwx. 1 1001090000 1001090000 20 Aug 7 20:34 /mnt/models -> /proc/76/root/models
$ ls -al /proc/76/root/models/trtllm_engine/rank0.engine
-rw-r--r--. 1 root root 16218123260 Jul 30 18:18 /proc/76/root/models/trtllm_engine/rank0.engine
```