TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

tabbyml unable to run with downloaded model #1033

Closed ifelsefi closed 9 months ago

ifelsefi commented 9 months ago

I am unable to run TabbyML with a previously downloaded model. I am following the steps described in other reported issues.

The error:

[root@gpuserver ~]# podman run -it --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL --gpus all -p 31337:31337 private.docker.registry.example.com/tabbyml_fork:latest serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
WARN[0000] Ignoring global metacopy option, not supported with booted kernel
thread 'main' panicked at crates/tabby-common/src/registry.rs:87:9:
Invalid model id /data/models/TabbyML/CodeLlama-7B
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
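
For what it's worth, the backtrace hint from the panic can be passed into the container with podman's standard -e flag; this is just the command above with one extra flag, nothing Tabby-specific:

podman run -it -e RUST_BACKTRACE=1 --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL --gpus all -p 31337:31337 private.docker.registry.example.com/tabbyml_fork:latest serve --model /data/models/TabbyML/CodeLlama-7B --device cuda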

I have a Tesla V100-PCIE-32GB on Rocky Linux release 8.7 with podman 4.2 and CUDA 12.1.0. The GPU configuration steps were followed.

In my Dockerfile I did:

# Download models
WORKDIR /data
RUN ls -ltr .
RUN mkdir -pv /data/models/TabbyML
RUN ls -ltr .
WORKDIR /data/models/TabbyML
RUN ls -ltrSh .
RUN pwd
RUN git clone -c http.sslVerify=false --progress -v https://huggingface.co/TabbyML/CodeLlama-7B
WORKDIR /data/models/TabbyML/CodeLlama-7B
RUN ls -ltrSh .
RUN pwd
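
As an aside, a hypothetical sanity check (not in the original Dockerfile) could fail the build if git-lfs did not pull the real weights: LFS pointer stubs are tiny text files, so any weight file under 1 MB is suspicious.

# Hypothetical: abort the build if any weight file looks like a git-lfs pointer
# stub (a ~130-byte text file) rather than real tensor data.
RUN if find . -maxdepth 1 \( -name '*.bin' -o -name '*.safetensors' \) -size -1M | grep -q .; then \
        echo "Found git-lfs pointer stubs instead of model weights" >&2; exit 1; \
    fi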

I see the model was downloaded properly during the docker build process:

#28 [build 19/28] WORKDIR /data/models/TabbyML/CodeLlama-7B
#28 sha256:e63b49a7e9bc8567d1d2058197145131e6add0c8726244f8c3a351e6f2246aad
#28 DONE 0.0s

#29 [build 20/28] RUN ls -ltrSh .
#29 sha256:6ca0e7b02634113195193042bc0f229802fd2f6375ea1354ecf4145074507fc3
#29 0.313 total 51G
#29 0.313 drwxr-xr-x 2 root root   62 Dec 13 18:52 ggml
#29 0.313 drwxr-xr-x 2 root root   65 Dec 13 18:53 ctranslate2
#29 0.313 -rw-r--r-- 1 root root  106 Dec 13 18:43 tabby.json
#29 0.313 -rw-r--r-- 1 root root  116 Dec 13 18:43 generation_config.json
#29 0.313 -rw-r--r-- 1 root root  411 Dec 13 18:43 special_tokens_map.json
#29 0.313 -rw-r--r-- 1 root root  636 Dec 13 18:43 config.json
#29 0.313 -rw-r--r-- 1 root root  745 Dec 13 18:43 tokenizer_config.json
#29 0.313 -rw-r--r-- 1 root root 4.7K Dec 13 18:43 USE_POLICY.md
#29 0.313 -rw-r--r-- 1 root root 6.5K Dec 13 18:43 README.md
#29 0.313 -rw-r--r-- 1 root root 6.9K Dec 13 18:43 LICENSE
#29 0.313 -rw-r--r-- 1 root root  24K Dec 13 18:43 pytorch_model.bin.index.json
#29 0.313 -rw-r--r-- 1 root root  25K Dec 13 18:43 model.safetensors.index.json
#29 0.313 -rw-r--r-- 1 root root 489K Dec 13 18:58 tokenizer.model
#29 0.313 -rw-r--r-- 1 root root 1.8M Dec 13 18:43 tokenizer.json
#29 0.313 -rw-r--r-- 1 root root 2.6G Dec 13 19:03 pytorch_model-00006-of-00006.bin
#29 0.313 -rw-r--r-- 1 root root 3.3G Dec 13 18:55 model-00002-of-00002.safetensors
#29 0.313 -rw-r--r-- 1 root root 3.3G Dec 13 19:03 pytorch_model-00002-of-00002.bin
#29 0.313 -rw-r--r-- 1 root root 4.6G Dec 13 19:05 pytorch_model-00001-of-00006.bin
#29 0.313 -rw-r--r-- 1 root root 4.6G Dec 13 18:58 pytorch_model-00005-of-00006.bin
#29 0.313 -rw-r--r-- 1 root root 4.6G Dec 13 19:03 pytorch_model-00004-of-00006.bin
#29 0.313 -rw-r--r-- 1 root root 4.6G Dec 13 18:58 pytorch_model-00003-of-00006.bin
#29 0.313 -rw-r--r-- 1 root root 4.6G Dec 13 19:03 pytorch_model-00002-of-00006.bin
#29 0.313 -rw-r--r-- 1 root root 9.3G Dec 13 18:50 model-00001-of-00002.safetensors
#29 0.313 -rw-r--r-- 1 root root 9.3G Dec 13 19:14 pytorch_model-00001-of-00002.bin
#29 DONE 0.4s

Can you please help?

Thanks

ifelsefi commented 9 months ago

If I run with --model TabbyML/CodeLlama-7B, the container does start:

CONTAINER ID  IMAGE                                              COMMAND               CREATED             STATUS                 PORTS                     NAMES
44420055991d  private.docker.registry.example.com:latest  serve --model Tab...  About a minute ago  Up About a minute ago  0.0.0.0:31337->31337/tcp  wizardly_lumiere

I see listening on port 31337:

conmon    13625           root    5u  IPv4  77701      0t0  TCP *:31337 (LISTEN)

Though Swagger is not reachable:

[root@gpuserver~]# curl http://localhost:31337/swagger
curl: (7) Failed to connect to localhost port 31337: Connection refused

I am unsure how to get logs from the container. If I try to kill the container with kill -9 <pid>, it hangs, so it seems to be in a bad state.
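
For reference, standard podman commands (not specific to Tabby) will show logs and state for a container that looks stuck; the name below is the one podman assigned above:

podman ps -a                       # list containers and their status
podman logs -f wizardly_lumiere    # stream the container's stdout/stderr
podman inspect wizardly_lumiere    # check state, exit code, and mounts
podman rm -f wizardly_lumiere      # force-remove if it refuses to stop cleanly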

ifelsefi commented 9 months ago

After waiting a little longer, the container fails because it tries to download the model instead of using the copy I added inside the container at /data/models/TabbyML/CodeLlama-7B.

I am running this on an air-gapped server, so I need /data/models/TabbyML/CodeLlama-7B to work.

WARN[0000] Ignoring global metacopy option, not supported with booted kernel
thread 'main' panicked at /root/workspace/crates/tabby-common/src/registry.rs:52:21:
Failed to fetch model organization <TabbyML>: error sending request for url (https://raw.githubusercontent.com/TabbyML/registry-tabby/main/models.json): error trying to connect: tcp connect error: Connection timed out (os error 110)

Caused by:
    0: error trying to connect: tcp connect error: Connection timed out (os error 110)
    1: tcp connect error: Connection timed out (os error 110)
    2: Connection timed out (os error 110)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

ifelsefi commented 9 months ago

Interesting. For some reason the model works when loaded via a bind mount from an external NFS source at /software/tabbyml/CodeLlama-7B, but not from the copy baked into the container at /data.

[root@gpuserver~]# ls -ltrSh /software/tabbyml/CodeLlama-7B/
total 51G
-rw-r--r-- 1 root root  106 Dec 13 13:49 tabby.json
-rw-r--r-- 1 root root  116 Dec 13 13:49 generation_config.json
-rw-r--r-- 1 root root  411 Dec 13 13:49 special_tokens_map.json
-rw-r--r-- 1 root root  636 Dec 13 13:49 config.json
-rw-r--r-- 1 root root  745 Dec 13 13:49 tokenizer_config.json
drwxr-xr-x 2 root root 4.0K Dec 13 14:00 ggml
drwxr-xr-x 2 root root 4.0K Dec 13 14:00 ctranslate2
-rw-r--r-- 1 root root 4.7K Dec 13 13:49 USE_POLICY.md
-rw-r--r-- 1 root root 6.5K Dec 13 13:49 README.md
-rw-r--r-- 1 root root 6.9K Dec 13 13:49 LICENSE
-rw-r--r-- 1 root root  24K Dec 13 13:49 pytorch_model.bin.index.json
-rw-r--r-- 1 root root  25K Dec 13 13:49 model.safetensors.index.json
-rw-r--r-- 1 root root 489K Dec 13 14:01 tokenizer.model
-rw-r--r-- 1 root root 1.8M Dec 13 13:49 tokenizer.json
-rw-r--r-- 1 root root 2.6G Dec 13 14:04 pytorch_model-00006-of-00006.bin
-rw-r--r-- 1 root root 3.3G Dec 13 14:01 model-00002-of-00002.safetensors
-rw-r--r-- 1 root root 3.3G Dec 13 14:03 pytorch_model-00002-of-00002.bin
-rw-r--r-- 1 root root 4.6G Dec 13 14:04 pytorch_model-00001-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 13:57 pytorch_model-00005-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 14:02 pytorch_model-00004-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 13:57 pytorch_model-00003-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 14:02 pytorch_model-00002-of-00006.bin
-rw-r--r-- 1 root root 9.3G Dec 13 13:57 model-00001-of-00002.safetensors
-rw-r--r-- 1 root root 9.3G Dec 13 14:06 pytorch_model-00001-of-00002.bin

[root@gpuserver~]# podman run -it --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL --gpus all -p 8080:8080 -v /software/tabbyml/CodeLlama-7B:/data/models/TabbyML/CodeLlama-7B private.docker.registry.example.com/tabbyml_fork:latest serve --model  /data/models/TabbyML/CodeLlama-7B --device cuda
WARN[0000] Ignoring global metacopy option, not supported with booted kernel
2023-12-13T20:08:23.591398Z  INFO tabby::services::model: crates/tabby/src/services/model.rs:80: Loading model from local path /data/models/TabbyML/CodeLlama-7B
2023-12-13T20:08:23.591441Z  INFO tabby::serve: crates/tabby/src/serve.rs:114: Starting server, this might take a few minutes...
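
Once the "Starting server" line appears, the API can be probed from the host. Assuming Tabby's standard /v1/health endpoint and the -p 8080:8080 mapping used above:

curl http://localhost:8080/v1/health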

wsxiaoys commented 9 months ago

RUN git clone -c http.sslVerify=false --progress -v https://huggingface.co/TabbyML/CodeLlama-7B

A random guess is that this line doesn't actually clone the model weights, because git-lfs is not bundled by default.

I'd suggest using `tabby download` to fetch the model during the container build, and copying it to /data on your first run.
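
A minimal sketch of what that could look like (not from this thread), assuming the build stage derives from the official tabbyml/tabby image, that the tabby binary is on PATH there, and that the image's data root is /data:

FROM tabbyml/tabby:latest
# Make the model cache root explicit; `serve` reads models from the same root.
ENV TABBY_ROOT=/data
# Downloads TabbyML/CodeLlama-7B into $TABBY_ROOT/models/TabbyML/CodeLlama-7B at build time.
RUN tabby download --model TabbyML/CodeLlama-7B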

ifelsefi commented 9 months ago

I added git-lfs to the Dockerfile, but I will try that process next time when I build for Kubernetes. Thanks.
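
For completeness, a hypothetical way to wire git-lfs into a clone-based Dockerfile (assuming a Debian/Ubuntu-based build stage; package names differ on other distros):

# Hypothetical: install git-lfs before cloning so the large weight files are
# actually fetched instead of being left as pointer stubs.
RUN apt-get update && apt-get install -y git git-lfs && git lfs install
RUN git clone -c http.sslVerify=false --progress -v https://huggingface.co/TabbyML/CodeLlama-7B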