ifelsefi opened this issue; closed 9 months ago
If I run with --model TabbyML/CodeLlama-7B, the container does start:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
44420055991d private.docker.registry.example.com:latest serve --model Tab... About a minute ago Up About a minute ago 0.0.0.0:31337->31337/tcp wizardly_lumiere
I see it listening on port 31337:
conmon 13625 root 5u IPv4 77701 0t0 TCP *:31337 (LISTEN)
However, the Swagger endpoint is not reachable:
[root@gpuserver~]# curl http://localhost:31337/swagger
curl: (7) Failed to connect to localhost port 31337: Connection refused
I am unsure how to get logs from the container. If I try to kill the container with kill -9 on its PID, it hangs, so it seems to be in a bad state.
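For what it's worth, podman can stream a container's output directly, and a graceful stop is usually safer than kill -9 on the host PID. A minimal sketch (the container name wizardly_lumiere is taken from the ps output above):

```shell
# Follow the container's stdout/stderr (name or ID from `podman ps`)
podman logs -f wizardly_lumiere

# Ask the container to stop, waiting up to 10 seconds before SIGKILL
podman stop -t 10 wizardly_lumiere

# If it is truly wedged, force-remove the container
podman rm -f wizardly_lumiere
```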
After waiting a little longer, the container fails: it tries to download the model instead of using the copy I added inside the container at /data/models/TabbyML/CodeLlama-7B.
I am running this on an air-gapped server, so loading from /data/models/TabbyML/CodeLlama-7B needs to work.
WARN[0000] Ignoring global metacopy option, not supported with booted kernel
thread 'main' panicked at /root/workspace/crates/tabby-common/src/registry.rs:52:21:
Failed to fetch model organization <TabbyML>: error sending request for url (https://raw.githubusercontent.com/TabbyML/registry-tabby/main/models.json): error trying to connect: tcp connect error: Connection timed out (os error 110)
Caused by:
0: error trying to connect: tcp connect error: Connection timed out (os error 110)
1: tcp connect error: Connection timed out (os error 110)
2: Connection timed out (os error 110)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Interesting. For some reason the model works when loaded via a bind mount from an external NFS source at /software/tabbyml/CodeLlama-7B, but not from the copy baked into the container at /data.
[root@gpuserver~]# ls -ltrSh /software/tabbyml/CodeLlama-7B/
total 51G
-rw-r--r-- 1 root root 106 Dec 13 13:49 tabby.json
-rw-r--r-- 1 root root 116 Dec 13 13:49 generation_config.json
-rw-r--r-- 1 root root 411 Dec 13 13:49 special_tokens_map.json
-rw-r--r-- 1 root root 636 Dec 13 13:49 config.json
-rw-r--r-- 1 root root 745 Dec 13 13:49 tokenizer_config.json
drwxr-xr-x 2 root root 4.0K Dec 13 14:00 ggml
drwxr-xr-x 2 root root 4.0K Dec 13 14:00 ctranslate2
-rw-r--r-- 1 root root 4.7K Dec 13 13:49 USE_POLICY.md
-rw-r--r-- 1 root root 6.5K Dec 13 13:49 README.md
-rw-r--r-- 1 root root 6.9K Dec 13 13:49 LICENSE
-rw-r--r-- 1 root root 24K Dec 13 13:49 pytorch_model.bin.index.json
-rw-r--r-- 1 root root 25K Dec 13 13:49 model.safetensors.index.json
-rw-r--r-- 1 root root 489K Dec 13 14:01 tokenizer.model
-rw-r--r-- 1 root root 1.8M Dec 13 13:49 tokenizer.json
-rw-r--r-- 1 root root 2.6G Dec 13 14:04 pytorch_model-00006-of-00006.bin
-rw-r--r-- 1 root root 3.3G Dec 13 14:01 model-00002-of-00002.safetensors
-rw-r--r-- 1 root root 3.3G Dec 13 14:03 pytorch_model-00002-of-00002.bin
-rw-r--r-- 1 root root 4.6G Dec 13 14:04 pytorch_model-00001-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 13:57 pytorch_model-00005-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 14:02 pytorch_model-00004-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 13:57 pytorch_model-00003-of-00006.bin
-rw-r--r-- 1 root root 4.6G Dec 13 14:02 pytorch_model-00002-of-00006.bin
-rw-r--r-- 1 root root 9.3G Dec 13 13:57 model-00001-of-00002.safetensors
-rw-r--r-- 1 root root 9.3G Dec 13 14:06 pytorch_model-00001-of-00002.bin
[root@gpuserver~]# podman run -it --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL --gpus all -p 8080:8080 -v /software/tabbyml/CodeLlama-7B:/data/models/TabbyML/CodeLlama-7B private.docker.registry.example.com/tabbyml_fork:latest serve --model /data/models/TabbyML/CodeLlama-7B --device cuda
WARN[0000] Ignoring global metacopy option, not supported with booted kernel
2023-12-13T20:08:23.591398Z INFO tabby::services::model: crates/tabby/src/services/model.rs:80: Loading model from local path /data/models/TabbyML/CodeLlama-7B
2023-12-13T20:08:23.591441Z INFO tabby::serve: crates/tabby/src/serve.rs:114: Starting server, this might take a few minutes...
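Once the server finishes loading, one way to confirm it is actually serving (assuming the 8080:8080 mapping from the podman command above, and that this Tabby build exposes a /v1/health endpoint):

```shell
# -s silences progress, -f makes curl exit non-zero on HTTP errors;
# /v1/health returns runtime/model info as JSON when the server is up
curl -sf http://localhost:8080/v1/health && echo "tabby is up"
```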
RUN git clone -c http.sslVerify=false --progress -v https://huggingface.co/TabbyML/CodeLlama-7B
A random guess: this line doesn't actually clone the model weights, because git-lfs is not bundled with the base image by default.
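If that guess is right, one fix is to install git-lfs in the image before the clone runs. A hedged sketch (the Debian-style package manager is an assumption; adjust to the actual base image):

```dockerfile
# Assumes a Debian/Ubuntu-based image; use the distro's package manager otherwise
RUN apt-get update && apt-get install -y git-lfs && git lfs install

# With git-lfs present, the clone should pull the large weight files too,
# not just the LFS pointer stubs
RUN git clone -c http.sslVerify=false --progress -v https://huggingface.co/TabbyML/CodeLlama-7B
```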
I'd suggest using tabby download to fetch the model during the container build, then copying it to /data on first run.
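Concretely, that could look something like the following Dockerfile sketch (the default model directory under ~/.tabby and the first-run copy step are assumptions, not verified against this fork):

```dockerfile
# Download the model at image build time, while the network is still reachable.
# tabby stores models under $TABBY_ROOT (default ~/.tabby) -- an assumption here.
RUN tabby download --model TabbyML/CodeLlama-7B

# On first run, an entrypoint wrapper could copy the baked-in model into the
# mounted data directory, e.g.:
#   cp -rn ~/.tabby/models/TabbyML /data/models/
```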
I added git-lfs in the Dockerfile, but I will try that process next time when I build for Kubernetes. Thanks.
I am unable to run TabbyML with a previously downloaded model, following the steps from other reported issues.
The error:
I have a Tesla V100-PCIE-32GB on Rocky Linux release 8.7 with podman 4.2 and CUDA 12.1.0. The GPU configuration steps were followed.
In my Dockerfile I did:
I see the model was downloaded properly during the docker build process:
Can you please help?
Thanks