Closed lcwxz1989 closed 3 months ago
Hi @lcwxz1989, I tried it on my computer and it is working. Let me share my steps for your reference:

- I download the gguf from https://huggingface.co/TheBloke/deepseek-coder-1.3b-base-GGUF/resolve/main/deepseek-coder-1.3b-base.Q8_0.gguf
- Go to .tabby/models/TabbyML and create a new folder DeepseekCoder-1.3B
- Inside TabbyML/DeepseekCoder-1.3B, create a tabby.json with the content {"name":"DeepseekCoder-1.3B","prompt_template":"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>","urls":["https://huggingface.co/TheBloke/deepseek-coder-1.3b-base-GGUF/resolve/main/deepseek-coder-1.3b-base.Q8_0.gguf"],"sha256":"9fcdcb283ef5b1d80ec7365b307c1ceab0c0f8ea079b49969f7febc06a11bccd"}
- Inside TabbyML/DeepseekCoder-1.3B, create a new folder ggml, put the gguf file into it, and rename it to model.gguf
- In the offline environment, run tabby serve --model DeepseekCoder-1.3B --device cuda, and after a while, I can see it is running
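The steps above can be sketched as a small shell script (the paths follow this thread's layout; the final copy step is shown as a comment because the gguf must already be downloaded on an online machine):

```shell
# Create the layout Tabby expects for a locally-registered model.
MODEL_DIR="$HOME/.tabby/models/TabbyML/DeepseekCoder-1.3B"
mkdir -p "$MODEL_DIR/ggml"

# Write the model registry entry described in the steps above.
cat > "$MODEL_DIR/tabby.json" <<'EOF'
{
  "name": "DeepseekCoder-1.3B",
  "prompt_template": "<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>",
  "urls": ["https://huggingface.co/TheBloke/deepseek-coder-1.3b-base-GGUF/resolve/main/deepseek-coder-1.3b-base.Q8_0.gguf"],
  "sha256": "9fcdcb283ef5b1d80ec7365b307c1ceab0c0f8ea079b49969f7febc06a11bccd"
}
EOF

# Copy the pre-downloaded gguf into place (done separately, on the online machine):
# cp /path/to/deepseek-coder-1.3b-base.Q8_0.gguf "$MODEL_DIR/ggml/model.gguf"
```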
I think I have already done all of these things. I run Tabby as a Docker image because of the offline env; how do you run Tabby? I think the Docker image may be missing some ENV or something. My image is tabbyml/tabby:0.12.1-rc.0, and the command is:
sudo docker run --runtime=nvidia -it --gpus all -p 8888:8080 -v $HOME/.tabby:/data tabbyml/tabby:0.12.1-rc.0 serve --model StarCoder-1B --device cuda
The $HOME/.tabby dir tree is in the picture above.
I used Docker as well; the command I used is:
docker run -it --gpus all \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model DeepseekCoder-1.3B --device cuda
I'm running from Windows WSL2 (Ubuntu 20).
A very important condition is being offline. I want to know if it works well when you are offline and put the pre-downloaded model into the .tabby dir. What does your $HOME/.tabby dir tree look like? I really do not know where it goes wrong.
Yeah, I start the server without the internet. My computer is using a 3060 Ti, but I suppose that has nothing to do with the GPU. Not sure if you need to double-check the content of tabby.json?
My tabby.json is like this, and I do not know where it is wrong. In our project there is no doc describing how to install offline, nor does the command support a verbose option, so I do not know how to diagnose what is wrong.
The tabby.json looks correct. If you are using Docker, you might want to try this method: https://tabby.tabbyml.com/blog/2024/03/25/deploy-tabby-in-air-gapped-environment-with-docker/
Yes, but I do not have an online Ubuntu machine, so I downloaded Tabby on macOS, ran Tabby to download the model.gguf, then transferred the model.gguf to the /data/models/ dir on the offline Ubuntu machine.
Not sure if it's due to the different environment. Have you tried the Docker method described in the post?
I do not have an Ubuntu env, and I think the post only downloads the model.gguf into the dir anyway, so I downloaded the model.gguf on macOS and copied it into the dir used by Docker.
I do not know what the dir tree inside the Docker container looks like. Can you give me the path where the gguf should be located?
You can find them here in the Docker container.
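Since the docker commands above mount $HOME/.tabby at /data, the in-container location of the gguf can be inferred (this path mapping is deduced from the -v flag used in this thread, not taken from official docs):

```shell
# Host side:      $HOME/.tabby/models/TabbyML/DeepseekCoder-1.3B/ggml/model.gguf
# Container side: the same file, reached through the /data bind mount.
HOST_ROOT="$HOME/.tabby"
CONTAINER_ROOT="/data"
HOST_PATH="$HOST_ROOT/models/TabbyML/DeepseekCoder-1.3B/ggml/model.gguf"
# Rewrite the host prefix into the container prefix:
echo "$CONTAINER_ROOT${HOST_PATH#"$HOST_ROOT"}"
```

This prints /data/models/TabbyML/DeepseekCoder-1.3B/ggml/model.gguf, which is where the server running inside the container would look for the model.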
Closing stale issue
Describe the problem:
- metal: system
- OS: Ubuntu 20.04
- GPU: NVIDIA GeForce RTX 2060
- CUDA: 12.3
- CUDA driver version: 545.23.08
- env: offline, no internet
- Docker env: put models into /models

To Reproduce:
- CPU percent: 100%