Open kba-tmn3 opened 1 week ago
Thanks for the feature request.
I forked a registry and the model starts downloading correctly, but after the download finishes Tabby stops responding. I tried waiting, but it doesn't help, and the logs are completely empty.
Could you turn on `RUST_LOG=debug RUST_BACKTRACE=1` in your docker environment and share its output?
Sorry, but it doesn't log anything. I added the environment variables and the log is still completely empty; the container hangs in running status and nothing works.
Command line:

```
RUST_LOG=debug RUST_BACKTRACE=1 docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda
```
It utilizes 98.42% CPU in Docker Desktop, and Task Manager's hardware monitor looks like this
To pass an environment flag to docker, you need to do something like `-e RUST_BACKTRACE=1` - could you try again?
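For example, the original command would become something like the following sketch (only the flag placement changes; the model and paths are taken from the command above):

```shell
# Pass the env vars with -e so they are set inside the container;
# prefixing them on the host shell line only affects the host process.
docker run -it --gpus all \
  -e RUST_LOG=debug \
  -e RUST_BACKTRACE=1 \
  -p 8080:8080 \
  -v "$HOME/.tabby:/data" \
  tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda
```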
It's very likely the hang is caused by model loading / computation, though.
I guess it's because of the llama.cpp version Tabby is using. The model was quantized with this release: https://github.com/ggerganov/llama.cpp/releases/tag/b3166
Logs: logs.txt
Sorry for the external link, but I found some people stuck with the same problem on a Russian forum called Habr: https://habr.com/ru/news/822503/comments/#:~:text=llama.cpp%20unknown%20model%20architecture%3A%20%27deepseek2%27
I'm attaching the link with the text of the error they hit (did I receive the same message? I don't know).
Right - this means support for DeepseekCoder v2 in llama.cpp was only added very recently. We will try to include it in the upcoming 0.13 release.
Just for added context, ollama just added support for deepseekcoder v2. See https://github.com/ollama/ollama/releases/tag/v0.1.45
I was wondering the same from tabby.
Thanks again, looking forward to release 0.13
For context - you can actually connect tabby to ollama by using config.toml based model configuration: https://tabby.tabbyml.com/docs/administration/model/#ollama
Can you give an example showing how to set up a tabby server with this model configuration via docker?
Thank you in advance.
@f6ra07nk14
Here is a simple example I'm currently using to run tabby with an ollama server as its LLM backend, using deepseek-coder-v1 for code completions and deepseek-coder-v2 as the chat model.
config.toml:

```toml
[model.completion.http]
kind = "ollama/completion"
model_name = "deepseek-coder"
api_endpoint = "http://ollama:11434" # Insert your URL here
prompt_template = "<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

[model.chat.http]
kind = "ollama/chat"
model_name = "deepseek-coder-v2"
api_endpoint = "http://ollama:11434" # Insert your URL here
```
docker-compose.yml:

```yaml
version: '3.5'
services:
  tabby:
    restart: always
    image: ghcr.io/tabbyml/tabby:0.13.0-rc.3
    command: serve
    volumes:
      - "./tabby:/data"
      - "./config.toml:/data/config.toml"
    ports:
      - 8080:8080
```
Basically, to use the `config.toml` you have to mount it into the `/data/` directory of the container. Contrary to the documentation, you also have to provide a `model_name` in the `[model.completion.http]` section for the completion model to work.
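For completeness, the same setup without compose would look roughly like this (a sketch assuming the `tabby` data directory and `config.toml` sit in the current directory):

```shell
# Mount the data directory and the config.toml exactly as the compose file does.
docker run --restart always -p 8080:8080 \
  -v "$PWD/tabby:/data" \
  -v "$PWD/config.toml:/data/config.toml" \
  ghcr.io/tabbyml/tabby:0.13.0-rc.3 serve
```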
Thanks for contributing such an example @LLukas22
Right - that's because ollama is a backend able to serve multiple models concurrently. If interested, consider making an edit at https://github.com/TabbyML/tabby/edit/main/website/docs/administration/model.md to contribute a PR, thank you!
Hi @LLukas22, I noticed you're specifying `prompt_template` for Ollama. As far as I know, Ollama expects pure prompt text; it maintains its own templates in its modelfiles. Is Tabby ignoring `prompt_template` for Ollama? Otherwise, if Tabby is formatting its prompts using `prompt_template` and passing that to Ollama, the results won't be correct.
Edit: Oh, the state of prompt templates is still a total mess! Ollama doesn't support FIM in prompt templates yet. See https://github.com/unit-mesh/auto-dev-vscode/issues/61 and https://github.com/ollama/ollama/pull/5207. It looks like CodeGPT is trying to make some Ollama changes https://github.com/carlrobertoh/CodeGPT/pull/510 but they realized llama.cpp can't get it right either https://github.com/carlrobertoh/CodeGPT/pull/510#issuecomment-2096486472. What a mess!
I guess defining `prompt_template` is the only reliable way to implement FIM with Ollama and llama.cpp? @wsxiaoys Does this mean I need to define a blank prompt template in my Ollama .modelfile, or is Tabby blanking out the prompt template in the request?
@JohnSmithToYou
As far as I can tell, the prompt from the ollama modelfile only contains the base structure for the prompt, e.g.:

```
{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:
```

Then, to perform the fill-in-the-middle (FIM) task, tabby has to format the instruction as a FIM task by applying the `prompt_template` provided, e.g.:

```
<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>
```
This basically results in the following combined prompt:

```
{{ .System }}
### Instruction:
<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>
### Response:
```

where the prefix and suffix are inserted by tabby. But I could be wrong here, since tabby uses ollama's completions endpoint instead of the chat endpoint to perform code completions, which maybe doesn't apply any template at all on the ollama side 🤔
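The placeholder substitution described above can be sketched in a few lines of bash (the `{prefix}`/`{suffix}` handling is my assumption of what tabby does internally, based on the config format, not tabby's actual code):

```shell
#!/usr/bin/env bash
# Fill a FIM prompt_template with a concrete prefix/suffix pair.
template='<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>'
prefix='def fib(n):'
suffix='    return fib(n - 1) + fib(n - 2)'

prompt=${template/\{prefix\}/$prefix}   # substitute the prefix placeholder
prompt=${prompt/\{suffix\}/$suffix}     # substitute the suffix placeholder
echo "$prompt"
```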
@LLukas22 Thanks for the response. I dug in deeper and figured out a few things:
1. For completion, I used this `prompt_template`:

   ```
   prompt_template = """<｜fim▁begin｜>{prefix}
   <｜fim▁hole｜>
   {suffix}<｜fim▁end｜>"""
   ```

   The new lines matter.

2. This also requires a custom Ollama modelfile containing:

   ```
   # Ollama does not support FIM prompt templates. Instead we rely on the invoker to implement it.
   TEMPLATE {{ .Prompt }}
   PARAMETER stop "<｜fim▁begin｜>"
   PARAMETER stop "<｜fim▁hole｜>"
   PARAMETER stop "<｜fim▁end｜>"
   ```

   This allows Tabby's prompt to pass through without anything being added to it. System context is not supported (I read it on deepseek's github).

3. I used deepseek-coder-instruct v2 in `[model.chat.http]`, but I didn't use a `prompt_template`. Instead I defined a custom Ollama modelfile:

   ```
   TEMPLATE """{{ if .System }}{{ .System }}
   {{ end }}{{ if .Prompt }}User: {{ .Prompt }}
   {{ end }}Assistant:{{ .Response }}<｜end▁of▁sentence｜>"""
   PARAMETER stop "User:"
   PARAMETER stop "Assistant:"
   PARAMETER stop "<｜end▁of▁sentence｜>"
   ```

   _You must leave off the begin▁of▁sentence token._
@LLukas22
I had implemented changing the prompt template specified by the model here, so there is no need to override this. I mean, it's only in the modelfile that you must set the `prompt_template` for FIM.
The debug log from ollama shows that everything is fine there. I used a different model for now:

```
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:179 msg="generate handler" prompt="<fim_prefix>def fib(n):\n <fim_suffix>\n return fib(n - 1) + fib(n - 2)<fim_middle>"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:180 msg="generate handler" template="{{ .Prompt }}"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:181 msg="generate handler" system=""
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:212 msg="generate handler" prompt="<fim_prefix>def fib(n):\n <fim_suffix>\n return fib(n - 1) + fib(n - 2)<fim_middle>"
```
However, stop words are a common problem with ollama. starcoder2 has the same issue, for example, and creating a modelfile is required.
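As an illustration of what those `PARAMETER stop` lines accomplish, here is a rough sketch of stop-word truncation (the function name and the sequential-cut approach are mine for illustration, not ollama's actual implementation):

```shell
#!/usr/bin/env bash
# Cut generated text at the earliest occurrence of any stop word.
trim_at_stops() {
  local text=$1; shift
  local stop
  for stop in "$@"; do
    text=${text%%"$stop"*}   # drop everything from the first match onward
  done
  printf '%s' "$text"
}

trim_at_stops 'def add(a, b): return a + b<fim_middle>garbage' '<fim_middle>'
# → def add(a, b): return a + b
```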
I would like to use bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF, but I have a problem launching it properly with TabbyML.
Additional context
Registry: https://github.com/kba-tmn3/registry-tabby
Command line
Source: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
GGUF: https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF
Accuracy of the model: ![Comparison with other models](https://github.com/TabbyML/tabby/assets/157360605/819c7c7e-de2e-4e42-a05b-3eca98724027)
Please reply with a 👍 if you want this feature.