TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

Request: Deepseek Coder V2 model #2451

Open kba-tmn3 opened 1 week ago

kba-tmn3 commented 1 week ago

I would like to use bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF, but I have a problem launching it properly with TabbyML.

I forked the registry and the model starts downloading correctly, but after the download finishes Tabby stops responding. I tried waiting, but it doesn't help, and the logs are completely empty.

Additional context
Registry: https://github.com/kba-tmn3/registry-tabby
Command line:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda

Source: https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
GGUF: https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF
Accuracy of the model: comparison with other models


Please reply with a šŸ‘ if you want this feature.

wsxiaoys commented 1 week ago

Thanks for the feature request.

I forked the registry and the model starts downloading correctly, but after the download finishes Tabby stops responding. I tried waiting, but it doesn't help, and the logs are completely empty.

Could you turn on RUST_LOG=debug RUST_BACKTRACE=1 in your docker environment and share its output?

kba-tmn3 commented 1 week ago

Could you turn on RUST_LOG=debug RUST_BACKTRACE=1 in your docker environment and share its output?

Sorry, but it doesn't log anything. I added the environment variables and the output is still completely empty; the container hangs in the running state and nothing works.

Command line:
RUST_LOG=debug RUST_BACKTRACE=1 docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda

It utilizes 98.42% CPU in Docker Desktop, and Task Manager's hardware monitor looks like this:

[screenshot: Task Manager hardware monitor]

wsxiaoys commented 1 week ago

To pass environment flags to Docker, you need to add something like -e RUST_BACKTRACE=1 to the docker run command. Could you try again?
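
For example, your command above would become something like this:

docker run -it --gpus all -e RUST_LOG=debug -e RUST_BACKTRACE=1 -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model kba-tmn3/DeepseekCoder-V2 --device cuda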

It's very likely the hang is caused by model loading / computation, though.

kba-tmn3 commented 1 week ago

I guess it's because of the version of llama.cpp Tabby is using.

The model was quantized with this llama.cpp release: https://github.com/ggerganov/llama.cpp/releases/tag/b3166

Logs: logs.txt

kba-tmn3 commented 1 week ago

Sorry for the external link, but I found some people hitting the same problem on the Russian forum Habr: https://habr.com/ru/news/822503/comments/#:~:text=llama.cpp%20unknown%20model%20architecture%3A%20%27deepseek2%27

The link highlights the error text they got (llama.cpp: unknown model architecture 'deepseek2'); I'm not sure whether I received the same message.

wsxiaoys commented 1 week ago

Right - this means that support for DeepSeek-Coder V2 in llama.cpp was only added very recently; we will try to include it in the upcoming 0.13 release.

Mizzlr commented 1 week ago

Just for added context, Ollama just added support for DeepSeek-Coder V2. See https://github.com/ollama/ollama/releases/tag/v0.1.45

I was wondering about the same for Tabby.

Thanks again, looking forward to release 0.13

wsxiaoys commented 1 week ago

For context - you can actually connect Tabby to Ollama by using a config.toml-based model configuration: https://tabby.tabbyml.com/docs/administration/model/#ollama

f6ra07nk14 commented 6 days ago

For context - you can actually connect Tabby to Ollama by using a config.toml-based model configuration: https://tabby.tabbyml.com/docs/administration/model/#ollama

Can you give an example showing how to set up a Tabby server with this model configuration using Docker? Thank you in advance.

LLukas22 commented 3 days ago

@f6ra07nk14

Here is a simple example I'm currently using to run Tabby with an Ollama server as its LLM backend, with deepseek-coder (v1) for code completions and deepseek-coder-v2 as the chat model.

config.toml

[model.completion.http]
kind = "ollama/completion"
model_name = "deepseek-coder"
api_endpoint = "http://ollama:11434" # Insert your URL here
prompt_template = "<ļ½œfim▁beginļ½œ>{prefix}<ļ½œfim▁holeļ½œ>{suffix}<ļ½œfim▁endļ½œ>"

[model.chat.http]
kind = "ollama/chat"
model_name = "deepseek-coder-v2"
api_endpoint = "http://ollama:11434" # Insert your URL here

docker-compose.yml

version: '3.5'
services:
  tabby:
    restart: always
    image: ghcr.io/tabbyml/tabby:0.13.0-rc.3
    command: serve 
    volumes:
      - "./tabby:/data"
      - "./config.toml:/data/config.toml"
    ports:
      - 8080:8080

Basically, to use the config.toml you have to mount it into the /data directory of the container. Contrary to the documentation, you also have to provide a model_name in the [model.completion.http] section for the completion model to work.
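
If you want to use plain docker instead of docker-compose (as @f6ra07nk14 asked), the equivalent is roughly the following sketch, assuming the config.toml sits in the current directory:

docker run -it -p 8080:8080 -v $PWD/tabby:/data -v $PWD/config.toml:/data/config.toml ghcr.io/tabbyml/tabby:0.13.0-rc.3 serve

Note that the compose file above doesn't request a GPU either, since the actual inference happens in Ollama.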

wsxiaoys commented 3 days ago

Thanks for contributing such an example @LLukas22

Right - that's because Ollama is a backend able to serve multiple models concurrently. If you're interested, consider making an edit at https://github.com/TabbyML/tabby/edit/main/website/docs/administration/model.md to contribute a PR. Thank you!

JohnSmithToYou commented 3 days ago

Hi @LLukas22, I noticed you're specifying "prompt_template" for Ollama. As far as I know, Ollama expects pure prompt text; it maintains its own templates in its modelfiles. Is Tabby ignoring "prompt_template" for Ollama? Otherwise, if Tabby is formatting its prompts using "prompt_template" and passing that to Ollama, the results won't be correct.

Edit: Oh, the state of prompt templates is still a total mess! Ollama doesn't support FIM in prompt templates yet. See https://github.com/unit-mesh/auto-dev-vscode/issues/61 and https://github.com/ollama/ollama/pull/5207. It looks like CodeGPT is trying to make some Ollama changes (https://github.com/carlrobertoh/CodeGPT/pull/510), but they realized llama.cpp can't get it right either (https://github.com/carlrobertoh/CodeGPT/pull/510#issuecomment-2096486472). What a mess!

I guess defining "prompt_template" is the only reliable way to implement FIM with Ollama and llama.cpp? @wsxiaoys Does this mean I need to define a blank prompt template in my Ollama .modelfile or is Tabby blanking out the prompt template in the request?

LLukas22 commented 3 days ago

@JohnSmithToYou

As far as I can tell, the template from the Ollama modelfile only contains the base structure of the prompt, e.g.

{{ .System }}
### Instruction:
{{ .Prompt }}
### Response:

Then, to perform the fill-in-the-middle (FIM) task, Tabby has to format the instruction as a FIM task by applying the provided prompt_template, e.g.

<ļ½œfimā–beginļ½œ>{prefix}<ļ½œfimā–holeļ½œ>{suffix}<ļ½œfimā–endļ½œ>

This basically results in the following combined prompt:

{{ .System }}
### Instruction:
<ļ½œfimā–beginļ½œ>{prefix}<ļ½œfimā–holeļ½œ>{suffix}<ļ½œfimā–endļ½œ>
### Response:

The prefix and suffix are inserted by Tabby. But I could be wrong here, since Tabby uses Ollama's completion endpoint instead of the chat endpoint to perform code completions, which maybe doesn't apply any template at all on the Ollama side šŸ¤”
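
One way to check would be to run the Ollama server with debug logging enabled and call the completion endpoint directly; the debug log then prints the prompt and template that were actually used. A sketch, with the model name taken from the config above (adjust the host to wherever your Ollama server is reachable):

OLLAMA_DEBUG=1 ollama serve
curl http://localhost:11434/api/generate -d '{"model": "deepseek-coder", "prompt": "<ļ½œfim▁beginļ½œ>def fib(n):<ļ½œfim▁holeļ½œ><ļ½œfim▁endļ½œ>", "stream": false}'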

JohnSmithToYou commented 3 days ago

@LLukas22 Thanks for the response. I dug deeper and figured out a few things:

  1. FIM is not supported in the instruct version of Deepseek-coder v2
  2. The correct prompt_template supporting FIM for deepseek coder v2 is:
    prompt_template = """<ļ½œfim▁beginļ½œ>{prefix}
    <ļ½œfim▁holeļ½œ>
    {suffix}<ļ½œfim▁endļ½œ>"""

    The new lines matter. This also requires a custom Ollama modelfile to contain:

    # Ollama does not support FIM prompt templates. Instead we rely on the invoker to implement it.
    TEMPLATE {{ .Prompt }}
    PARAMETER stop "<ļ½œfim▁beginļ½œ>"
    PARAMETER stop "<ļ½œfim▁holeļ½œ>"
    PARAMETER stop "<ļ½œfim▁endļ½œ>"

This allows Tabby's prompt to pass through without anything being added to it. System context is not supported (I read that on DeepSeek's GitHub). A combined config.toml sketch follows at the end of this comment.

3. I used deepseek-coder-instruct v2 in `[model.chat.http]`, but I didn't use a prompt_template. Instead I defined a custom Ollama modelfile:

Note: https://github.com/deepseek-ai/DeepSeek-Coder-V2/issues/12#issuecomment-2181637976

TEMPLATE """{{ if .System }}{{ .System }}

{{ end }}{{ if .Prompt }}User: {{ .Prompt }}

{{ end }}Assistant:{{ .Response }}<ļ½œend▁of▁sentenceļ½œ>"""

PARAMETER stop "User:" PARAMETER stop "Assistant:" PARAMETER stop "<ļ½œendā–ofā–sentenceļ½œ>"


_You must leave off the begin▁of▁sentence._
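
Putting 2. and 3. together, the Tabby side would then look roughly like this; the model names are placeholders for whatever names you registered the custom modelfiles under with ollama create:

[model.completion.http]
kind = "ollama/completion"
model_name = "deepseek-coder-v2-base-fim" # placeholder: name used with `ollama create` for the FIM modelfile
api_endpoint = "http://ollama:11434"
prompt_template = """<ļ½œfim▁beginļ½œ>{prefix}
<ļ½œfim▁holeļ½œ>
{suffix}<ļ½œfim▁endļ½œ>"""

[model.chat.http]
kind = "ollama/chat"
model_name = "deepseek-coder-v2-instruct" # placeholder: name used with `ollama create` for the chat modelfile
api_endpoint = "http://ollama:11434"
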
SpeedCrash100 commented 16 hours ago

@LLukas22 I have implemented changing the prompt template specified by the model here, so there is no need to override this. I mean, only in the modelfile you must set the prompt template for FIM.

The debug log from Ollama shows that everything is fine there. I used a different model for now:

time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:179 msg="generate handler" prompt="<fim_prefix>def fib(n):\n    <fim_suffix>\n        return fib(n - 1) + fib(n - 2)<fim_middle>"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:180 msg="generate handler" template="{{ .Prompt }}"
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:181 msg="generate handler" system=""
time=2024-06-30T17:55:19.629Z level=DEBUG source=routes.go:212 msg="generate handler" prompt="<fim_prefix>def fib(n):\n    <fim_suffix>\n        return fib(n - 1) + fib(n - 2)<fim_middle>"

However, stop words are a common problem with Ollama. starcoder2 has the same issue, for example, and creating a modelfile is required.
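
For example, a minimal starcoder2 modelfile along the lines of the deepseek one above might look like this; the FROM tag is a placeholder and the stop tokens are assumptions based on the StarCoder2 FIM tokens in the log:

# Derive from the existing model and only add the missing stop words.
FROM starcoder2:3b
PARAMETER stop "<fim_prefix>"
PARAMETER stop "<fim_suffix>"
PARAMETER stop "<fim_middle>"

It can then be registered with something like ollama create starcoder2-tabby -f Modelfile (the name is arbitrary) and used as the model_name in config.toml.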