abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

deepseek chat_format template #969

Open mjwweb opened 7 months ago

mjwweb commented 7 months ago

Is your feature request related to a problem? Please describe.
Request for a deepseek chat_format template.

Additional context
Deepseek prompt template found here: https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF

You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
tastypear commented 7 months ago

Perhaps the chat_format templates should be stored in a configuration file.

That way, users could edit the .conf file to adapt to any model without changing the code.
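Something like this could work with the existing register_chat_format decorator; a rough sketch (templates.toml and its key layout are made up for illustration, not part of the library):

import tomllib  # Python 3.11+; the third-party `toml` package works on older versions

from llama_cpp.llama_chat_format import ChatFormatterResponse, register_chat_format

# Sketch: load prompt templates from a config file at startup and register
# each one, so nobody has to edit llama_chat_format.py to add a model.
with open("templates.toml", "rb") as f:
    templates = tomllib.load(f)

def make_formatter(tpl):
    def formatter(messages, **kwargs):
        # Concatenate prefix, one formatted chunk per message, then the
        # suffix that cues the model to start answering.
        parts = [tpl.get("prefix", "")]
        for m in messages:
            parts.append(tpl.get(m["role"], "").format(content=m["content"]))
        parts.append(tpl.get("suffix", ""))
        return ChatFormatterResponse(prompt="".join(parts))
    return formatter

for name, tpl in templates.items():
    register_chat_format(name)(make_formatter(tpl))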

tastypear commented 7 months ago

@mjwweb Maybe you can try the chat_format alpaca or snoozy; they look similar.
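For example (the model path and question are placeholders):

from llama_cpp import Llama

llm = Llama(
    model_path="./models/deepseek-coder-6.7b-instruct.Q4_K_M.gguf",
    chat_format="alpaca",
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a quicksort in Python."}]
)
print(out["choices"][0]["message"]["content"])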

mjwweb commented 7 months ago

@tastypear alpaca seems to be the closest but still doesn't work well. The model outputs a bunch of gibberish.

I found this template for deepseek on Hugging Face to refer to:

prompt = "Tell me about AI"
prompt_template = f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''
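In the meantime, one way to sanity-check the model is to bypass chat_format entirely and feed that exact template to the low-level completion API (a sketch; the model path, max_tokens, and stop strings are illustrative):

from llama_cpp import Llama

llm = Llama(model_path="./models/deepseek-coder-6.7b-instruct.Q8_0.gguf", n_ctx=2048)

prompt = "Tell me about AI"
full_prompt = f'''You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
{prompt}
### Response:
'''
# Llama.__call__ runs raw text completion, so no chat template is involved.
output = llm(full_prompt, max_tokens=512, stop=["### Instruction:"])
print(output["choices"][0]["text"])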

I've also tried to create my own definition in llama_chat_format.py (for reference: https://github.com/abetlen/llama-cpp-python/blob/f3b844ed0a139fc5799d6e515e9d1d063c311f97/llama_cpp/llama_chat_format.py):

@register_chat_format("deepseek")
def format_deepseek(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    # _format_add_colon_single adds ": " after each role label itself, so the
    # labels are given without trailing colons; the deepseek template uses
    # "### Response:" rather than "### Assistant:" for the model's turn.
    _roles = dict(user="### Instruction", assistant="### Response")
    _sep = "\n"
    _system_message = "You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer."
    _messages = _map_roles(messages, _roles)
    # Append an empty assistant turn so the prompt ends with "### Response:".
    _messages.append((_roles["assistant"], None))
    _prompt = _format_add_colon_single(_system_message, _messages, _sep)
    return ChatFormatterResponse(prompt=_prompt)

and applying the chat_format:

llm = Llama(
    model_path="./models/deepseek-llm-7b-base.Q8_0.gguf",
    chat_format="deepseek",
    n_ctx=2096,
    n_threads=6,
    n_gpu_layers=100
)

However, I am getting this error when trying to run inference:

  File "/home/ubuntu/LLMs/ai/llm-env/lib/python3.10/site-packages/llama_cpp/llama_chat_format.py", line 61, in get_chat_completion_handler
    return CHAT_HANDLERS[name]
KeyError: 'deepseek'

Where the error is occurring in llama_chat_format.py:

def get_chat_completion_handler(name: str) -> LlamaChatCompletionHandler:
    return CHAT_HANDLERS[name]

I'm not sure if the new definition I added is being applied to the pip package, or if there is some other issue. I'm new to working in Python environments, so this is all experimental. I have other models working as expected with the existing chat_format templates, but I'm still trying to figure out how to create and register new template definitions.
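My guess is that editing the copy of llama_chat_format.py inside site-packages only takes effect if that is exactly the file the interpreter imports, and a reinstall would wipe the change anyway. A safer route seems to be registering the format from my own script at runtime; here is a sketch, assuming the installed version exposes the same helpers llama_chat_format.py uses internally:

# register_deepseek.py -- run this in your own code instead of patching
# the file inside site-packages.
from typing import Any, List

from llama_cpp import Llama, llama_types
from llama_cpp.llama_chat_format import (
    ChatFormatterResponse,
    _format_add_colon_single,
    _map_roles,
    register_chat_format,
)

@register_chat_format("deepseek")
def format_deepseek(
    messages: List[llama_types.ChatCompletionRequestMessage],
    **kwargs: Any,
) -> ChatFormatterResponse:
    _roles = dict(user="### Instruction", assistant="### Response")
    # Same system prompt as in the template above.
    _system_message = (
        "You are an AI programming assistant, utilizing the Deepseek Coder model, "
        "developed by Deepseek Company, and you only answer questions related to "
        "computer science. For politically sensitive questions, security and privacy "
        "issues, and other non-computer science questions, you will refuse to answer."
    )
    _messages = _map_roles(messages, _roles)
    _messages.append((_roles["assistant"], None))
    return ChatFormatterResponse(
        prompt=_format_add_colon_single(_system_message, _messages, "\n")
    )

# The handler is registered before Llama is constructed, so
# chat_format="deepseek" now resolves instead of raising KeyError.
llm = Llama(model_path="./models/deepseek-llm-7b-base.Q8_0.gguf", chat_format="deepseek")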

tastypear commented 7 months ago

@mjwweb I have added a --chat-format option to llama.cpp/examples/server/api_like_OAI.py (not llama-cpp-python) in my repo. You can edit chat-format.toml to add your template.

For deepseek-coder-instruct, maybe like this:

[deepseek-coder-instruct]
prefix = "You are an AI programming assistant.\n"
system = ""
user = "### Instruction:{content}\n"
assistant = "### Response:{content}\n"
suffix = "### Response:"

Then run ./server -m model.gguf and python api_like_OAI.py --chat-format deepseek-coder-instruct.

I haven't tested this template myself, but it should be easy to adjust to make it work.
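For reference, applied mechanically to a single user turn like "Tell me about AI", the template should expand to:

You are an AI programming assistant.
### Instruction:Tell me about AI
### Response: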