BrutalCoding / aub.ai

AubAI brings you on-device gen-AI capabilities, including offline text generation and more, directly within your app.
https://pub.dev/packages/aub_ai
GNU Affero General Public License v3.0

What's the equivalent of `create_chat_completion` in llama-cpp-python #14

Open LondonX opened 8 months ago

LondonX commented 8 months ago

Hi,

Some of the models on Hugging Face show support for create_chat_completion, but this plugin currently seems to support only simple inference. Will chat completion be supported in a future version?

https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF

from llama_cpp import Llama

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
  model_path="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # Download the model file first
  n_ctx=2048,  # The max sequence length to use - note that longer sequence lengths require much more resources
  n_threads=8,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
)

# Simple inference example
output = llm(
  "<|system|>\n{system_message}</s>\n<|user|>\n{prompt}</s>\n<|assistant|>", # Prompt
  max_tokens=512,  # Generate up to 512 tokens
  stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
  echo=True        # Whether to echo the prompt
)

# Chat Completion API

llm = Llama(model_path="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
llm.create_chat_completion(
    messages = [
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            "content": "Write a story about llamas."
        }
    ]
)
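
For reference while this is open: in llama-cpp-python, create_chat_completion is essentially a convenience wrapper that applies the model's chat template to the message list and then runs an ordinary completion. That means the simple-inference API shown above can already emulate it by formatting the prompt manually. A minimal sketch, assuming the TinyLlama/Zephyr-style template from the prompt above (the format_chat helper is illustrative, not part of any library):

from llama_cpp import Llama

def format_chat(messages):
    # Build the Zephyr-style prompt TinyLlama-1.1B-Chat expects:
    # <|system|>\n...</s>\n<|user|>\n...</s>\n<|assistant|>
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    parts.append("<|assistant|>")
    return "\n".join(parts)

llm = Llama(model_path="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf", n_ctx=2048)

output = llm(
    format_chat([
        {"role": "system", "content": "You are a story writing assistant."},
        {"role": "user", "content": "Write a story about llamas."},
    ]),
    max_tokens=512,
    stop=["</s>"],  # stop at the end-of-turn marker used by this template
)
print(output["choices"][0]["text"])

The same idea should carry over to any binding that exposes raw text completion, which is presumably how a Dart-side equivalent would look until dedicated chat bindings exist.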
BrutalCoding commented 8 months ago

Hi,

Sorry, I don't have an answer ready yet; I just wanted to let you know I've seen your question.

I'd like to say I'll get back to you soon, but I honestly have no idea when.

It's only fair that I tackle the earlier reported issues before coming back to you with an answer. I hope you understand.

I will update aub_ai to sync with the latest llama.cpp changes this week, mainly to support Google's new Gemma model. I'm not sure whether aub_ai will get bindings for create_chat_completion; it depends on whether that method comes from llama.cpp directly or only from llama-cpp-python.

Thanks, Daniel
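
On that last point: create_chat_completion appears to live in llama-cpp-python's Python layer (it formats the messages with a chat template and then calls the regular completion routine) rather than in llama.cpp's C API, so there may be no dedicated llama.cpp function to bind. What the GGUF file itself typically carries is just the chat template in its metadata. A minimal sketch for checking that, assuming a recent llama-cpp-python that exposes parsed GGUF metadata as Llama.metadata:

from llama_cpp import Llama

# Load the model; a small n_ctx is fine since we only want the metadata here.
llm = Llama(model_path="./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf", n_ctx=512)

# Chat-tuned GGUF conversions usually embed the Jinja chat template
# under this key; llama-cpp-python reads the metadata at load time.
template = llm.metadata.get("tokenizer.chat_template")
print(template or "No chat template embedded in this GGUF file.")

If a template is embedded, chat support largely reduces to prompt formatting on top of the simple inference that aub_ai already exposes.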