Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It lets users chat with LLMs, execute structured function calls, and get structured output. It also works with models that are not fine-tuned for JSON output and function calls.

Stuck at output #66

Open fahdmirza opened 1 month ago

fahdmirza commented 1 month ago

Hi, on Ubuntu 22.04 it just gets stuck generating output for hours:

# Import the Llama class of llama-cpp-python and the LlamaCppPythonProvider of llama-cpp-agent
from llama_cpp import Llama
from llama_cpp_agent.providers import LlamaCppPythonProvider

# Create an instance of the Llama class and load the model
llama_model = Llama(r"mistral-7b-instruct-v0.2.Q5_K_S.gguf", n_batch=1024, n_threads=10, n_gpu_layers=40)

# Create the provider by passing the Llama class instance to the LlamaCppPythonProvider class
provider = LlamaCppPythonProvider(llama_model)

from llama_cpp_agent import LlamaCppAgent
from llama_cpp_agent import MessagesFormatterType

agent = LlamaCppAgent(provider, system_prompt="You are a helpful assistant.", predefined_messages_formatter_type=MessagesFormatterType.CHATML)

agent_output = agent.get_chat_response("Hello, World!")

It gets stuck here.

I have an NVIDIA A6000 GPU and plenty of memory. I have also tried installing llama.cpp from source, but it's still the same issue. Any ideas?
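
A quick way to narrow down a hang like this is to load the model with llama-cpp-python's verbose=True flag and watch the startup log; a minimal sketch, reusing the parameters from the report above:

from llama_cpp import Llama

# verbose=True makes llama.cpp print its load log; the "offloaded X/Y layers
# to GPU" line shows whether a CUDA-enabled build is actually being used.
llama_model = Llama(
    r"mistral-7b-instruct-v0.2.Q5_K_S.gguf",
    n_batch=1024,
    n_threads=10,
    n_gpu_layers=40,
    verbose=True,
)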

Maximilian-Winter commented 1 month ago

@fahdmirza I will look into this.

fahdmirza commented 1 month ago

> @fahdmirza I will look into this.

Thank you. I am doing a review of this for my channel https://www.youtube.com/@fahdmirza, as it looks very promising. Thanks.

pabl-o-ce commented 6 days ago

Hello @fahdmirza, for this, did you install llama-cpp-python with CUDA? We have now tested it on an A100 on HF Spaces and it works nicely, so I don't understand your problem.
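
For reference, a minimal sketch for checking whether the installed llama-cpp-python wheel was built with GPU offload, assuming a recent version that exposes the llama_supports_gpu_offload binding:

import llama_cpp

# False means a CPU-only wheel; reinstall with CUDA enabled, e.g.:
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
print(llama_cpp.llama_supports_gpu_offload())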

pabl-o-ce commented 6 days ago

In the code you shared here, you are using a Mistral model with the CHATML message format. I think you have to use MISTRAL for the chat template.

pabl-o-ce commented 6 days ago

Nevertheless, check this out: we have set up some examples in HF Spaces, if you want to make a review for your YouTube channel hehe: https://huggingface.co/poscye

pabl-o-ce commented 6 days ago

This should work for you; always use the correct chat template for the model:

from llama_cpp import Llama
from llama_cpp_agent import LlamaCppAgent
from llama_cpp_agent import MessagesFormatterType
from llama_cpp_agent.providers import LlamaCppPythonProvider

llama_model = Llama(r"mistral-7b-instruct-v0.2.Q5_K_S.gguf", n_batch=1024, n_threads=10, n_gpu_layers=40)
provider = LlamaCppPythonProvider(llama_model)

# MISTRAL matches the chat template of mistral-7b-instruct
agent = LlamaCppAgent(provider, system_prompt="You are a helpful assistant.", predefined_messages_formatter_type=MessagesFormatterType.MISTRAL)

agent_output = agent.get_chat_response("Hello, World!")
print(agent_output)