Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It lets users chat with LLMs, execute structured function calls, and get structured output. It also works with models that are not fine-tuned for JSON output and function calls.

Stuck at output #66

Open fahdmirza opened 1 month ago

fahdmirza commented 1 month ago

Hi, on Ubuntu 22.04 it just gets stuck generating output for hours:

# Import the Llama class of llama-cpp-python and the LlamaCppPythonProvider of llama-cpp-agent
from llama_cpp import Llama
from llama_cpp_agent.providers import LlamaCppPythonProvider

# Create an instance of the Llama class and load the model
llama_model = Llama(r"mistral-7b-instruct-v0.2.Q5_K_S.gguf", n_batch=1024, n_threads=10, n_gpu_layers=40)

# Create the provider by passing the Llama class instance to the LlamaCppPythonProvider class
provider = LlamaCppPythonProvider(llama_model)

from llama_cpp_agent import LlamaCppAgent
from llama_cpp_agent import MessagesFormatterType

agent = LlamaCppAgent(provider, system_prompt="You are a helpful assistant.", predefined_messages_formatter_type=MessagesFormatterType.CHATML)

agent_output = agent.get_chat_response("Hello, World!")

It gets stuck here.

I have an NVIDIA A6000 GPU and plenty of memory. I have also tried installing llama.cpp from source, but it's still the same issue. Any ideas?
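
A quick way to narrow down a hang like this is to load the model with llama-cpp-python's verbose=True flag and watch the startup log; a minimal sketch, reusing the parameters from the report above:

from llama_cpp import Llama

# verbose=True makes llama.cpp print its load log; the "offloaded X/Y layers
# to GPU" line shows whether a CUDA-enabled build is actually being used.
llama_model = Llama(
    r"mistral-7b-instruct-v0.2.Q5_K_S.gguf",
    n_batch=1024,
    n_threads=10,
    n_gpu_layers=40,
    verbose=True,
)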

Maximilian-Winter commented 1 month ago

@fahdmirza I will look into this.

fahdmirza commented 1 month ago

> @fahdmirza I will look into this.

Thank you. I am doing a review of this for my channel https://www.youtube.com/@fahdmirza, as it looks very promising. Thanks.

pabl-o-ce commented 6 days ago

Hello @fahdmirza, for this, did you install llama-cpp-python with CUDA? We have now tested it on an A100 on HF Spaces and it works nicely, so I don't understand your problem.
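
For reference, a minimal sketch for checking whether the installed llama-cpp-python wheel was built with GPU offload, assuming a recent version that exposes the llama_supports_gpu_offload binding:

import llama_cpp

# False means a CPU-only wheel; reinstall with CUDA enabled, e.g.:
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
print(llama_cpp.llama_supports_gpu_offload())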

pabl-o-ce commented 6 days ago

In the code you shared here, you are using a Mistral model with the CHATML message format. I think you have to use MISTRAL for the chat template.

pabl-o-ce commented 6 days ago

Nevertheless, check this out: we have set up some examples in HF Spaces, if you want to make a review for your YouTube channel hehe: https://huggingface.co/poscye

pabl-o-ce commented 6 days ago

This should work for you; always use the correct chat template for the model:

from llama_cpp import Llama
from llama_cpp_agent import LlamaCppAgent
from llama_cpp_agent import MessagesFormatterType
from llama_cpp_agent.providers import LlamaCppPythonProvider

llama_model = Llama(r"mistral-7b-instruct-v0.2.Q5_K_S.gguf", n_batch=1024, n_threads=10, n_gpu_layers=40)
provider = LlamaCppPythonProvider(llama_model)

# MISTRAL matches the chat template of mistral-7b-instruct
agent = LlamaCppAgent(provider, system_prompt="You are a helpful assistant.", predefined_messages_formatter_type=MessagesFormatterType.MISTRAL)

agent_output = agent.get_chat_response("Hello, World!")
print(agent_output)