benman1 / generative_ai_with_langchain

Build large language model (LLM) apps with Python, ChatGPT and other models. This is the companion repository for the book on generative AI with LangChain.
https://amzn.to/43PuIkQ

LLama.cpp model 3B is not available #27

Closed raniat123 closed 6 months ago

raniat123 commented 7 months ago

Chapter 3 explains how to download the LLama.cpp model weights and tokenizer, and it specifies the 3B model. I have two questions:

  1. The available model weights and tokenizers are for models 7B and up; I can't find a 3B option among the given choices. Should I download 7B instead?
  2. After explaining the installation and download methodology, the chapter doesn't use the Llama.cpp model and goes straight to talking about GPT4All. Will Llama.cpp be needed to complete future examples in the book? If it's not needed, I won't download it because of the large disk space it requires.

benman1 commented 7 months ago

Hi @raniat123, apologies for the confusion between llama-cpp and GPT4All. This has become much easier now with llama-cpp: there have been several major rewrites of the library, but things have stabilized again.

You don't need to use any particular model for the examples, although most of them are tested with GPT-3.5 or GPT-4. I would still recommend running local models, because 1. it's very useful to know how, and 2. it's cool.

Let's see how to get this working with llama-cpp. You might need a relatively recent version of llama-cpp, but it's quite straightforward. Here's the recipe:

  1. Download a GGUF model from somewhere, for example the Hugging Face Hub.
  2. Load the model with llama-cpp.
  3. Run it!

When you search for models, most model overview pages on Hugging Face explain the trade-offs between the quantized variants. Usually the 5-bit versions hit a sweet spot; a q5_1 file, for example, would typically work well.
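
For step 1, one way to fetch a GGUF file programmatically is the huggingface_hub package (a minimal sketch, assuming you have run something like pip install llama-cpp-python huggingface_hub; the repo_id and filename below are only placeholders for whichever quantized model you pick):

from huggingface_hub import hf_hub_download

# Download a quantized GGUF file from the Hugging Face Hub.
# repo_id and filename are placeholders - pick a model and quantization
# that fit your hardware (the 5-bit variants are usually a good trade-off).
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",   # example repository
    filename="llama-2-7b.Q5_K_M.gguf",    # example quantized file
)
print(model_path)  # local path to pass to LlamaCpp(model_path=...)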

Loading the model:

from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp

# Stream generated tokens to stdout as they arrive
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path="/Users/raniat/Downloads/gemma-2b.Q5_1.gguf",  # make sure this points to your model file!
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # verbose is required to pass to the callback manager
)

Running the model:

prompt = """
Question: A rap battle between Stephen Colbert and John Oliver
"""
llm.invoke(prompt)
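
If you want to plug the local model into the book's chain-style examples, here's a minimal sketch using the LCEL pipe syntax (assuming a recent langchain-core; the prompt text and question are just illustrative):

from langchain_core.prompts import PromptTemplate

# Wire the local LlamaCpp instance into a simple prompt -> model chain
prompt_template = PromptTemplate.from_template(
    "Question: {question}\n\nAnswer:"
)
chain = prompt_template | llm

print(chain.invoke({"question": "What are the advantages of running models locally?"}))

chain.invoke returns the raw completion string, which should let you follow most of the book's examples with a local model instead of an OpenAI one.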