benman1 / generative_ai_with_langchain

Build large language model (LLM) apps with Python, ChatGPT and other models. This is the companion repository for the book on generative AI with LangChain.
https://amzn.to/43PuIkQ

LLama.cpp model 3B is not available #27

Closed raniat123 closed 6 months ago

raniat123 commented 7 months ago

Chapter 3 explains how to download the LLama.cpp model weights and tokenizer, and it specifies the 3B model. I have two questions:

  1. The available model weights and tokenizers are for models 7B and up; I can't find a 3B option among the given choices. Should I download 7B instead?
  2. After explaining the installation and download methodology, the chapter doesn't use the Llama.cpp model and goes straight to talking about GPT4All. Will Llama.cpp be needed to complete future examples in the book? If it's not needed, I won't download it because of the large disk space it requires.

benman1 commented 7 months ago

Hi @raniat123, apologies for the confusion between llama-cpp and GPT4All. This has become much easier now with llama-cpp: there have been several major rewrites of the library, but things have stabilized again.

You don't need to use any particular model for the examples, although most of them are tested with GPT-3.5 or GPT-4. I would still recommend running local models, because 1. it's very useful to know how, and 2. it's cool.

Let's see how to get this working with llama-cpp. You might need a relatively recent version of llama-cpp, but it's quite straightforward. Here's the recipe:

  1. Download a GGUF model from somewhere, for example the Hugging Face Hub.
  2. Load the model with llama-cpp.
  3. Run it!

When you search for models, most model overview pages on Hugging Face explain the trade-offs between the quantized variants. Usually the 5-bit versions hit a sweet spot; a q5_1 file, for example, would typically work well.
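
For step 1, one way to fetch a GGUF file programmatically is the huggingface_hub package (a minimal sketch, assuming you have run something like pip install llama-cpp-python huggingface_hub; the repo_id and filename below are only placeholders for whichever quantized model you pick):

from huggingface_hub import hf_hub_download

# Download a quantized GGUF file from the Hugging Face Hub.
# repo_id and filename are placeholders - pick a model and quantization
# that fit your hardware (the 5-bit variants are usually a good trade-off).
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",   # example repository
    filename="llama-2-7b.Q5_K_M.gguf",    # example quantized file
)
print(model_path)  # local path to pass to LlamaCpp(model_path=...)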

Loading the model:

from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp

# Stream generated tokens to stdout as they arrive
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

llm = LlamaCpp(
    model_path="/Users/raniat/Downloads/gemma-2b.Q5_1.gguf",  # make sure this points to your model file!
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # verbose is required to pass to the callback manager
)

Running the model:

prompt = """
Question: A rap battle between Stephen Colbert and John Oliver
"""
llm.invoke(prompt)
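
If you want to plug the local model into the book's chain-style examples, here's a minimal sketch using the LCEL pipe syntax (assuming a recent langchain-core; the prompt text and question are just illustrative):

from langchain_core.prompts import PromptTemplate

# Wire the local LlamaCpp instance into a simple prompt -> model chain
prompt_template = PromptTemplate.from_template(
    "Question: {question}\n\nAnswer:"
)
chain = prompt_template | llm

print(chain.invoke({"question": "What are the advantages of running models locally?"}))

chain.invoke returns the raw completion string, which should let you follow most of the book's examples with a local model instead of an OpenAI one.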