abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Help needed with loading 'TheBloke/Mistral-7B-Instruct-v0.1-GGUF' model using `llama-cpp-python` #915

Open ElliotBadinger opened 10 months ago

ElliotBadinger commented 10 months ago

Hello,

I'm currently working on a project that requires the use of the TheBloke/Mistral-7B-Instruct-v0.1-GGUF model, which is in the GGUF format. I've tried using the Hugging Face library to load this model, but it seems that the library does not support the GGUF format.

I've also tried using the ctransformers library, but I've encountered some issues with it as well. Therefore, I'm considering using the llama-cpp-python library instead.

However, I'm having trouble understanding how to use the llama-cpp-python library to load the 'TheBloke/Mistral-7B-Instruct-v0.1-GGUF' model. The documentation for the llama-cpp-python library is not very detailed, and there are no specific examples of how to use this library to load a model from the Hugging Face Model Hub.

I would greatly appreciate if you could provide some guidance on how to use the llama-cpp-python library to load the TheBloke/Mistral-7B-Instruct-v0.1-GGUF model. Specifically, I would like to know how to install the library, how to import it in my Python code, and how to use it to load the model.

Thank you in advance for your help.

tk-master commented 10 months ago

Firstly, Mistral-7B-Instruct-v0.1 isn't only available in GGUF format; there are other quantized formats you can find from TheBloke, or you can download the unquantized version from the original Mistral repo (on Hugging Face) and use ctransformers (not recommended, since it requires much more GPU VRAM).

Secondly, to install llama-cpp-python, follow the README; if you're on Windows you might find my guide to enabling CUDA useful.

Lastly, there are example Python scripts to get you started, like high_level_api_inference.py.
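For reference, a minimal sketch of the high-level API (the `Llama` class and call shape are per the llama-cpp-python README; the model filename/path is an assumption — substitute your own downloaded .gguf file):

```python
import os

MODEL_PATH = "./mistral-7b-instruct-v0.1.Q4_K_M.gguf"  # hypothetical local filename

def run_prompt(prompt: str, model_path: str = MODEL_PATH):
    """Load a GGUF model via llama-cpp-python and run a single completion."""
    if not os.path.isfile(model_path):
        return None  # model not downloaded yet
    from llama_cpp import Llama  # pip install llama-cpp-python
    llm = Llama(model_path=model_path, n_ctx=2048)
    out = llm(prompt, max_tokens=64, stop=["</s>"])
    return out["choices"][0]["text"]

print(run_prompt("[INST] What is the capital of France? [/INST]"))
```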

CartierPierre commented 10 months ago

Hi, same here. Just a clean install with llama_cpp_python-0.2.18 (same in 0.2.16, and maybe others); downloaded 4 different GGUF models from Hugging Face, e.g. https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF#provided-files

```python
# LlamaCpp here is LangChain's wrapper around llama-cpp-python
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/content/llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=2048,
    f16_kv=True,  # MUST set to True, otherwise you will run into problems after a couple of calls
    callback_manager=callback_manager,
    verbose=True,
)
```

returns this error:

```
/usr/local/lib/python3.10/dist-packages/pydantic/main.cpython-310-x86_64-linux-gnu.so in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for LlamaCpp
__root__
  Could not load Llama model from path: /content/llama-2-7b-chat.Q4_K_M.gguf. Received error  (type=value_error)
```
tk-master commented 10 months ago

@CartierPierre It seems to me like you might be providing an incorrect relative path to the gguf file (I could be wrong)... why don't you try model_path="llama-2-7b-chat.Q4_K_M.gguf" and put the file in the same working directory as the py script?
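Before loading, it's worth sanity-checking the path from Python itself — a sketch (the filename is the one from the snippet above; adjust for your setup):

```python
import os

def check_model_path(path: str):
    """Return (exists, size_bytes) for a candidate GGUF path."""
    if not os.path.isfile(path):
        return False, 0
    return True, os.path.getsize(path)

exists, size = check_model_path("llama-2-7b-chat.Q4_K_M.gguf")
print(f"cwd={os.getcwd()} exists={exists} size={size} bytes")
# a real Q4_K_M 7B file should be around 4 GB, not a few KB
```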

CartierPierre commented 10 months ago

@tk-master Because I'm using Colab 😉
Hmm, I could try renaming the file -> Not working.

Edit: the error looks to come from the langchain serializer.

CartierPierre commented 9 months ago

Ok, I found my error... I used wget on Hugging Face, but instead of the model it just returned me the Git LFS webpage (saved with a .gguf extension), only a few KB in size. So I manually downloaded the file, put it in Colab, and it works well!
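This failure mode is easy to detect up front: a real GGUF file starts with the 4-byte magic `GGUF`, while a botched download is a small HTML page. A quick sanity-check sketch (the path is a placeholder):

```python
import os

def looks_like_gguf(path: str):
    """Return (ok, detail) - checks the 4-byte GGUF magic at the start of the file."""
    if not os.path.isfile(path):
        return False, "file not found"
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic != b"GGUF":
        return False, f"bad magic {magic!r}, size {size} bytes - probably an HTML page, not a model"
    return True, f"GGUF magic OK, {size} bytes"

print(looks_like_gguf("/content/llama-2-7b-chat.Q4_K_M.gguf"))
```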

antonvice commented 9 months ago

OH MY GOD, I HAVE BEEN bashing my head on the wall all day, thank you @CartierPierre

Li1506 commented 8 months ago

You might need to customize the chat template, as Mistral 7B uses a different format: https://docs.mistral.ai/models#chat-template. There are a few examples under llama_cpp/llama_chat_format.py.
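For a single-turn prompt, the template from the Mistral docs (`<s>[INST] ... [/INST]`) can be sketched like this — the helper name is my own, not part of any library:

```python
def format_mistral_prompt(user_message: str) -> str:
    """Wrap a single user message in Mistral's instruct template."""
    return f"<s>[INST] {user_message.strip()} [/INST]"

print(format_mistral_prompt("What is the capital of France?"))
```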

PhucTuHa commented 6 months ago

> Ok, I found my error ... I used wget on huggingface, but git LFS just returned me a webpage (with GGUF extension ...), but only few kb. So I manually downloaded it and put it in collab, working well !

Can you share how you fixed it? I sincerely thank you.