ItsPi3141 / alpaca-electron

The simplest way to run Alpaca (and other LLaMA-based local LLMs) on your own computer
MIT License

[ENHANCEMENT] Add Support for 5-bit quantized models #84

Open TreesPlay opened 1 year ago

TreesPlay commented 1 year ago

Hi, I don't know much about AI, but I've seen a lot of models popping up on HuggingFace recently advertising 5-bit quantisation. Here is an example: https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML

I can only load q4_0 and q4_1 models; the newer q4_2, q5_0, and q5_1 formats don't work. Since I recently upgraded my RAM to 64GB to run LLMs on my machine, I'd like to be able to use the newer models.

TreesPlay commented 1 year ago

For context, I'm using the latest release. Since it was last updated a month ago, I don't know whether more recent commits have already added support for 5-bit quantisation.

chmodseven commented 1 year ago

I have been using some q5_1 models with no problems after compiling llama.cpp myself and putting the resulting main.exe in place of Alpaca Electron's chat.exe. You can follow the "(OPTIONAL) Building llama.cpp from source" section in the README here, although note that for me the second cmake command didn't work; per the llama.cpp README it should be `cmake --build . --config Release`. A rough sketch of the full sequence is below.
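
For anyone who wants the concrete steps, here's roughly what that looks like on Windows. This is a sketch, not official instructions: it assumes git, CMake, and the MSVC build tools are installed, and the binary output folder and the Alpaca Electron install path in the last line are assumptions that may differ on your machine.

```bat
:: Clone llama.cpp and build it with CMake (run from a Developer
:: Command Prompt so the MSVC toolchain is on PATH)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release

:: The Release binary usually lands under build\bin\Release\ or
:: build\Release\ depending on the llama.cpp version -- check your
:: build output. Rename it to chat.exe and copy it over Alpaca
:: Electron's bundled binary. The destination path below is an
:: assumption; locate chat.exe in your own install first.
copy bin\Release\main.exe "%LOCALAPPDATA%\Programs\alpaca-electron\resources\app\bin\chat.exe"
```

After replacing the binary, restart Alpaca Electron and the newer q5_0/q5_1 GGML files should load, since the chat executable is what actually parses the model format.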