[Open] TreesPlay opened this issue 1 year ago
For context, I use the latest release. Since it was last updated a month ago, I don't know whether more recent commits have already added support for 5-bit quantisation.
I have been using some q5_1 models with no problems after compiling llama.cpp myself and putting the resulting main.exe in place of Alpaca Electron's chat.exe. You can follow the "(OPTIONAL) Building llama.cpp from source" section in the README here, although note that for me the second cmake command didn't work and had to be `cmake --build . --config Release`, as described in the llama.cpp README. A rough sketch of the full sequence I used is below.
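This is only an illustration of the steps on my machine, not an exact recipe: the clone URL is the standard llama.cpp repository, but the output directory for main.exe depends on your llama.cpp version and generator, so check where your build actually puts it.

```sh
# Clone llama.cpp and build it with CMake in the Release configuration.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
# On my setup the README's second cmake step had to be spelled like this:
cmake --build . --config Release
# The resulting main.exe (often under build/bin/Release or build/Release on
# Windows) then replaces Alpaca Electron's chat.exe.
```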
Hi, I don't know much about AI, but I've seen a lot of models popping up on HuggingFace recently advertising 5-bit quantisation. Here is an example: https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML
I can only load q4_0 and q4_1 models; the newer q4_2, q5_0 and q5_1 formats don't work. Since I recently upgraded my RAM to 64 GB to run LLMs on my machine, I'd like to be able to use the newer models.