HolmesDomain opened this issue 1 year ago
Got it running by using the .bin file from here: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main
Had no luck generating the q5_1 model myself by following the instructions here: https://github.com/ggerganov/llama.cpp#prepare-data--run
If this is a common problem, maybe you could point people toward downloading the model directly from TheBloke.
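For anyone who wants to script the direct download, here is a minimal Node.js sketch (assumes Node 18+ for the global `fetch`). It relies on Hugging Face's `resolve/<revision>/<file>` URL pattern. The filename `llama-2-7b-chat.ggmlv3.q5_1.bin` is an assumption, so check the repo's file list for the exact name. The helper names `resolveUrl` and `download` are made up for illustration.

```javascript
const { createWriteStream } = require("node:fs");
const { Readable } = require("node:stream");
const { pipeline } = require("node:stream/promises");

const REPO = "TheBloke/Llama-2-7B-Chat-GGML";
// Assumed filename: verify it against the repo's "Files" tab before using.
const FILE = "llama-2-7b-chat.ggmlv3.q5_1.bin";

// Build a direct-download URL for a file in a Hugging Face model repo.
function resolveUrl(repo, file, revision = "main") {
  return `https://huggingface.co/${repo}/resolve/${revision}/${file}`;
}

// Stream the response body to disk so a multi-GB file is never held in memory.
async function download(url, dest) {
  const res = await fetch(url); // global fetch, Node 18+
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  await pipeline(Readable.fromWeb(res.body), createWriteStream(dest));
}

// Example (not run automatically):
// download(resolveUrl(REPO, FILE), FILE).then(() => console.log("done"));
```

Because the body is streamed straight to a file, memory use stays flat even though the model file is several gigabytes.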
I am getting this error:
My index.js:
It worked before I quantized the model. I was hoping quantization would make it faster, because it is very slow right now (I'm assuming a quantized model would fix the speed).
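A rough way to see why quantization might help: it shrinks the weights that have to be read for every generated token. The sketch below uses ggml's q5_1 block layout (32 weights stored as an fp16 scale, an fp16 minimum, 4 bytes of high bits, and 16 bytes of packed 4-bit values, so 24 bytes per 32 weights, or 6 bits per weight) and compares it with f16. The ~6.7e9 parameter count for Llama 2 7B is a rounded assumption, and the estimate ignores file metadata and any tensors kept at higher precision. A smaller model often generates faster, but this alone does not guarantee a speedup or explain the error above.

```javascript
// Approximate storage cost per weight, in bits.
// q5_1: per block of 32 weights -> fp16 scale (2 B) + fp16 min (2 B)
//       + 4 B of high bits + 16 B of packed low 4-bit values = 24 B.
const BITS_PER_WEIGHT = {
  f16: 16,
  q5_1: (24 * 8) / 32, // 6 bits per weight
};

// Rough size in GB (1e9 bytes); ignores metadata and mixed-precision tensors.
function approxSizeGB(numParams, format) {
  const bits = BITS_PER_WEIGHT[format];
  if (bits === undefined) throw new Error(`unknown format: ${format}`);
  return (numParams * bits) / 8 / 1e9;
}

// Assumed, rounded parameter count for Llama 2 7B.
const PARAMS_7B = 6.7e9;
console.log(`f16:  ~${approxSizeGB(PARAMS_7B, "f16").toFixed(1)} GB`);
console.log(`q5_1: ~${approxSizeGB(PARAMS_7B, "q5_1").toFixed(1)} GB`);
```

By this estimate the q5_1 file is a bit under 40% of the f16 size (6/16 bits per weight).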