Closed: Deepire closed this issue 1 year ago.
I'm not sure if that lora is compatible. Can you try removing --lora /content/koboldcpp/qlora and try again?
Status Legend: (OK): download completed.
Welcome to KoboldCpp - Version 1.30.3
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp.so
Loading model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q8_0.bin
[Threads: 1, BlasThreads: 1, SmartContext: False]
Identified as LLAMA model: (ver 0)
Attempting to Load...
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Unknown Model, cannot load.
Load Model OK: False
Could not load model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q8_0.bin
same story
Maybe the q8_0 quantization format is not supported? I currently have no issues running q5_1, q5_K_M, and q5_K_S. The formats q4_0 and q4_1 are relatively old, so they will most likely work, too. I used to run this specific model (gpt4-x-alpaca-13b-native) in q4_0 and q4_1 on older versions of koboldcpp.
There's an overview of quantization formats here (see "Explanation of the new k-quant methods").
In terms of accuracy and resource usage: q5_1 > q5_0 > q4_1 > q4_0. For the K_M and K_S versions I can't tell for sure.
Note (regarding q8_0):
Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.
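koboldcpp decides whether it recognizes a file from a magic number in the first four bytes. As a quick sanity check before blaming the quantization format, a small sketch like this can show which container a file actually uses (the magic values below are assumptions based on the llama.cpp/GGML formats of that era, not taken from this thread):

```python
import struct

# GGML container magics (uint32 at file offset 0, written little-endian).
# These specific values are assumptions based on llama.cpp's formats of the time.
MAGICS = {
    0x67676D6C: "ggml (old unversioned format, reported as 'ver 0')",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-friendly)",
}

def identify_container(path):
    """Read the 4-byte magic and map it to a known GGML container name."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown magic 0x{magic:08X}")
```

If this reports an unknown magic, the file is either a newer format than the loader understands or not a model file at all, independent of which quant level it claims to be.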
Status Legend: (OK): download completed.
Welcome to KoboldCpp - Version 1.30.3
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp.so
Loading model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q4_0.bin
[Threads: 1, BlasThreads: 1, SmartContext: False]
Identified as LLAMA model: (ver 0)
Attempting to Load...
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Unknown Model, cannot load.
Load Model OK: False
Could not load model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q4_0.bin
I've been trying to run Alpaca on Kobold for four days now, and still no success.
Is the file size correct?
If you have an unreliable internet connection, the download could have been interrupted early, leaving you with only part of the file. It could be that the header (identifying the model type as LLAMA model: (ver 0)) was at the beginning of the file, but the end of the file is missing.
Note: the screenshot is just an example; check the page from which you downloaded the model to see the correct size.
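A truncated download passes the header check but fails later, so comparing the on-disk byte count against the size listed on the download page is the quickest test. A minimal sketch in Python (the expected size comes from the model's page, not from this thread):

```python
import os

def size_matches(path, expected_bytes):
    """True if the file on disk has exactly the byte count the download page lists.

    A partial download from an interrupted connection comes up short here even
    though the file's header (and thus the detected model type) looks fine.
    """
    return os.path.getsize(path) == expected_bytes
```

Any mismatch, even by a single byte, means the download should be repeated (aria2c's -c flag can resume a partial file).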
Thanks, that helps. By the way, it is much slower than plain llama with the webUI. Why?
Are you using the same settings? (number of CPU threads, CLBlast, number of gpu offload layers)
Yep, same. I even tested adding --gpulayers 12, but nothing changed; it was insanely slow.
If you want GPU acceleration, try adding --useclblast (platform) (device)
I am currently trying to load Alpaca via Koboldcpp on Google Colab. This is my code in Colab:
%cd /content
!git clone https://github.com/LostRuins/koboldcpp.git
%cd koboldcpp
!git clone https://github.com/artidoro/qlora
%cd /content/koboldcpp
!bash install_requirements.sh "CUDA"
!make LLAMA_CLBLAST=1
!pip install -r requirements.txt
!sudo apt-get install libclblast-dev libopenblas-dev
!mkdir /content/koboldcpp/models
!apt-get -y install -qq aria2
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/Pi3141/gpt4-x-alpaca-native-13B-ggml/blob/main/ggml-model-q8_0.bin -d /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda -o ggml-model-q8_0.bin
!python koboldcpp.py --lora /content/koboldcpp/qlora --model models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q8_0.bin
I get this: Unknown Model, cannot load. Load Model OK: False
Please tell me, what am I doing wrong?
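One detail worth checking in the commands above: Hugging Face URLs containing /blob/main/ point at the file's web page, while direct downloads normally go through /resolve/main/, so a downloader aimed at a /blob/ URL can save an HTML page under a .bin name, which would also fail to load as "Unknown Model". A minimal sketch (a hypothetical helper, not part of koboldcpp) to check for that case:

```python
def looks_like_html(path):
    """True if the file starts like an HTML page rather than a binary model.

    This is what you typically end up with when a download tool fetches the
    /blob/ web page for a Hugging Face file instead of the /resolve/ link.
    """
    with open(path, "rb") as f:
        head = f.read(64).lstrip()
    return head.startswith((b"<!DOCTYPE", b"<!doctype", b"<html"))
```

If this returns True for the downloaded .bin, re-download after replacing /blob/main/ with /resolve/main/ in the URL.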