Closed: Deepire closed this issue 1 year ago.
I'm not sure if that lora is compatible. Can you try removing --lora /content/koboldcpp/qlora and try again?
Status Legend: (OK): download completed.
Welcome to KoboldCpp - Version 1.30.3
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp.so
Loading model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q8_0.bin
[Threads: 1, BlasThreads: 1, SmartContext: False]
Identified as LLAMA model: (ver 0)
Attempting to Load...
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Unknown Model, cannot load.
Load Model OK: False
Could not load model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q8_0.bin
same story
Maybe the q8_0 quantization format is not supported? I currently have no issues running q5_1, q5_K_M, and q5_K_S. The formats q4_0 and q4_1 are relatively old, so they will most likely work, too. I used to run this specific model (gpt4-x-alpaca-13b-native) in q4_0 and q4_1 on older versions of koboldcpp.
There's an overview of quantization formats here (see "Explanation of the new k-quant methods").
In terms of accuracy and resource usage: q5_1 > q5_0 > q4_1 > q4_0. For the K_M and K_S versions I can't tell for sure.
Note (regarding q8_0):
Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users.
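koboldcpp decides whether it recognizes a file from a magic number in the first four bytes. As a quick sanity check before blaming the quantization format, a small sketch like this can show which container a file actually uses (the magic values below are assumptions based on the llama.cpp/GGML formats of that era, not taken from this thread):

```python
import struct

# GGML container magics (uint32 at file offset 0, written little-endian).
# These specific values are assumptions based on llama.cpp's formats of the time.
MAGICS = {
    0x67676D6C: "ggml (old unversioned format, reported as 'ver 0')",
    0x67676D66: "ggmf (versioned)",
    0x67676A74: "ggjt (versioned, mmap-friendly)",
}

def identify_container(path):
    """Read the 4-byte magic and map it to a known GGML container name."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown magic 0x{magic:08X}")
```

If this reports an unknown magic, the file is either a newer format than the loader understands or not a model file at all, independent of which quant level it claims to be.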
Status Legend: (OK): download completed.
Welcome to KoboldCpp - Version 1.30.3
Warning: OpenBLAS library file not found. Non-BLAS library will be used.
Initializing dynamic library: koboldcpp.so
Loading model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q4_0.bin
[Threads: 1, BlasThreads: 1, SmartContext: False]
Identified as LLAMA model: (ver 0)
Attempting to Load...
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Unknown Model, cannot load.
Load Model OK: False
Could not load model: /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q4_0.bin
I've been trying to run Alpaca on Kobold for four days now, and still no success.
Is the file size correct?
If you have an unreliable internet connection, the download could have been interrupted early, leaving you with only part of the file. It could be that the header (identifying the model type as LLAMA model: (ver 0)) was at the beginning of the file, but the end of the file is missing.
Note: the screenshot is just an example; check the page from which you downloaded the model to see the correct size.
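A truncated download passes the header check but fails later, so comparing the on-disk byte count against the size listed on the download page is the quickest test. A minimal sketch in Python (the expected size comes from the model's page, not from this thread):

```python
import os

def size_matches(path, expected_bytes):
    """True if the file on disk has exactly the byte count the download page lists.

    A partial download from an interrupted connection comes up short here even
    though the file's header (and thus the detected model type) looks fine.
    """
    return os.path.getsize(path) == expected_bytes
```

Any mismatch, even by a single byte, means the download should be repeated (aria2c's -c flag can resume a partial file).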
Thanks, that helps. By the way, it is much slower than plain llama with the webUI. Why?
Are you using the same settings? (number of CPU threads, CLBlast, number of gpu offload layers)
Yep, same. I even tested adding --gpulayers 12, but nothing changed; it was insanely slow.
If you want GPU acceleration, try adding --useclblast (platform) (device)
I am currently trying to load Alpaca via Koboldcpp on Google Colab. This is my code in Colab:
%cd /content
!git clone https://github.com/LostRuins/koboldcpp.git
%cd koboldcpp
!git clone https://github.com/artidoro/qlora
%cd /content/koboldcpp
!bash install_requirements.sh "CUDA"
!make LLAMA_CLBLAST=1
!pip install -r requirements.txt
!sudo apt-get install libclblast-dev libopenblas-dev
!mkdir /content/koboldcpp/models
!apt-get -y install -qq aria2
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/Pi3141/gpt4-x-alpaca-native-13B-ggml/blob/main/ggml-model-q8_0.bin -d /content/koboldcpp/models/gpt4-x-alpaca-13b-native-4bit-128g-cuda -o ggml-model-q8_0.bin
!python koboldcpp.py --lora /content/koboldcpp/qlora --model models/gpt4-x-alpaca-13b-native-4bit-128g-cuda/ggml-model-q8_0.bin
I get this: Unknown Model, cannot load. Load Model OK: False
Please tell me, what am I doing wrong?
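One detail worth checking in the commands above: Hugging Face URLs containing /blob/main/ point at the file's web page, while direct downloads normally go through /resolve/main/, so a downloader aimed at a /blob/ URL can save an HTML page under a .bin name, which would also fail to load as "Unknown Model". A minimal sketch (a hypothetical helper, not part of koboldcpp) to check for that case:

```python
def looks_like_html(path):
    """True if the file starts like an HTML page rather than a binary model.

    This is what you typically end up with when a download tool fetches the
    /blob/ web page for a Hugging Face file instead of the /resolve/ link.
    """
    with open(path, "rb") as f:
        head = f.read(64).lstrip()
    return head.startswith((b"<!DOCTYPE", b"<!doctype", b"<html"))
```

If this returns True for the downloaded .bin, re-download after replacing /blob/main/ with /resolve/main/ in the URL.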