LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

LoRa support #96

Closed: horenbergerb closed this issue 1 year ago

horenbergerb commented 1 year ago

There are some new models coming out which are being released in LoRA adapter form (such as this one). Since no merged version has been released, the "--lora" argument from llama.cpp is needed to make use of these.

Are there plans for this at the moment?

Drake-AI commented 1 year ago

You can merge the LoRA into the model: https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py

But it would be a nice feature to be able to select a model and a LoRA.
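
For reference, that export script roughly does the following: load the unquantized base model, attach the LoRA adapter with peft, fold the low-rank deltas into the base weights, and save a plain Hugging Face checkpoint that can then be converted and quantized to GGML. A minimal sketch of that flow (paths are placeholders, and the exact dtype/device handling may differ from the real script):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    BASE_MODEL = "path/to/llama-7b-hf"      # placeholder: unquantized (f16) HF base checkpoint
    LORA_ADAPTER = "path/to/lora-adapter"   # placeholder: directory with the adapter weights/config
    OUTPUT_DIR = "./llama-7b-merged"        # placeholder: where the merged checkpoint is written

    # Load the base model in f16; merging should happen on an unquantized checkpoint.
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)

    # Attach the LoRA adapter, then fold its low-rank deltas into the base weights.
    model = PeftModel.from_pretrained(base, LORA_ADAPTER, torch_dtype=torch.float16)
    model = model.merge_and_unload()

    # Save a plain HF checkpoint, ready for GGML conversion/quantization afterwards.
    model.save_pretrained(OUTPUT_DIR)
    AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(OUTPUT_DIR)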

LostRuins commented 1 year ago

For now, there are no immediate plans to support LoRA until it's more stable and easy to use. Eventually, it may be added as an optional parameter.

horenbergerb commented 1 year ago

So running a LoRA in llama.cpp seems to work for me with:

./main --no-mmap -t 6 -b 512 -m ./models/llama-30b/ggml-model-q4_0-ggjt.bin -c 1024 -n 50 -s 4201488 -f ./prompts/prompt.txt \
    --lora ./models/SuperCOT-LoRA/ggml-adapter-model.bin

Out of curiosity, I tried to hardwire the loading of the same LoRA in koboldcpp by adding

    printf("Loading lora adapter...\n");
    llama_apply_lora_from_file(ctx, "../llama.cpp/models/SuperCOT-LoRA/ggml-adapter-model.bin", modelname.c_str(), n_threads);

to the llama_load_model function in llama_adapter.cpp (following the example from main.cpp).

Unfortunately, it didn't work. I got:

Loading lora adapter...
llama_apply_lora_from_file_internal: applying lora adapter from '../llama.cpp/models/SuperCOT-LoRA/ggml-adapter-model.bin' - please wait ...
llama_apply_lora_from_file_internal: r = 8, alpha = 16, scaling = 2.00
llama_apply_lora_from_file_internal: loading base model from '/media/captdishwasher/Samshmung/horenbergerb/llama/llama.cpp/models/llama-30b/ggml-model-q4_0-ggjt.bin'
llama.cpp: loading model from /media/captdishwasher/Samshmung/horenbergerb/llama/llama.cpp/models/llama-30b/ggml-model-q4_0-ggjt.bin
llama_apply_lora_from_file_internal: warning: using a lora adapter with a quantized model may result in poor quality, use a f16 or f32 base model with --lora-base
GGML_ASSERT: ggml.c:6418: false
Aborted (core dumped)

Were it so easy...

Drake-AI commented 1 year ago

The merge script from https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py works well; tested with LLaMA 7B.

LostRuins commented 1 year ago

LoRA support has been added as of V1.11; pass it in with --lora.

horenbergerb commented 1 year ago

Thanks, it works for me! Really appreciate it!

horenbergerb commented 1 year ago

Noting for posterity: it seems that you have to run --lora with --nommap or else you will get Segmentation fault (core dumped).
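
For reference, a full invocation combining both flags would look roughly like this (reusing the model and adapter paths from the earlier test; substitute your own):

python koboldcpp.py ./models/llama-30b/ggml-model-q4_0-ggjt.bin --lora ./models/SuperCOT-LoRA/ggml-adapter-model.bin --nommap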