You can merge lora into the model: https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py
But it would be a nice feature to select a model and a lora.
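For anyone who wants the gist of what that export script does, here is a minimal sketch using peft and transformers. The model and adapter names and the output path are placeholders, and the sketch uses merge_and_unload() from newer peft versions; the linked script achieves the same effect, though its internals differ by version.

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

BASE = "decapoda-research/llama-7b-hf"  # placeholder base model
ADAPTER = "tloen/alpaca-lora-7b"        # placeholder LoRA adapter

# Load the fp16 base model, then layer the LoRA adapter on top of it.
base = LlamaForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER)

# Fold the low-rank deltas into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()

# Save a plain HF checkpoint that no longer needs peft at load time.
merged.save_pretrained("./merged-hf-checkpoint", max_shard_size="400MB")

The result is an ordinary Hugging Face checkpoint that can then be converted and quantized like any other base model.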
For now, there are no immediate plans to support LoRA until it's more stable and easier to use. Eventually, it may be added as an optional parameter.
So running a LoRA in llama.cpp seems to work for me with
./main --no-mmap -t 6 -b 512 -m ./models/llama-30b/ggml-model-q4_0-ggjt.bin -c 1024 -n 50 -s 4201488 -f ./prompts/prompt.txt --lora ./models/SuperCOT-LoRA/ggml-adapter-model.bin
Out of curiosity, I tried to hardwire loading of the same LoRA in koboldcpp by adding
printf("Loading lora adapter...\n");
llama_apply_lora_from_file(ctx, "../llama.cpp/models/SuperCOT-LoRA/ggml-adapter-model.bin", modelname.c_str(), n_threads);
to the llama_load_model function in llama_adapter.cpp (following the example from main.cpp).
Unfortunately, it didn't work. I got
Loading lora adapter...
llama_apply_lora_from_file_internal: applying lora adapter from '../llama.cpp/models/SuperCOT-LoRA/ggml-adapter-model.bin' - please wait ...
llama_apply_lora_from_file_internal: r = 8, alpha = 16, scaling = 2.00
llama_apply_lora_from_file_internal: loading base model from '/media/captdishwasher/Samshmung/horenbergerb/llama/llama.cpp/models/llama-30b/ggml-model-q4_0-ggjt.bin'
llama.cpp: loading model from /media/captdishwasher/Samshmung/horenbergerb/llama/llama.cpp/models/llama-30b/ggml-model-q4_0-ggjt.bin
llama_apply_lora_from_file_internal: warning: using a lora adapter with a quantized model may result in poor quality, use a f16 or f32 base model with --lora-base
GGML_ASSERT: ggml.c:6418: false
Aborted (core dumped)
Were it so easy...
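For what it's worth, the warning in that log points at a workaround on the llama.cpp side: pass --lora-base with an f16/f32 copy of the base model so the adapter is applied against unquantized weights. Something along these lines, where the f16 model path is a placeholder:
./main --no-mmap -t 6 -b 512 -m ./models/llama-30b/ggml-model-q4_0-ggjt.bin -c 1024 -n 50 -f ./prompts/prompt.txt --lora ./models/SuperCOT-LoRA/ggml-adapter-model.bin --lora-base ./models/llama-30b/ggml-model-f16.bin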
The merge script from https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py works well; tested with LLaMA 7B.
LoRA support has been added as of v1.11; pass it in with --lora
Thanks, it works for me! Really appreciate it!
Noting for posterity: it seems that you have to run --lora with --nommap, or else you will get Segmentation fault (core dumped).
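For reference, a full invocation might look like this; the paths are placeholders, and the positional model argument follows the usual koboldcpp.py convention:
python koboldcpp.py ./models/llama-30b/ggml-model-q4_0-ggjt.bin --lora ./models/SuperCOT-LoRA/ggml-adapter-model.bin --nommap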
There are some new models coming out which are released only in LoRA adapter form (such as this one). Since no merged checkpoint has been released, llama.cpp's --lora argument is necessary to make use of them.
Are there plans for this at the moment?