LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Failed to load LoRA adapter due to "bad file magic" #337

Closed · BugReporterZ closed this issue 1 year ago

BugReporterZ commented 1 year ago

It appears that this LoRA adapter, which works with regular Transformers and AutoGPTQ loaders in backends like text-generation-webui, fails to load in KoboldCPP. The base model is supposed to be Llama 2 7B (which was verified to work on its own in KoboldCPP):

https://huggingface.co/lemonilia/limarp-llama2/tree/main/LIMARP-Llama2-LoRA-adapter-7B

Although KoboldCPP's console output doesn't give many details, my suspicion is that the problem stems from this being a QLoRA-made adapter saved in BF16 format instead of the more conventional FP16. A similar problem has been reported for ExLlama. Could that be the case here?
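Before getting to the console output below, one quick sanity check for the BF16 hypothesis is to inspect the adapter's stored tensor dtypes directly. This is a rough sketch that assumes the standard PEFT layout (adapter_model.bin) with placeholder paths; the cast to FP16 is shown only as a possible workaround, not a confirmed fix:

import torch

# Load the raw PEFT adapter weights (placeholder path, adjust as needed)
adapter = torch.load("LIMARP-Llama2-LoRA-adapter-7B/adapter_model.bin", map_location="cpu")

# Print each LoRA tensor's dtype to see whether anything is stored as bfloat16
for name, tensor in adapter.items():
    print(name, tuple(tensor.shape), tensor.dtype)

# Possible workaround (untested): cast floating-point tensors to FP16 and re-save
fp16 = {k: v.to(torch.float16) if v.is_floating_point() else v for k, v in adapter.items()}
torch.save(fp16, "LIMARP-Llama2-LoRA-adapter-7B/adapter_model_fp16.bin")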

[...]
Attempting to apply LORA adapter: /home/anon/bin/text-generation-webui/loras/llamav2
llama_apply_lora_from_file_internal: applying lora adapter from '/home/anon/bin/text-generation-webui/loras/llamav2' - please wait ...
llama_apply_lora_from_file_internal: bad file magic
gpttype_load_model: error: failed to apply lora adapter
Load Model OK: False
Could not load model: /home/anon/Downloads/llama-2-7b.ggmlv3.q4_1.bin
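For context on the error itself: the llama.cpp LoRA loader used by KoboldCPP expects a converted ggml LoRA file that begins with a specific magic value ('ggla'), so pointing it at a raw PEFT adapter (a PyTorch checkpoint) fails at the very first check. A minimal sketch of that check, assuming a converted file named ggml-adapter-model.bin:

import struct

# The C++ loader reads the first four bytes as a little-endian uint32 and
# compares them against the ggml LoRA magic before parsing anything else.
GGLA_MAGIC = 0x67676C61  # spells 'ggla'

with open("ggml-adapter-model.bin", "rb") as f:  # placeholder path
    (magic,) = struct.unpack("<I", f.read(4))

print(hex(magic), "ok" if magic == GGLA_MAGIC else "bad file magic")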
gustrd commented 1 year ago

Did you convert the adapter to ggml using the script?

I managed to make this work some days ago.
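Assuming the script in question is convert-lora-to-ggml.py, which ships with llama.cpp (and may also be bundled with KoboldCPP), the conversion is run against the directory containing adapter_config.json and adapter_model.bin; the path below is a placeholder:

python convert-lora-to-ggml.py /path/to/LIMARP-Llama2-LoRA-adapter-7B

It should write a ggml-adapter-model.bin next to the original files, and that converted file, rather than the raw PEFT directory, is what gets passed as the LoRA to load.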

BugReporterZ commented 1 year ago

It looks like I forgot to do that; I thought there would be an automatic conversion process.

After I ran the script, I no longer got the error, and the LoRA could seemingly be loaded. However, the results differ from those expected and observed with other backends; it doesn't seem like the LoRA is working at all. This could be due to other reasons that are probably outside the scope of this issue.

gustrd commented 1 year ago

> It looks like I forgot to do that; I thought there would be an automatic conversion process.
>
> After I ran the script, I no longer got the error, and the LoRA could seemingly be loaded. However, the results differ from those expected and observed with other backends; it doesn't seem like the LoRA is working at all. This could be due to other reasons that are probably outside the scope of this issue.

There is a warning saying that applying the ggml LoRA to a model quantized below f16 degrades its effectiveness. I'm not sure how large that effect is.
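That warning comes from the underlying llama.cpp loader, which recommends applying the adapter on top of an f16/f32 copy of the model rather than directly onto quantized weights. Upstream llama.cpp exposes this as --lora-base; the command below is only an illustration with placeholder file names, and whether KoboldCPP's own LoRA option accepts an equivalent base-model argument may depend on the version:

./main -m llama-2-7b.ggmlv3.q4_1.bin --lora ggml-adapter-model.bin --lora-base llama-2-7b.ggmlv3.f16.bin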

BugReporterZ commented 1 year ago

More than just degraded, it appears to be non-functional, but this probably needs more investigation and comparisons. The LoRA did seem to work more or less as expected with a 4-bit GPTQ-quantized model; I don't know to what extent degradation would take place with a GGML version.