Closed: BugReporterZ closed this issue 1 year ago
Did you convert the adapter to ggml using the script?
I managed to make this work some days ago.
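For reference, the conversion step could look roughly like this, assuming the llama.cpp-style `convert-lora-to-ggml.py` script; the script name, invocation and output filename are assumptions based on upstream llama.cpp, not something confirmed in this thread:

```python
# Minimal sketch of the LoRA-to-ggml conversion step, assuming the llama.cpp
# convert-lora-to-ggml.py script is available in the current directory.
# Script name and output filename are assumptions, not confirmed by this thread.
import subprocess
from pathlib import Path

# Directory containing adapter_config.json / adapter_model.bin (hypothetical local path)
adapter_dir = Path("LIMARP-Llama2-LoRA-adapter-7B")

# Run the converter; on success it typically writes ggml-adapter-model.bin into adapter_dir
subprocess.run(
    ["python", "convert-lora-to-ggml.py", str(adapter_dir)],
    check=True,
)

print("converted adapter:", list(adapter_dir.glob("ggml-adapter-model.bin")))
```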
It looks like I forgot to do that; I thought there would be an automatic conversion process.
After I ran the script, I didn't get the same error anymore and the LoRA could seemingly be loaded. However, the results are different from those expected and observed with other backends; it doesn't seem like the LoRA is working at all. This could be due to other reasons that are probably outside the scope of this issue.
There is a warning saying that using the ggml LoRA with a model quantized below f16 degrades its effectiveness. I'm not sure how large that effect is.
More than just degraded, it appears to be non-functional, but this probably needs more investigation and comparisons. The LoRA did seem to work more or less as expected with a 4-bit GPTQ quantized model; I don't know to what extent degradation would take place with a GGML version.
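As a possible starting point for such comparisons, here is a rough sanity check using transformers + peft to get a reference output that the KoboldCPP result could be compared against. The base model ID, local adapter path, and prompt below are placeholders, not details taken from this thread:

```python
# Hypothetical sanity check: generate with and without the LoRA applied via peft,
# to establish a reference output for comparing against KoboldCPP's behavior.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"             # assumed base model
adapter_dir = "LIMARP-Llama2-LoRA-adapter-7B"    # local copy of the adapter (hypothetical path)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "### Instruction:\nWrite a short greeting.\n\n### Response:\n"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Output of the plain base model
base_out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print("base:", tokenizer.decode(base_out[0], skip_special_tokens=True))

# Same prompt with the LoRA adapter applied
lora_model = PeftModel.from_pretrained(model, adapter_dir)
lora_out = lora_model.generate(**inputs, max_new_tokens=64, do_sample=False)
print("lora:", tokenizer.decode(lora_out[0], skip_special_tokens=True))
```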
It appears that this LoRA adapter, which works with regular transformers and AutoGPTQ in backends like text-generation-webui, has issues getting loaded with KoboldCPP. The base model is supposed to be Llama2 7B (the model was tested to indeed work on its own in KoboldCPP): https://huggingface.co/lemonilia/limarp-llama2/tree/main/LIMARP-Llama2-LoRA-adapter-7B
Although KoboldCPP doesn't print many details in the console, my suspicion is that the problem comes from this being a QLoRA-made LoRA adapter saved in BF16 format rather than the more conventional FP16. A similar problem has been reported for Exllama. Could that be the case here?
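If the BF16 theory turns out to be the cause, one possible workaround sketch would be casting the adapter weights to FP16 before running the ggml conversion. The `adapter_model.bin` filename is an assumption about how the adapter is stored, and this is only a sketch, not a confirmed fix:

```python
# Hypothetical workaround: cast LoRA adapter weights from BF16 to FP16 before
# converting to ggml, assuming the adapter is a PyTorch state dict (adapter_model.bin).
import torch

state = torch.load("adapter_model.bin", map_location="cpu")

# Cast only floating-point tensors; leave any integer buffers untouched
fp16_state = {
    name: (t.to(torch.float16) if t.is_floating_point() else t)
    for name, t in state.items()
}

torch.save(fp16_state, "adapter_model_fp16.bin")
print("saved FP16 copy with", len(fp16_state), "tensors")
```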