ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Tracking: LoRA #964

Closed: jon-chuang closed this issue 7 months ago

jon-chuang commented 1 year ago

Here are some outstanding issues for LoRA:

captainzero93 commented 1 year ago

Really desperate to start using LoRA; however, I use a GPTQ-4bit-32g GGML model. Will this be a problem?

jon-chuang commented 1 year ago

So far, we've seen quality issues when applying LoRA on a 4-bit quantized base model. That said, it has produced reasonable output for me some of the time. It is still under investigation.
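For context, the usual LoRA merge (the standard formulation, not llama.cpp's exact code path) is:

$$W' = W + \frac{\alpha}{r}\,BA$$

where $W \in \mathbb{R}^{d \times k}$ is the frozen base weight, $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$. If $W$ is only available as a 4-bit quantization $Q(W)$, the merged tensor has to be re-quantized, i.e. $Q\!\left(Q(W) + \frac{\alpha}{r}BA\right)$, so the small delta is rounded together with an already-lossy base. That compounding of rounding error is one plausible source of the quality issues above, and it is why applying the adapter against an f16 base (the `--lora-base` option) tends to preserve quality better.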

bmanturner commented 1 year ago

Would this be a good place to request support for multiple LoRA adapters sharing the same base model? See here for inspiration: https://github.com/lm-sys/FastChat/pull/1905
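A minimal sketch of what shared-base serving could look like, assuming the adapter is applied at runtime rather than merged into the weights (the matrix layout and the `lora_forward` helper here are hypothetical illustrations, not llama.cpp API):

```cpp
// Hypothetical sketch: apply the LoRA delta at inference time instead of
// merging it into W. Because W is never modified, any number of adapters
// can share one base model.
#include <cstdio>
#include <vector>

using Mat = std::vector<std::vector<float>>; // row-major dense matrix

static std::vector<float> matvec(const Mat &M, const std::vector<float> &x) {
    std::vector<float> y(M.size(), 0.0f);
    for (size_t i = 0; i < M.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += M[i][j] * x[j];
    return y;
}

// y = W x + (alpha/r) * B (A x); A is r x k, B is d x r, r << min(d, k)
static std::vector<float> lora_forward(const Mat &W, const Mat &A, const Mat &B,
                                       float alpha, const std::vector<float> &x) {
    const float scale = alpha / (float) A.size(); // A.size() == r
    std::vector<float> y  = matvec(W, x);  // frozen base projection
    std::vector<float> ax = matvec(A, x);  // down-project to rank r
    std::vector<float> bx = matvec(B, ax); // up-project back to d
    for (size_t i = 0; i < y.size(); ++i)
        y[i] += scale * bx[i];
    return y;
}

int main() {
    // toy shapes: d = k = 2, r = 1
    Mat W = {{1, 0}, {0, 1}};
    Mat A = {{1, 1}};         // 1 x 2
    Mat B = {{0.5f}, {0.5f}}; // 2 x 1
    std::vector<float> x = {1, 2};
    auto y = lora_forward(W, A, B, /*alpha=*/1.0f, x);
    printf("%f %f\n", y[0], y[1]); // expected: 2.5 3.5
}
```

The per-adapter state is just the small `A`/`B` pairs, so the per-request cost of swapping adapters stays low while the large base weights are loaded once.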

Green-Sky commented 1 year ago

> Improve LoRA loading time with mmap on the base model

This was done in https://github.com/ggerganov/llama.cpp/pull/2095.

Also, I'm not sure this issue is the right one for tracking that.
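For reference, a minimal POSIX sketch of the copy-on-write idea behind that PR, as I understand it: map the base model `MAP_PRIVATE`, so only the pages actually rewritten by the LoRA delta get copied, while untouched tensors stay shared with the page cache. (Illustrative only; llama.cpp's real mmap wrapper is its own code.)

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.bin\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    // MAP_PRIVATE: writes trigger per-page copies instead of touching the
    // file (legal even on an O_RDONLY fd), so applying a LoRA delta in
    // place only duplicates the modified tensors.
    void *base = mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd); // the mapping stays valid after close

    // ... locate a tensor inside `base` and add the (alpha/r) * B A delta ...

    munmap(base, st.st_size);
    return 0;
}
```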

github-actions[bot] commented 7 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.