flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

[LoRA] Roadmap of LoRA operators #199

Open · yzh119 opened this issue 4 months ago

yzh119 commented 4 months ago
  1. [ ] Reducing the latency of LoRA operators (per LoRAX feedback, LoRA operators currently introduce ~20% overhead); a reference sketch of the computation these operators cover follows this list.
  2. [ ] Fixing numerical issues in LoRA operators at large batch sizes.
  3. [ ] Using fp8 tensor cores for LoRA operators.
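
For context, a minimal PyTorch sketch of the grouped LoRA computation such operators implement: per-request adapter selection followed by the shrink/expand low-rank matmuls. This is an illustrative reference only, not FlashInfer's API; the function name `lora_ref` and the tensor layout are assumptions made for the example.

```python
# Illustrative reference only (not FlashInfer's API): the grouped LoRA delta
# that batched LoRA kernels compute. Request i uses adapter idx[i], and its
# output delta is scale * x_i @ A[idx[i]] @ B[idx[i]].
import torch


def lora_ref(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
             idx: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """x: [batch, d_in], A: [n_adapters, d_in, r], B: [n_adapters, r, d_out],
    idx: [batch] adapter index per request; returns the LoRA delta [batch, d_out]."""
    xa = torch.einsum("bi,bir->br", x, A[idx])               # shrink: d_in -> r
    return scale * torch.einsum("br,bro->bo", xa, B[idx])    # expand: r -> d_out


# Example usage with made-up sizes: batch of 8 requests, 4 adapters, rank 16.
x = torch.randn(8, 4096)
A = torch.randn(4, 4096, 16)
B = torch.randn(4, 16, 4096)
idx = torch.randint(0, 4, (8,))
delta = lora_ref(x, A, B, idx, scale=0.5)  # shape [8, 4096]
```

On item 2, one common mitigation for large-batch reductions is to accumulate the matmuls in fp32 even when the inputs are fp16/bf16; this is a general remark, not a statement about FlashInfer's current kernels.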
tgaddair commented 4 months ago

Thanks for filing this issue @yzh119! Happy to help out in any way I can.