flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

[LoRA] Roadmap of LoRA operators #199

Open · yzh119 opened this issue 4 months ago

yzh119 commented 4 months ago
  1. [ ] Reducing the latency of LoRA operators (per LoRAX feedback, LoRA operators currently introduce ~20% overhead); a reference sketch of the computation these operators cover follows this list.
  2. [ ] Fixing numerical issues in LoRA operators at large batch sizes.
  3. [ ] Using fp8 tensor cores for LoRA operators.
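
For context, a minimal PyTorch sketch of the grouped LoRA computation such operators implement: per-request adapter selection followed by the shrink/expand low-rank matmuls. This is an illustrative reference only, not FlashInfer's API; the function name `lora_ref` and the tensor layout are assumptions made for the example.

```python
# Illustrative reference only (not FlashInfer's API): the grouped LoRA delta
# that batched LoRA kernels compute. Request i uses adapter idx[i], and its
# output delta is scale * x_i @ A[idx[i]] @ B[idx[i]].
import torch


def lora_ref(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
             idx: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """x: [batch, d_in], A: [n_adapters, d_in, r], B: [n_adapters, r, d_out],
    idx: [batch] adapter index per request; returns the LoRA delta [batch, d_out]."""
    xa = torch.einsum("bi,bir->br", x, A[idx])               # shrink: d_in -> r
    return scale * torch.einsum("br,bro->bo", xa, B[idx])    # expand: r -> d_out


# Example usage with made-up sizes: batch of 8 requests, 4 adapters, rank 16.
x = torch.randn(8, 4096)
A = torch.randn(4, 4096, 16)
B = torch.randn(4, 16, 4096)
idx = torch.randint(0, 4, (8,))
delta = lora_ref(x, A, B, idx, scale=0.5)  # shape [8, 4096]
```

On item 2, one common mitigation for large-batch reductions is to accumulate the matmuls in fp32 even when the inputs are fp16/bf16; this is a general remark, not a statement about FlashInfer's current kernels.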
tgaddair commented 4 months ago

Thanks for filing this issue @yzh119! Happy to help out in any way I can.