EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

Any plan about KV compression algorithm like SnapKV and PyramidKV? #598

Open chenwanqq opened 1 month ago

chenwanqq commented 1 month ago

Hi, I'm wondering if you have any plans regarding KV cache compression methods like SnapKV and PyramidKV. These methods reduce the memory used by the KV cache, which makes inference more practical on low-memory machines. I may be able to contribute to this.
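To make the idea concrete, here is a minimal sketch of a SnapKV-style selection step for a single attention head. It assumes the attention weights of the last few prompt tokens (the "observation window") are available as rows over all positions. The function name `select_kept_positions` and its signature are hypothetical and not part of mistral.rs, and the pooling step and per-head/per-layer budgets described for these methods are omitted for brevity.

```rust
/// Hypothetical helper (not a mistral.rs API): pick which KV cache positions to keep.
///
/// `attn` holds one row per observation-window query; each row contains that
/// query's attention weights over all `seq_len` positions (assumed equal length).
/// `budget` is the number of prefix positions to retain.
fn select_kept_positions(attn: &[Vec<f32>], budget: usize) -> Vec<usize> {
    let window = attn.len();
    if window == 0 {
        return Vec::new();
    }
    let seq_len = attn[0].len();
    let prefix_len = seq_len.saturating_sub(window);

    // Sum the attention each prefix position receives from the window queries.
    let mut votes = vec![0.0f32; prefix_len];
    for row in attn {
        for (v, w) in votes.iter_mut().zip(row.iter()) {
            *v += *w;
        }
    }

    // Rank prefix positions by accumulated attention, highest first.
    let mut order: Vec<usize> = (0..prefix_len).collect();
    order.sort_by(|&a, &b| votes[b].total_cmp(&votes[a]));

    // Keep the top `budget` prefix positions plus the whole observation window.
    let mut kept: Vec<usize> = order.into_iter().take(budget).collect();
    kept.extend(prefix_len..seq_len);
    kept.sort_unstable();
    kept
}
```

The returned indices would then be used to gather the corresponding key and value rows, so the cache holds `budget + window` entries instead of `seq_len`.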

EricLBuehler commented 1 month ago

Hi @chenwanqq, I don't have any plans for this at the moment as I'm focusing on adding new quants (#467, #546) for a bit.

Those techniques sound super interesting, and I'd be happy to merge any contributions for this.

EricLBuehler commented 4 weeks ago

@chenwanqq I just merged GPTQ support!

chenwanqq commented 3 weeks ago

> @chenwanqq I just merged GPTQ support!

Sorry, I have had a lot of job interviews lately, so progress has been slow. I will pick this up very soon!

EricLBuehler commented 3 weeks ago

@chenwanqq no worries!