FMInference / DejaVu


A question about attention block sparsity #23

Open imh966 opened 5 months ago

imh966 commented 5 months ago

Hi, thanks for your excellent work! I'm quite interested in how your approach handles the missing KV cache under attention block sparsity. However, when I read the code for the accuracy benchmark, I couldn't find anything that handles the missing KV cache: the code appears only to modify the output of self-attention, without touching the KV cache itself. I'm wondering whether some code hasn't been released yet, or whether I'm simply looking at the wrong code for the accuracy benchmark.
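To make the distinction concrete, here is a minimal toy sketch (not DejaVu's actual code; all names and shapes are made up for illustration) of the two behaviors in question: masking only the attention *output* of inactive heads while the KV cache still stores every head, versus skipping inactive heads so their KV entries are never written at all.

```python
import numpy as np

def attn_head(q, k, v):
    # Single-head scaled dot-product attention over cached keys/values.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
n_heads, seq, d = 4, 3, 8
q = rng.standard_normal((n_heads, 1, d))
k = rng.standard_normal((n_heads, seq, d))
v = rng.standard_normal((n_heads, seq, d))

# Hypothetical per-head sparsity mask (heads 0 and 2 active).
active = np.array([True, False, True, False])

# Behavior A: compute/mask only the attention output.
# The KV cache still holds entries for every head.
out_masked = np.stack([attn_head(q[h], k[h], v[h]) for h in range(n_heads)])
out_masked[~active] = 0.0          # skipped heads contribute nothing downstream
cache_a = (k, v)                   # cache remains complete for all heads

# Behavior B: skip inactive heads entirely.
# Their KV entries are never written, so the cache is "missing"
# those heads on later decoding steps.
cache_b = {h: (k[h], v[h]) for h in range(n_heads) if active[h]}
out_skipped = np.zeros_like(out_masked)
for h, (kh, vh) in cache_b.items():
    out_skipped[h] = attn_head(q[h], kh, vh)

# The two behaviors produce identical outputs at this step;
# they differ only in what the KV cache contains afterwards.
```

The outputs match for the current step, but behavior B leaves later steps with no K/V for the skipped heads, which is exactly the "missing KV cache" situation the question asks about.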