FMInference / DejaVu


A question about attention block sparsity #23

Open imh966 opened 5 months ago

imh966 commented 5 months ago

Hi, thanks for your excellent work! I'm quite interested in how your approach handles the missing KV cache under attention block sparsity. However, when I read the code for the accuracy benchmark, I couldn't find anything that handles the missing KV cache: the code appears only to modify the output of self-attention, without touching the KV cache itself. I'm wondering whether some code hasn't been released yet, or whether I'm simply looking at the wrong code for the accuracy benchmark.
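To make the distinction concrete, here is a minimal toy sketch (not DejaVu's actual code; all names and shapes are made up for illustration) of the two behaviors in question: masking only the attention *output* of inactive heads while the KV cache still stores every head, versus skipping inactive heads so their KV entries are never written at all.

```python
import numpy as np

def attn_head(q, k, v):
    # Single-head scaled dot-product attention over cached keys/values.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
n_heads, seq, d = 4, 3, 8
q = rng.standard_normal((n_heads, 1, d))
k = rng.standard_normal((n_heads, seq, d))
v = rng.standard_normal((n_heads, seq, d))

# Hypothetical per-head sparsity mask (heads 0 and 2 active).
active = np.array([True, False, True, False])

# Behavior A: compute/mask only the attention output.
# The KV cache still holds entries for every head.
out_masked = np.stack([attn_head(q[h], k[h], v[h]) for h in range(n_heads)])
out_masked[~active] = 0.0          # skipped heads contribute nothing downstream
cache_a = (k, v)                   # cache remains complete for all heads

# Behavior B: skip inactive heads entirely.
# Their KV entries are never written, so the cache is "missing"
# those heads on later decoding steps.
cache_b = {h: (k[h], v[h]) for h in range(n_heads) if active[h]}
out_skipped = np.zeros_like(out_masked)
for h, (kh, vh) in cache_b.items():
    out_skipped[h] = attn_head(q[h], kh, vh)

# The two behaviors produce identical outputs at this step;
# they differ only in what the KV cache contains afterwards.
```

The outputs match for the current step, but behavior B leaves later steps with no K/V for the skipped heads, which is exactly the "missing KV cache" situation the question asks about.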