FasterDecoding / SnapKV
200 stars · 8 forks

Issues
#25 Why not use the last token for KV cache compression? (Arist12, opened 14 hours ago, 0 comments)
#24 Question: is key_state_compressed used for inference? (jq-wei, opened 1 week ago, 1 comment)
#23 What happens when the total KV length exceeds the max-capacity length during response generation? (PengWenChen, opened 1 month ago, 1 comment)
#22 Grouped Query Attention (SimJeg, opened 1 month ago, 4 comments)
#21 Question on H2O experiment reproduction (CUHKSZzxy, opened 3 months ago, 0 comments)
#20 Closed issue (JulietLJY, closed 5 months ago, 0 comments)
#19 Could you provide the code for visualizing the Hit Rate? (Dominic789654, opened 5 months ago, 0 comments)
#18 Can SnapKV compress the KV cache when different user questions are posed about the same context? (namespace-Pt, opened 5 months ago, 1 comment)
#17 Observation window size and consistency between layers (Cooperx521, closed 4 months ago, 1 comment)
#16 Question on GQA implementation (cyLi-Tiger, opened 5 months ago, 1 comment)
#15 Can I use SnapKV without flash-attention? (pengshuang, closed 5 months ago, 1 comment)
#14 What prompt was used in the Needle in a Haystack test? (66RING, closed 4 months ago, 1 comment)
#13 expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min) RuntimeError: The size of tensor a (3509) must match the size of tensor b (7017) at non-singleton dimension 3 (seeyourcell, closed 6 months ago, 4 comments)
#12 Cannot run LongBench! (HarryWu99, opened 6 months ago, 3 comments)
#11 Why does only the decode stage do compression? (CSEEduanyu, opened 6 months ago, 0 comments)
#10 Only the KV is compressed. Are the sizes of Q and K inconsistent when attention is calculated? (CSEEduanyu, closed 6 months ago, 1 comment)
#9 It seems that SnapKV needs to do "prefill" at least once before the prompt can be compressed. (66RING, closed 7 months ago, 1 comment)
#8 Observation (leeyeehoo, closed 7 months ago, 0 comments)
#7 yl: remove unnecessary (leeyeehoo, closed 7 months ago, 0 comments)
#6 yl: fix a bug (leeyeehoo, closed 7 months ago, 0 comments)
#5 yl: fix typo (leeyeehoo, closed 7 months ago, 0 comments)
#4 Grouped query attention implementation (guozhiyu, closed 7 months ago, 1 comment)
#3 Maybe a bug in the `update_kv` function (HarryWu99, opened 7 months ago, 1 comment)
#2 Could the effect of Clustering via Pooling be greater? (HarryWu99, opened 7 months ago, 1 comment)
#1 Questions on paper and code [prompting for Mistral, positional index, minor errors & questions in paper] (MarsJacobs, opened 7 months ago, 8 comments)