FasterDecoding/SnapKV
139 stars, 4 forks
issues
Closed issue
#20
JulietLJY
closed
1 day ago
0
Could you provide the code for visualizing the Hit Rate?
#19
Dominic789654
opened
4 days ago
0
Can snapkv compress kv in case different user questions are posed towards the same context?
#18
namespace-Pt
opened
1 week ago
0
observation window size and consistency between layers
#17
Cooperx521
opened
2 weeks ago
0
Question on GQA implementation
#16
cyLi-Tiger
opened
2 weeks ago
1
Can I use SnapKV without flash-attention?
#15
pengshuang
closed
2 weeks ago
1
What prompt was used in Needle in a Haystack test?
#14
66RING
opened
3 weeks ago
0
expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min) RuntimeError: The size of tensor a (3509) must match the size of tensor b (7017) at non-singleton dimension 3
#13
seeyourcell
closed
1 month ago
3
Can't run LongBench!
#12
HarryWu99
opened
1 month ago
0
why only decode do compress?
#11
CSEEduanyu
opened
1 month ago
0
Only kv is compressed. Is the size of Q and K inconsistent when attention is calculated?
#10
CSEEduanyu
closed
1 month ago
1
It seems that snapkv need to be able to do "prefill" at least once before the prompt can be compressed.
#9
66RING
closed
2 months ago
1
Observation
#8
leeyeehoo
closed
2 months ago
0
yl: remove unnecessary
#7
leeyeehoo
closed
2 months ago
0
yl: fix a bug
#6
leeyeehoo
closed
2 months ago
0
yl: fix typo
#5
leeyeehoo
closed
2 months ago
0
Grouped query attention implementation
#4
guozhiyu
closed
2 months ago
1
maybe a bug in `update_kv` function
#3
HarryWu99
opened
2 months ago
1
The effect of Clustering via Pooling may be greater?
#2
HarryWu99
opened
2 months ago
1
Questions on paper and code [prompting for mistral, positional index, minor errors & questions in paper]
#1
MarsJacobs
opened
2 months ago
8