FasterDecoding / SnapKV
200 stars · 8 forks

Issues
#25 Why not use the last token for KV cache compression? (Arist12, opened 14 hours ago, 0 comments)
#24 Question: is key_state_compressed used for inference? (jq-wei, opened 1 week ago, 1 comment)
#23 What happens when the total KV length exceeds the max-capacity length during response generation? (PengWenChen, opened 1 month ago, 1 comment)
#22 Grouped Query Attention (SimJeg, opened 1 month ago, 4 comments)
#21 Question on H2O experiment reproduction (CUHKSZzxy, opened 3 months ago, 0 comments)
#20 Closed issue (JulietLJY, closed 5 months ago, 0 comments)
#19 Could you provide the code for visualizing the Hit Rate? (Dominic789654, opened 5 months ago, 0 comments)
#18 Can SnapKV compress the KV cache when different user questions are posed about the same context? (namespace-Pt, opened 5 months ago, 1 comment)
#17 Observation window size and consistency between layers (Cooperx521, closed 4 months ago, 1 comment)
#16 Question on GQA implementation (cyLi-Tiger, opened 5 months ago, 1 comment)
#15 Can I use SnapKV without flash-attention? (pengshuang, closed 5 months ago, 1 comment)
#14 What prompt was used in the Needle in a Haystack test? (66RING, closed 4 months ago, 1 comment)
#13 expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min) RuntimeError: The size of tensor a (3509) must match the size of tensor b (7017) at non-singleton dimension 3 (seeyourcell, closed 6 months ago, 4 comments)
#12 Cannot run LongBench! (HarryWu99, opened 6 months ago, 3 comments)
#11 Why does only the decode stage do compression? (CSEEduanyu, opened 6 months ago, 0 comments)
#10 Only the KV is compressed. Are the sizes of Q and K inconsistent when attention is calculated? (CSEEduanyu, closed 6 months ago, 1 comment)
#9 It seems that SnapKV needs to do "prefill" at least once before the prompt can be compressed. (66RING, closed 7 months ago, 1 comment)
#8 Observation (leeyeehoo, closed 7 months ago, 0 comments)
#7 yl: remove unnecessary (leeyeehoo, closed 7 months ago, 0 comments)
#6 yl: fix a bug (leeyeehoo, closed 7 months ago, 0 comments)
#5 yl: fix typo (leeyeehoo, closed 7 months ago, 0 comments)
#4 Grouped query attention implementation (guozhiyu, closed 7 months ago, 1 comment)
#3 Maybe a bug in the `update_kv` function (HarryWu99, opened 7 months ago, 1 comment)
#2 Could the effect of Clustering via Pooling be greater? (HarryWu99, opened 7 months ago, 1 comment)
#1 Questions on paper and code [prompting for Mistral, positional index, minor errors & questions in paper] (MarsJacobs, opened 7 months ago, 8 comments)