Question on GQA implementation

FasterDecoding / SnapKV


Question on GQA implementation #16

Open · cyLi-Tiger opened this issue 3 weeks ago

cyLi-Tiger commented 3 weeks ago

In GQA, only one copy of the KV cache is stored per group, but SnapKV stores the KV cache with num_key_value_heads * num_key_value_groups heads. Granted, during KV cache eviction the selected tokens may differ between heads in the same group, but keeping a separate copy per head increases memory cost by a factor of num_key_value_groups. Is there a way to solve this?
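To make the overhead concrete, here is a minimal sketch of the memory arithmetic. It is not code from SnapKV. It assumes a plain dense cache of keys and values with no quantization, and `kv_cache_bytes` and `compare_layouts` are hypothetical helpers written only for illustration. The model shape in the usage line is also hypothetical.

```python
def kv_cache_bytes(num_layers: int, num_cache_heads: int, seq_len: int,
                   head_dim: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes for keys + values across all layers (the leading 2 counts K and V)."""
    return 2 * num_layers * batch * num_cache_heads * seq_len * head_dim * bytes_per_elem


def compare_layouts(num_layers: int, num_attention_heads: int, num_key_value_heads: int,
                    seq_len: int, head_dim: int, bytes_per_elem: int = 2):
    """Compare a per-group (standard GQA) cache with a per-query-head cache."""
    assert num_attention_heads % num_key_value_heads == 0, "heads must divide evenly into groups"
    num_key_value_groups = num_attention_heads // num_key_value_heads
    # Standard GQA: one K/V copy per KV head, shared by every query head in its group.
    shared = kv_cache_bytes(num_layers, num_key_value_heads, seq_len, head_dim, bytes_per_elem)
    # Expanded layout: num_key_value_heads * num_key_value_groups heads, one per query head.
    expanded = kv_cache_bytes(num_layers, num_key_value_heads * num_key_value_groups,
                              seq_len, head_dim, bytes_per_elem)
    return shared, expanded, expanded / shared


if __name__ == "__main__":
    # Hypothetical config: 32 layers, 32 query heads, 8 KV heads, head_dim 128,
    # 4096 cached tokens, fp16 (2 bytes per element).
    shared, expanded, ratio = compare_layouts(32, 32, 8, 4096, 128)
    print(f"{shared / 2**20:.0f} MiB vs {expanded / 2**20:.0f} MiB ({ratio:.1f}x)")
    # prints: 512 MiB vs 2048 MiB (4.0x)
```

The ratio is always num_key_value_groups, regardless of how many tokens eviction keeps, because both layouts keep the same tokens per head and differ only in head count.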

pengshuang commented 3 weeks ago

Same question.