In GQA, only one copy of the KV cache is stored per group, but SnapKV stores a KV cache with `num_key_value_heads * num_key_value_groups` heads. Admittedly, during KV cache eviction the positions selected may differ between heads in the same group, but storing them separately increases the memory cost by a factor of `num_key_value_groups`. Is there a way to solve this?
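To make the overhead concrete, here is a rough back-of-the-envelope sketch of the two cache sizes. The head counts, sequence length, head dimension and fp16 element size are illustrative assumptions rather than values from any particular model, and `kv_cache_bytes` is a hypothetical helper, not part of SnapKV.

```python
def kv_cache_bytes(num_heads, seq_len, head_dim, num_layers=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes (keys + values) for the given head count."""
    # Factor of 2 accounts for storing both keys and values.
    return 2 * num_layers * num_heads * seq_len * head_dim * bytes_per_elem


# Illustrative (assumed) configuration.
num_key_value_heads = 8
num_key_value_groups = 4  # query heads per KV head
seq_len = 1024            # number of cached positions kept per head
head_dim = 128

# Plain GQA: one KV copy per group.
gqa_bytes = kv_cache_bytes(num_key_value_heads, seq_len, head_dim)

# Per-query-head storage: every head in a group keeps its own KV copy.
per_head_bytes = kv_cache_bytes(
    num_key_value_heads * num_key_value_groups, seq_len, head_dim
)

ratio = per_head_bytes // gqa_bytes
print(f"GQA cache:      {gqa_bytes / 2**20:.1f} MiB")
print(f"Per-head cache: {per_head_bytes / 2**20:.1f} MiB")
print(f"Overhead factor: {ratio}")
```

For the same number of kept positions, the per-head layout is larger by exactly `num_key_value_groups`, which is the cost I am asking about.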