FasterDecoding / SnapKV


Might the effect of Clustering via Pooling be greater? #2

Open HarryWu99 opened 2 months ago

HarryWu99 commented 2 months ago

Just a guess.

What would happen if H2O also used Clustering via Pooling in the comparison? It seems that Clustering via Pooling could improve the effectiveness of token-dropping methods like this in general.

leeyeehoo commented 2 months ago

As we stated in the paper, the generated answers are highly query-dependent, so evicting KV entries during generation can lose information. As a high-level example, suppose a user gives the model a book and the first question is about the first chapter, so the model evicts the other parts. If the user then asks about the last chapter, the model will have very limited knowledge to answer with. Pooling is a very interesting observation: on easier tasks, such as the original haystack task, the model performs perfectly even without pooling. But when you switch to more challenging tasks, the method with pooling is significantly better than the one without.
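For readers following the thread, here is a rough sketch of what "clustering via pooling" could look like when applied to per-token importance scores before choosing which KV entries to keep. It is an illustration under assumptions, not the repository's implementation: the function names `max_pool1d_same` and `select_kv_indices`, the use of max pooling with stride 1, and the toy attention values are all hypothetical. The idea is that pooling spreads a high score to its neighbours, so a selected token tends to bring its surrounding context with it instead of being kept in isolation.

```python
from typing import List, Sequence


def max_pool1d_same(scores: Sequence[float], kernel_size: int) -> List[float]:
    """Stride-1 max pooling that keeps the input length (window clipped at the edges)."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    n = len(scores)
    return [max(scores[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


def select_kv_indices(
    attn: Sequence[Sequence[float]], budget: int, kernel_size: int = 1
) -> List[int]:
    """Pick `budget` prefix positions to keep.

    attn[q][j] is the (hypothetical) attention weight from query q to prefix
    position j. kernel_size=1 disables pooling.
    """
    prefix_len = len(attn[0])
    # Sum the weights each prefix position receives across all queries.
    votes = [sum(row[j] for row in attn) for j in range(prefix_len)]
    # Smooth the votes so neighbours of a strong position are also favoured.
    votes = max_pool1d_same(votes, kernel_size)
    ranked = sorted(range(prefix_len), key=lambda j: votes[j], reverse=True)
    return sorted(ranked[:budget])


# Toy example: one query row, ten prefix positions, one dominant position (index 2).
attn = [[0.1, 0.2, 5.0, 0.3, 0.1, 0.1, 0.4, 0.1, 0.6, 0.1]]
print(select_kv_indices(attn, budget=3))                 # [2, 6, 8]
print(select_kv_indices(attn, budget=3, kernel_size=3))  # [1, 2, 3]
```

Without pooling the three kept positions are scattered; with a kernel of 3 the selection becomes the contiguous span around the strongest position. The same pooling step could in principle be applied to the accumulated scores of any score-based eviction method, which is the experiment the original question asks about; whether it helps there is not something this sketch demonstrates.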