observation window size and consistency between layers

Hello :)

Thank you for the brilliant work and for sharing your code. After reading the paper and reviewing the related code, I have the following questions:

1. Have you run experiments that vary the observation window size (e.g., from 1 to 64)? How does the window size affect the hit rate and overall model performance?
2. In the "layer-wise average hit rate" experiment, the hit rate of the middle layers is significantly lower than that of the shallow and deep layers. Do you know why?
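
To make the first question concrete, here is a rough sketch of the kind of sweep I have in mind. It is not taken from the SnapKV code: the function names `top_k_positions` and `hit_rate` are hypothetical, the attention weights are random placeholders, and the hit-rate definition (overlap between the top-k prefix positions voted for by the observation window and the top-k positions a generation step attends to) is only my reading of the paper, so please correct me if it differs from what you measured.

```python
import random

def top_k_positions(scores, k):
    """Return the indices of the k largest scores as a set."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def hit_rate(attn_rows, window_size, gen_row, k):
    """Fraction of the generation step's top-k prefix positions that the
    observation window also selects (assumed definition, see the note above).

    attn_rows: attention weights of the prompt queries, one row per query,
               each row covering the same prefix positions
    gen_row:   attention weights of one generation step over those positions
    """
    window = attn_rows[-window_size:]  # the last `window_size` prompt queries
    # Sum the window's attention per prefix position (a simple "vote").
    votes = [sum(row[j] for row in window) for j in range(len(gen_row))]
    picked = top_k_positions(votes, k)
    needed = top_k_positions(gen_row, k)
    return len(picked & needed) / k

# Placeholder data: 128 prompt queries over 256 prefix positions.
random.seed(0)
prefix_len, num_queries, k = 256, 128, 32

def random_attention_row(n):
    raw = [random.random() for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]  # normalize so the row sums to 1

attn_rows = [random_attention_row(prefix_len) for _ in range(num_queries)]
gen_row = random_attention_row(prefix_len)

for window_size in (1, 2, 4, 8, 16, 32, 64):
    rate = hit_rate(attn_rows, window_size, gen_row, k)
    print(f"window={window_size:2d}  hit_rate={rate:.3f}")
```

With real attention maps in place of the random rows, running the same loop per layer would also show whether the dip in the middle layers depends on the window size.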

Thank you for your excellent paper!

FasterDecoding / SnapKV