Hi, @Vahe1994.
It is so kind of you to release such great work!
I applied SpQR to Baichuan-7B (whose network structure is the same as LLaMA-7B, except that the vocabulary size of the embedding & lm_head layers is twice that of LLaMA-7B) and found that outlier_threshold had to be tuned quite high (i.e., 3.0) to reach the outlier fraction (nearly 1%) recommended in your paper. However, after tuning it, the average score on the C-Eval val set dropped drastically, from 38.5 to 23.0 (using the official evaluation script, zero-shot).
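For reference, this is roughly how I am estimating the outlier fraction for a given threshold. It is only a simplified magnitude-based sketch, not SpQR's actual sensitivity-based criterion; the function name `outlier_fraction` and the group size of 16 are my own choices for illustration:

```python
import numpy as np

def outlier_fraction(weight: np.ndarray, threshold: float, group_size: int = 16) -> float:
    # Flag a weight as an outlier when its magnitude exceeds
    # threshold * std of its quantization group. This is a rough
    # magnitude-based proxy, not SpQR's real sensitivity criterion.
    w = weight.reshape(-1, group_size)
    std = w.std(axis=1, keepdims=True)
    mask = np.abs(w) > threshold * std
    return float(mask.mean())

# On a layer-sized Gaussian matrix, a threshold of 3.0 already flags
# well under 1% of the weights:
rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024))
print(f"outlier fraction at threshold 3.0: {outlier_fraction(w, 3.0):.4%}")
```

With Baichuan-7B's actual weights the fraction behaves differently from this Gaussian toy case, which is why I had to raise the threshold so much.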
I would really appreciate it if you could help us understand this issue.
Thank you so much!