Closed huangyuxiang03 closed 2 weeks ago
@huangyuxiang03 here are the full results.

| context length | niah_single_1 | niah_single_2 | niah_single_3 | niah_multikey_1 | niah_multikey_2 | niah_multikey_3 | niah_multivalue | niah_multiquery | vt | fwe | cwe | qa_1 | qa_2 | avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4k | 100.0 | 100.0 | 100.0 | 96.6 | 99.6 | 99.6 | 92.2 | 98.5 | 97.1 | 85.1 | 91.9 | 83.8 | 54.2 | 92.2 |
| 8k | 100.0 | 100.0 | 100.0 | 96.6 | 100.0 | 99.4 | 95.2 | 98.5 | 99.3 | 83.4 | 88.6 | 75.8 | 53.0 | 91.5 |
| 16k | 100.0 | 100.0 | 100.0 | 95.4 | 100.0 | 98.6 | 92.8 | 98.0 | 99.3 | 86.7 | 78.7 | 78.8 | 50.2 | 90.7 |
| 32k | 100.0 | 100.0 | 100.0 | 94.6 | 99.6 | 97.4 | 84.4 | 96.8 | 99.0 | 86.5 | 52.3 | 75.6 | 50.8 | 87.5 |
| 64k | 100.0 | 100.0 | 100.0 | 95.0 | 98.6 | 95.2 | 88.8 | 93.4 | 87.0 | 66.4 | 3.3 | 74.2 | 45.4 | 80.6 |
| 128k | 98.6 | 97.8 | 97.8 | 86.4 | 65.2 | 42.0 | 66.4 | 69.1 | 55.0 | 84.1 | 1.8 | 66.6 | 36.2 | 66.7 |
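
For anyone reusing these numbers: the avg column appears to be the unweighted mean of the 13 task scores in each row. That is an inference from the values, not something stated in the thread. A minimal Python sketch to check it, with the scores copied from the table above (the names `scores`, `reported_avg`, and `averages` are just illustrative):

```python
# Per-task scores copied from the table, in column order:
# niah_single_1..3, niah_multikey_1..3, niah_multivalue, niah_multiquery,
# vt, fwe, cwe, qa_1, qa_2
scores = {
    "4k":   [100.0, 100.0, 100.0, 96.6, 99.6, 99.6, 92.2, 98.5, 97.1, 85.1, 91.9, 83.8, 54.2],
    "8k":   [100.0, 100.0, 100.0, 96.6, 100.0, 99.4, 95.2, 98.5, 99.3, 83.4, 88.6, 75.8, 53.0],
    "16k":  [100.0, 100.0, 100.0, 95.4, 100.0, 98.6, 92.8, 98.0, 99.3, 86.7, 78.7, 78.8, 50.2],
    "32k":  [100.0, 100.0, 100.0, 94.6, 99.6, 97.4, 84.4, 96.8, 99.0, 86.5, 52.3, 75.6, 50.8],
    "64k":  [100.0, 100.0, 100.0, 95.0, 98.6, 95.2, 88.8, 93.4, 87.0, 66.4, 3.3, 74.2, 45.4],
    "128k": [98.6, 97.8, 97.8, 86.4, 65.2, 42.0, 66.4, 69.1, 55.0, 84.1, 1.8, 66.6, 36.2],
}

# The avg column as reported in the table.
reported_avg = {"4k": 92.2, "8k": 91.5, "16k": 90.7, "32k": 87.5, "64k": 80.6, "128k": 66.7}

# Assumption: avg is the plain (unweighted) mean over the 13 tasks.
averages = {length: round(sum(vals) / len(vals), 1) for length, vals in scores.items()}

for length, avg in averages.items():
    print(f"{length:>4}: computed {avg:.1f}, reported {reported_avg[length]:.1f}")
```

Under that assumption the computed means agree with the reported avg column at every context length, so the drop at 128k is driven mostly by niah_multikey_2, niah_multikey_3, vt, and cwe.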
Thank you for providing these results. I have no further questions, so I'm closing this issue.
Hi, thanks for your great work on evaluating the long-context ability of LLMs! I also enjoyed your poster presentation at COLM 2024. Could you provide the raw scores for phi3-mini-128k? It seems this model is not included in the original paper. Thanks.