NVIDIA / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

Detailed scores of Phi-3-mini-128k #71

Closed · huangyuxiang03 closed this issue 2 weeks ago

huangyuxiang03 commented 2 weeks ago

Hi, thanks for your great work on evaluating the long-context ability of LLMs! I also enjoyed your poster presentation at COLM 2024. Could you provide the raw scores of Phi-3-mini-128k? It seems this model is not included in the original paper. Thanks.

hsiehjackson commented 2 weeks ago
@huangyuxiang03 here are the full results.

| Context | niah_single_1 | niah_single_2 | niah_single_3 | niah_multikey_1 | niah_multikey_2 | niah_multikey_3 | niah_multivalue | niah_multiquery | vt | fwe | cwe | qa_1 | qa_2 | avg |
|---------|---------------|---------------|---------------|-----------------|-----------------|-----------------|-----------------|-----------------|------|------|------|------|------|------|
| 4k   | 100.0 | 100.0 | 100.0 | 96.6 | 99.6 | 99.6 | 92.2 | 98.5 | 97.1 | 85.1 | 91.9 | 83.8 | 54.2 | 92.2 |
| 8k   | 100.0 | 100.0 | 100.0 | 96.6 | 100.0 | 99.4 | 95.2 | 98.5 | 99.3 | 83.4 | 88.6 | 75.8 | 53.0 | 91.5 |
| 16k  | 100.0 | 100.0 | 100.0 | 95.4 | 100.0 | 98.6 | 92.8 | 98.0 | 99.3 | 86.7 | 78.7 | 78.8 | 50.2 | 90.7 |
| 32k  | 100.0 | 100.0 | 100.0 | 94.6 | 99.6 | 97.4 | 84.4 | 96.8 | 99.0 | 86.5 | 52.3 | 75.6 | 50.8 | 87.5 |
| 64k  | 100.0 | 100.0 | 100.0 | 95.0 | 98.6 | 95.2 | 88.8 | 93.4 | 87.0 | 66.4 | 3.3  | 74.2 | 45.4 | 80.6 |
| 128k | 98.6  | 97.8  | 97.8  | 86.4 | 65.2 | 42.0 | 66.4 | 69.1 | 55.0 | 84.1 | 1.8  | 66.6 | 36.2 | 66.7 |
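If it helps when comparing against your own runs, the `avg` column appears to be the unweighted mean of the 13 task scores. Below is a minimal sketch (not code from this repo; the variable names are illustrative) that checks the 128k row:

```python
# Hypothetical check: the "avg" column looks like the plain mean of the 13 task
# scores. Values are copied from the 128k row of the table above.
scores_128k = {
    "niah_single_1": 98.6, "niah_single_2": 97.8, "niah_single_3": 97.8,
    "niah_multikey_1": 86.4, "niah_multikey_2": 65.2, "niah_multikey_3": 42.0,
    "niah_multivalue": 66.4, "niah_multiquery": 69.1,
    "vt": 55.0, "fwe": 84.1, "cwe": 1.8, "qa_1": 66.6, "qa_2": 36.2,
}

avg = sum(scores_128k.values()) / len(scores_128k)
print(round(avg, 1))  # 66.7, matching the reported avg for 128k
```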
huangyuxiang03 commented 2 weeks ago

Thank you for providing these results. I have no further questions and I'm closing this issue.