yxgcsq closed this 1 month ago
These are the scores I measured:
Tasks | niah_single_1 | niah_single_2 | niah_single_3 | niah_multikey_1 | niah_multikey_2 | niah_multikey_3 | niah_multivalue | niah_multiquery | vt | fwe | cwe | qa_1 | qa_2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 99.5 | 100.0 | 100.0 | 98.0 | 100.0 | 88.0 | 70.0 |
These are my results on the 4k test. Can you check the responses for variable tracking (vt) and common word extraction (cwe)? I think the API may be returning something unexpected.
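In case it helps when comparing runs, here is a minimal sketch that takes the 4k scores from the table above, computes the plain average over the 13 tasks, and lists the tasks that fall below a threshold. The `summarize` helper and the `threshold` parameter are made up for illustration and are not part of the benchmark code. The unweighted mean is only an assumption about how an overall score might be aggregated.

```python
# Scores copied from the 4k table above (task name -> score).
scores = {
    "niah_single_1": 100.0,
    "niah_single_2": 100.0,
    "niah_single_3": 100.0,
    "niah_multikey_1": 100.0,
    "niah_multikey_2": 100.0,
    "niah_multikey_3": 100.0,
    "niah_multivalue": 99.5,
    "niah_multiquery": 100.0,
    "vt": 100.0,
    "fwe": 98.0,
    "cwe": 100.0,
    "qa_1": 88.0,
    "qa_2": 70.0,
}


def summarize(scores, threshold=100.0):
    """Hypothetical helper: return the unweighted mean and the tasks below `threshold`."""
    mean = sum(scores.values()) / len(scores)
    below = {task: s for task, s in scores.items() if s < threshold}
    return mean, below


mean, below = summarize(scores)
print(f"average over {len(scores)} tasks: {mean:.2f}")
for task, score in below.items():
    print(f"below threshold: {task} = {score}")
```

Running the same helper on a second set of scores and diffing the two `below` dicts is a quick way to spot which tasks (for example vt or cwe) diverge between runs.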
Thank you, thank you!
GPT-4-1106-preview: What are the scores for each of the 13 tasks in the 4k test?