[ACL'24 Oral] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
GNU General Public License v3.0
Where are the open-ended task datasets used for GPT-4/GPT-3.5 evaluation? #12
Closed
altctrl00 closed 3 months ago
Hi, I wonder where the "96-question" and "85+96 question" datasets mentioned in Table 4 of your paper are?