OpenLMLab / LEval

[ACL'24 Oral] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark

Understanding evals #15

Closed karansaxena closed 1 month ago

karansaxena commented 1 month ago

Hi,

From what I understand, we are using 'exact_match' as the metric for CodeU. But if I take a sample eval here https://www.online-python.com/zSKhTRrYFy , I am confused as to how the score is 100%.

Can you help clarify?

karansaxena commented 1 month ago

Actually, I got it. The format is [predictions] and [[ground_truth]].
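
For anyone else hitting this: below is a minimal sketch of an exact-match scorer that follows that convention, where predictions is a flat list of strings and the references are a list of lists (one list of acceptable answers per example). This is only an illustration of the [predictions] / [[ground_truth]] shapes, not the exact scoring code used by L-Eval; the function and normalization here are assumptions.

```python
def normalize(text: str) -> str:
    """Lowercase and strip whitespace so trivially different strings still match."""
    return text.strip().lower()


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Return the percentage of predictions that exactly match any of their references.

    predictions: [pred_1, pred_2, ...]           -> the [predictions] list
    references:  [[ref_1a, ref_1b], [ref_2a]]    -> the [[ground_truth]] list of lists
    """
    assert len(predictions) == len(references), "one reference list per prediction"
    correct = 0
    for pred, refs in zip(predictions, references):
        if any(normalize(pred) == normalize(ref) for ref in refs):
            correct += 1
    return 100.0 * correct / len(predictions)


if __name__ == "__main__":
    # One prediction wrapped in a list, one ground truth wrapped twice,
    # mirroring the [predictions] and [[ground_truth]] format above.
    preds = ["dynamic programming"]
    refs = [["dynamic programming"]]
    print(exact_match_score(preds, refs))  # 100.0
```

With that pairing, a single prediction that matches its single reference scores 100%, which is why the sample eval comes out the way it does.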