Y-IAB / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Add Bartscore metric for summarization and fix LLM eval #8

Closed myeongho-jeong-yanolja closed 6 months ago

myeongho-jeong-yanolja commented 6 months ago
  1. Add a BARTScore metric. This metric calculates the generation probability and uses (-1 x loss) as the score.
    • For the summarization scenario, I calculate the score between the source text and the summary text, so it is named BARTScore-src.
  2. For uptrain>0.5.0, there is an issue with evaluating Korean text, so I rolled back to 0.5.0 and refined the score calculation according to its output.
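
To illustrate the "(-1 x loss)" scoring in item 1, here is a minimal sketch in plain Python. It is not the code from this PR: it assumes the loss is the mean token-level cross-entropy of the target tokens (for BARTScore-src, the summary conditioned on the source), and the function names `log_softmax` and `bartscore_from_logits` are hypothetical. In practice the per-step logits would come from a seq2seq model such as BART; here they are passed in directly so the arithmetic is easy to follow.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list:
    """Numerically stable log-softmax over one decoding step's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def bartscore_from_logits(
    step_logits: Sequence[Sequence[float]],
    target_ids: Sequence[int],
) -> float:
    """Return -1 x (mean cross-entropy loss) for the target token sequence.

    Assumption: the loss is averaged over target tokens. A higher (less
    negative) score means the model finds the target text more probable.
    """
    if not target_ids or len(step_logits) != len(target_ids):
        raise ValueError("need one logits vector per target token")

    # Sum the negative log-likelihood of each reference token.
    nll = 0.0
    for logits, token_id in zip(step_logits, target_ids):
        nll -= log_softmax(logits)[token_id]

    loss = nll / len(target_ids)  # mean cross-entropy
    return -loss                  # score = -1 x loss


# Usage: a toy vocabulary of 4 tokens and a 3-token target.
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
confident = [[5.0, 0.0, 0.0, 0.0], [0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]]
targets = [0, 1, 2]

uniform_score = bartscore_from_logits(uniform, targets)
confident_score = bartscore_from_logits(confident, targets)
```

With uniform logits every token has probability 1/4, so the score equals -log(4); the confident logits give a score closer to zero, which is the ordering the metric relies on.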