Add Bartscore metric for summarization and fix LLM eval

Y-IAB / lm-evaluation-harness

A framework for few-shot evaluation of language models.

https://www.eleuther.ai

MIT License


Closed myeongho-jeong-yanolja closed 6 months ago

myeongho-jeong-yanolja commented 6 months ago

Add the BARTScore metric. This metric calculates the generation probability and uses (-1 x loss) as the score.
- For the summarization scenario, I calculate the score between the source text and the summary text, so the metric is named BARTScore-src.
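
For reviewers, the "(-1 x loss)" scoring can be sketched roughly as below. This is a toy illustration and not the code in this PR: the function names `log_softmax` and `neg_loss_score` are hypothetical, the logits are hand-written instead of coming from a BART model conditioned on the source text, and averaging the loss over summary tokens (rather than summing) is an assumption.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def neg_loss_score(step_logits, target_ids):
    """Return -1 x mean token-level cross-entropy of target_ids.

    step_logits: one list of logits per target position (hypothetical stand-in
                 for the decoder output given the source text).
    target_ids:  token ids of the summary being scored.
    """
    if not target_ids or len(step_logits) != len(target_ids):
        raise ValueError("need one logits row per target token")
    total_nll = 0.0
    for logits, token_id in zip(step_logits, target_ids):
        # Negative log-likelihood of the reference token at this position.
        total_nll -= log_softmax(logits)[token_id]
    loss = total_nll / len(target_ids)  # assumption: mean over tokens
    return -loss


# Toy example: vocabulary of 4 tokens, summary of 2 tokens.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
confident = [[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]]
targets = [1, 2]

uniform_score = neg_loss_score(uniform, targets)      # equals -ln(4)
confident_score = neg_loss_score(confident, targets)  # closer to 0
```

Scores are always less than or equal to 0, and a summary that the model finds more probable given the source gets a score closer to 0.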
For uptrain>0.5.0, there is an issue with evaluating Korean text, so I rolled back to 0.5.0 and refined the score calculation to match its output.