UMass-Meta-LLM-Eval / llm_eval

A comprehensive study of the LLM-as-a-judge paradigm in a controlled setup that reveals new results about its strengths and weaknesses.
https://arxiv.org/abs/2406.12624

Error Analysis #50

Closed singh96aman closed 4 months ago

singh96aman commented 5 months ago

Similar to the Open QA paper, we should do an error analysis of why LLMs are getting questions wrong. We've already demarcated our categories as under-specification, over-specification, and knowledge error.

[image] https://arxiv.org/pdf/2305.12421
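Once errors are labeled with those categories, the per-category breakdown is a simple tally. A minimal sketch, assuming each wrong answer has already been manually assigned one of the category names above (the labels here are placeholder data, not results from our runs):

```python
from collections import Counter

# Hypothetical labels assigned during manual review of wrong answers;
# the category names follow the taxonomy discussed above.
labels = [
    "under-specification", "knowledge-error", "over-specification",
    "knowledge-error", "under-specification", "knowledge-error",
]

# Count how often each error category occurs and report its share.
counts = Counter(labels)
total = len(labels)
for category, n in counts.most_common():
    print(f"{category}: {n} ({n / total:.0%})")
```

The same counts can then feed whatever chart we settle on for the paper.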

singh96aman commented 5 months ago

We could present our error analysis in a similar style to this. Thoughts?
[image] https://proceedings.neurips.cc/paper_files/paper/2023/file/4dbb61cb68671edc4ca3712d70083b9f-Paper-Datasets_and_Benchmarks.pdf

singh96aman commented 4 months ago

Closing this for now