RAISEDAL / RAISEReadingList

This repository contains a reading list of Software Engineering papers and articles!

Paper Review: DyVal2 - Dynamic Evaluation of Large Language Models by Meta Probing Agents #86

Open mehilshah opened 1 month ago

mehilshah commented 1 month ago

Publisher

ICML

Link to The Paper

https://arxiv.org/abs/2402.14865

Name of The Authors

Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie

Year of Publication

2024

Summary

The paper proposes Meta Probing Agents (MPA), a novel dynamic evaluation protocol for large language models (LLMs) that addresses two major challenges: data contamination in existing benchmarks and lack of interpretability of LLMs' abilities.

MPA takes inspiration from psychometric theory, categorising cognitive abilities into three core areas: language understanding, problem-solving, and domain knowledge. It employs two types of agents:

  1. Probing agents: LLMs that dynamically transform existing evaluation samples into new ones based on principles corresponding to the three cognitive abilities.
  2. Judging agents: LLMs that validate the consistency and correctness of the transformed samples.

This multi-agent approach with judges allows MPA to dynamically generate evaluation benchmarks while maintaining relevance to the original tasks.
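The probe-and-judge loop above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `probing_agent` and `judging_agent` stand in for LLM calls, and the transformation used here (shuffling multiple-choice options) is a placeholder for the paper's principled, ability-specific transformations.

```python
import random


def probing_agent(sample, rng):
    """Transform an evaluation sample into a new variant.

    A real probing agent would prompt an LLM with a transformation
    principle tied to one of the three cognitive abilities (e.g.
    paraphrase the question, add distracting context); here we just
    shuffle the answer options as a stand-in.
    """
    options = sample["options"][:]
    rng.shuffle(options)
    return {
        "question": sample["question"],
        "options": options,
        "answer": sample["answer"],  # correct answer text is unchanged
    }


def judging_agent(original, transformed):
    """Validate consistency and correctness of the transformed sample.

    A real judging agent would also be an LLM; this placeholder only
    checks that the correct answer survived the transformation.
    """
    return (
        transformed["answer"] in transformed["options"]
        and transformed["answer"] == original["answer"]
    )


def generate_benchmark(samples, seed=0):
    """Dynamically build a new benchmark, keeping only judged-valid samples."""
    rng = random.Random(seed)
    benchmark = []
    for sample in samples:
        candidate = probing_agent(sample, rng)
        if judging_agent(sample, candidate):
            benchmark.append(candidate)
    return benchmark


seed_samples = [
    {"question": "2 + 2 = ?", "options": ["3", "4", "5"], "answer": "4"},
]
new_benchmark = generate_benchmark(seed_samples)
print(len(new_benchmark))  # one validated variant per accepted seed sample
```

The key design point the sketch preserves is the separation of roles: the probing agent may be creative (and therefore unreliable), while the judging agent gates what enters the benchmark, which is what lets MPA regenerate evaluation data without inheriting contamination from fixed test sets.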

Key Findings:

Contributions of The Paper

The core contribution is the psychometrics-inspired MPA protocol, which provides a general, flexible, and dynamic approach to evaluating LLMs while mitigating data contamination. MPA supports diverse tasks, enables multifaceted analysis of core cognitive abilities, and opens new ways to interpret and improve LLM capabilities.
