Ques: Are ontology scores correlated with model performance on the Open LLM Leaderboard?

Background

We want to know whether a model's ontology score correlates with its performance on downstream tasks.

We could evaluate downstream performance ourselves, but as a first approximation we can use results that have already been published for the same models.

The Open LLM Leaderboard reports model performance on a variety of tasks and includes results for Pythia models. Let's check whether ontology score correlates with Open LLM Leaderboard performance.
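
As a starting point, here is a minimal sketch of the comparison, assuming we already have one ontology score and one leaderboard score per model. The function names, the dict-based input format, and the model names are hypothetical, and the numbers in the demo are placeholders, not real measurements. It reports both Pearson and Spearman coefficients, since there is no reason yet to assume the relationship is linear.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def correlate_scores(ontology_scores, leaderboard_scores):
    """Join two {model_name: score} dicts on shared models and correlate."""
    shared = sorted(set(ontology_scores) & set(leaderboard_scores))
    xs = [ontology_scores[m] for m in shared]
    ys = [leaderboard_scores[m] for m in shared]
    return {"models": shared, "pearson": pearson(xs, ys), "spearman": spearman(xs, ys)}


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with real scores.
    ontology = {"model-a": 0.10, "model-b": 0.20, "model-c": 0.40, "model-d": 0.30}
    leaderboard = {"model-a": 30.0, "model-b": 32.0, "model-c": 40.0, "model-d": 35.0}
    print(correlate_scores(ontology, leaderboard))
```

With only a handful of models, any coefficient will carry wide uncertainty, so the result is better read as a hint about whether a fuller evaluation is worthwhile than as a conclusion.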

Tasks

[ ] #182
[ ] Task: Outline experimental methods for ontology scores vs. downstream performance

g-simmons / persona-research-internship

Ques: Are ontology scores correlated with model performance on OpenLLM Leaderboard? #157
