We want to know whether ontology score correlates with performance on downstream tasks.
We could evaluate downstream performance ourselves, but as a first approximation we can reuse results that have already been obtained for the models.
The Open LLM Leaderboard reports model performance on a variety of tasks and includes results for Pythia models. Let's check whether ontology score correlates with Open LLM Leaderboard performance.
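As a starting point for the comparison, here is a minimal sketch of a Spearman rank correlation between per-model ontology scores and per-model leaderboard scores. It assumes both are available as plain dicts keyed by model name; the function names and input format are hypothetical and not part of any existing code in this project. Rank correlation is used here as one reasonable choice because the two scores are on different scales; it is not necessarily the method we will settle on.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


def spearman_by_model(ontology_scores, leaderboard_scores):
    """Spearman correlation over the models present in both dicts.

    Both arguments are hypothetical inputs: {model_name: score}.
    Returns (rho, list_of_models_used).
    """
    models = sorted(set(ontology_scores) & set(leaderboard_scores))
    if len(models) < 3:
        raise ValueError("need at least 3 shared models for a meaningful correlation")
    xs = [ontology_scores[m] for m in models]
    ys = [leaderboard_scores[m] for m in models]
    return pearson(average_ranks(xs), average_ranks(ys)), models
```

Models that appear in only one of the two dicts are dropped, so the returned model list shows which models the correlation actually covers. With only a handful of model sizes the estimate will be noisy, so the result should be read as a rough signal rather than a conclusion.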
Tasks
[ ] #182
[ ] Task: Outline experimental methods for ontology scores vs. downstream performance