felipemaiapolo / tinyBenchmarks

Evaluating LLMs with fewer examples
MIT License

How to specify model to test #5

Closed ehartford closed 4 days ago

ehartford commented 3 months ago

I saw this code:

import numpy as np
import tinyBenchmarks as tb

### Parameters
benchmark = 'lb' # choose from possible benchmarks in
                 # ['lb','mmlu','alpaca','helm_lite','truthfulqa',
                 #  'gsm8k', 'winogrande', 'arc', 'hellaswag']

y = np.random.binomial(1,.5, 600) # dummy data (unidimensional numpy array)
                                  # In this example, y has dimension 600 because we
                                  # observe 100 examples from each Open LLM Leaderboard scenario

### Evaluation
tb.evaluate(y, benchmark)

But in that code I don't see anywhere to specify which model to test. How can I test a model?

LucWeber commented 3 months ago

Hey Eric,

This code is meant to calculate the IRT-model estimate for the rest of a benchmark after you have evaluated your model of choice on the corresponding tinyBenchmark. This step improves the accuracy of the tinyBenchmark result without you having to evaluate your model on the whole benchmark.

In the code above, you have to replace np.random.binomial(1,.5, 600) with your model's score vector, i.e. the per-example correctness (1 = correct, 0 = incorrect) of your model on the tinyBenchmark examples.
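
For concreteness, here is a minimal sketch of what that replacement could look like for the 'mmlu' tinyBenchmark. The variable my_model_correctness and its values are illustrative dummies, not from the package; in practice you would fill it with your model's actual per-example results on the 100 tinyMMLU examples, in dataset order.

import numpy as np
import tinyBenchmarks as tb

# 1 = the model answered the example correctly, 0 = it did not
# (dummy values here; replace with your model's actual results,
#  one entry per tinyMMLU example, in dataset order)
my_model_correctness = [1, 0, 1, 1, 0] * 20   # 100 entries

y = np.array(my_model_correctness)  # unidimensional numpy array of 0/1 scores

# IRT-based estimate of the model's performance on the full MMLU benchmark
results = tb.evaluate(y, 'mmlu')
print(results)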