felipemaiapolo / tinyBenchmarks

Evaluating LLMs with fewer examples
MIT License

Release of benchmarks data #4

Closed by petroskarypis 4 days ago

petroskarypis commented 3 months ago

Hi, thanks for making this interesting work open-source! Are you guys planning to release the collection of model benchmarks described in Appendix C?

LucWeber commented 3 months ago

Hey,

the benchmarks in Appendix C are publicly available (see the Open LLM Leaderboard and HELM).

If you are looking for the tinyBenchmarks, you can find them here.

Hope this helps!

petroskarypis commented 3 months ago

Hey, thanks for the reply!

I was referring to the benchmarks in Appendix C. While you describe the preprocessing steps you used, having your version of the data would be useful for reproducibility. For example, HELM Lite currently lists only 30 models, versus the 37 mentioned in the appendix.

felipemaiapolo commented 3 months ago

Hi @petroskarypis,

Now we understand what you mean. We will release the datasets in our GitHub repo as soon as possible. If you want, I can send them to you via email in the meantime. Just email felipemaiapolo@gmail.com and I will reply with the datasets.

petroskarypis commented 3 months ago

Thanks! That would be great.