Y-IAB / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Every evaluation costs money #14

Closed seungduk-yanolja closed 7 months ago

seungduk-yanolja commented 7 months ago

It would be great if we could group the tasks by whether they cost money to run. In short, we need to split tasks by whether they need an OpenAI API configuration.

myeongho-jeong-yanolja commented 7 months ago

Do you mean you don't want to use GPT-based evaluation? I'll make a group for it. It will just remove a metric.

seungduk-yanolja commented 7 months ago

Costs money -> run a limited number of times (at evaluation milestones, etc.).
No money -> run without limit.

myeongho-jeong-yanolja commented 7 months ago

To avoid using GPT-4 evaluation, I'll make evaluation groups like:

  1. yasum-basic: evaluate without GPT-4 metric
  2. yasum-full: evaluate with GPT-4 metric
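
The split proposed above could be sketched roughly as follows. This is only an illustration, not the harness's actual configuration: the group names come from the list above, while the metric names (`rouge`, `gpt4_score`) and the helpers `requires_payment` and `default_group` are hypothetical. It also assumes the paid metric is enabled through the `OPENAI_API_KEY` environment variable.

```python
import os

# Hypothetical metric names: the thread only says that one metric uses GPT-4.
FREE_METRICS = ["rouge"]
PAID_METRICS = ["gpt4_score"]

# Group names are taken from the proposal; their contents are assumed.
TASK_GROUPS = {
    "yasum-basic": FREE_METRICS,                 # no GPT-4 metric, free to run
    "yasum-full": FREE_METRICS + PAID_METRICS,   # adds the GPT-4 metric
}


def requires_payment(group):
    """Return True if any metric in the group calls a paid API."""
    return any(metric in PAID_METRICS for metric in TASK_GROUPS[group])


def default_group(env=None):
    """Pick yasum-full only when an OpenAI API key is configured."""
    env = os.environ if env is None else env
    return "yasum-full" if env.get("OPENAI_API_KEY") else "yasum-basic"
```

With a layout like this, routine runs can default to `yasum-basic`, and `yasum-full` is only picked when a key is present, which fits the idea of reserving paid evaluation for milestones.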