EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Multiple generations (sequential) per question #2317

Open IntrepidEnki opened 1 week ago

IntrepidEnki commented 1 week ago

Hi lm-eval maintainers! I'm relatively new to using this library, so any pointers will be greatly appreciated.

I am trying to elicit two sequential answers from the model for each question. I want to do this by providing the question, recording an initial answer, then appending that answer to the original question and using that as a second prompt, and recording the second (and final) answer to do some post-processing.
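The loop I have in mind looks roughly like this sketch, where `generate` is a hypothetical callable standing in for whatever actually produces a completion (it is not an lm-eval API):

```python
def two_turn(question, generate):
    """Ask once, then ask again with the first answer appended
    to the original question; return both answers."""
    first = generate(question)          # initial answer, recorded
    followup = f"{question}\n{first}"   # original question + first answer
    second = generate(followup)         # final answer, for post-processing
    return first, second
```

The post-processing would then operate on `second` (or on both answers together).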

After reading through the task_guide and new_task_guide I did not see anything directly related to my endeavor - is there a way to setup this workflow by modifying the relevant yaml config file?

Or is the preferred method to call the log parser from a custom filter function?

Thank you

baberabb commented 1 week ago

Hi! Currently the way to do this is a bit involved and requires two lm_eval calls (and two task YAMLs):

  1. The first uses --predict_only, which logs the per-sample generations (and docs) to a file without computing metrics.
  2. The second yaml parses that file into a HF dataset using something like:

     dataset_path: json
     dataset_kwargs:
       data_files: /test.jsonl

     and then you should be able to structure your task as normal (the resps and filtered_resps fields hold the model outputs).
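Between the two calls you will need a small conversion step that rewrites the logged samples into the jsonl the second task loads. A minimal sketch, assuming each logged line carries a "doc" dict and a nested "resps" list, and that the question lives under a "question" key (check the actual schema of your dump, it varies by version and task):

```python
import json

def build_second_stage(samples_path, out_path, question_key="question"):
    """Rewrite a per-sample generations dump into a jsonl where each
    question has the model's first answer appended, ready to be loaded
    via dataset_path: json / data_files in the second task yaml.
    Field names ("doc", "resps", question_key) are assumptions."""
    with open(samples_path) as src, open(out_path, "w") as out:
        for line in src:
            rec = json.loads(line)
            question = rec["doc"][question_key]
            # resps is assumed to be a nested list of generations;
            # take the first generation of the first repeat
            first_answer = rec["resps"][0][0]
            out.write(json.dumps(
                {question_key: f"{question}\n{first_answer}"}) + "\n")
```

The second task's doc_to_text can then format this combined field as the new prompt.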

We are currently looking at ways of supporting LM-judge-like tasks, but it's very much a work in progress.