EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Multiple generations (sequential) per question #2317

Open IntrepidEnki opened 1 week ago

IntrepidEnki commented 1 week ago

Hi lm-eval maintainers! I'm relatively new to using this library, so any pointers will be greatly appreciated.

I am trying to elicit two sequential answers from the model for each question. I want to do this by providing the question, recording an initial answer, then appending that answer to the original question and using that as a second prompt, and recording the second (and final) answer to do some post-processing.
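The loop I have in mind looks roughly like this sketch, where `generate` is a hypothetical callable standing in for whatever actually produces a completion (it is not an lm-eval API):

```python
def two_turn(question, generate):
    """Ask once, then ask again with the first answer appended
    to the original question; return both answers."""
    first = generate(question)          # initial answer, recorded
    followup = f"{question}\n{first}"   # original question + first answer
    second = generate(followup)         # final answer, for post-processing
    return first, second
```

The post-processing would then operate on `second` (or on both answers together).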

After reading through the task_guide and new_task_guide I did not see anything directly related to my endeavor - is there a way to setup this workflow by modifying the relevant yaml config file?

Or is the preferred method to call the log parser from a custom filter function?

Thank you

baberabb commented 1 week ago

Hi! Currently the way to do this is a bit involved and requires two lm_eval calls (and two task YAMLs):

  1. The first uses --predict_only, which logs the per-sample generations (and docs) to a file without computing metrics.
  2. The second yaml parses that file into a HF dataset using something like:

     dataset_path: json
     dataset_kwargs:
       data_files: /test.jsonl

     and then you should be able to structure your task as normal (the resps and filtered_resps fields hold the model outputs).
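Between the two calls you will need a small conversion step that rewrites the logged samples into the jsonl the second task loads. A minimal sketch, assuming each logged line carries a "doc" dict and a nested "resps" list, and that the question lives under a "question" key (check the actual schema of your dump, it varies by version and task):

```python
import json

def build_second_stage(samples_path, out_path, question_key="question"):
    """Rewrite a per-sample generations dump into a jsonl where each
    question has the model's first answer appended, ready to be loaded
    via dataset_path: json / data_files in the second task yaml.
    Field names ("doc", "resps", question_key) are assumptions."""
    with open(samples_path) as src, open(out_path, "w") as out:
        for line in src:
            rec = json.loads(line)
            question = rec["doc"][question_key]
            # resps is assumed to be a nested list of generations;
            # take the first generation of the first repeat
            first_answer = rec["resps"][0][0]
            out.write(json.dumps(
                {question_key: f"{question}\n{first_answer}"}) + "\n")
```

The second task's doc_to_text can then format this combined field as the new prompt.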

We are currently looking at ways of supporting LM-judge-like tasks, but it's very much a work in progress.