IntrepidEnki opened 1 week ago
Hi! Currently the way to do this is a bit involved and will require two lm_eval calls (and two task yamls):

1. Run the first task with `--predict_only`, which logs the per-sample generations (and docs) to file without computing metrics.
2. Point the second task's yaml at the logged file:

```yaml
dataset_path: json
dataset_kwargs:
  data_files: /test.jsonl
```

You should then be able to structure your task as normal (the `resps` and `filtered_resps` fields hold the model outputs).
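As a rough sketch of the second pass, assuming the first `--predict_only` run produced a JSONL file where each line carries the original doc plus the model's generation — the field names `doc`, `question`, and `resps` below are illustrative assumptions, not a guaranteed schema, so check them against your own logged output:

```python
import json


def load_first_pass(path):
    """Read per-sample records logged by the first --predict_only run.

    Each line is assumed to be one JSON object; adjust the field names
    to whatever your logged file actually contains.
    """
    records = []
    with open(path) as f:
        for line in f:
            records.append(json.loads(line))
    return records


def build_second_pass_docs(records):
    """Append the first answer to the original question to form the
    second-turn prompts (hypothetical field names)."""
    docs = []
    for rec in records:
        question = rec["doc"]["question"]
        first_answer = rec["resps"][0]
        docs.append({"question": f"{question}\n{first_answer}"})
    return docs
```

Writing the resulting docs back out as JSONL gives you exactly the kind of file the `dataset_path: json` config above can consume.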
We are currently looking at ways of supporting LLM-judge-like tasks, but it's very much a work in progress.
Hi lm-eval maintainers! I'm relatively new to using this library, so any pointers will be greatly appreciated.
I am trying to elicit two sequential answers from the model for each question: provide the question, record an initial answer, append that answer to the original question to form a second prompt, then record the second (and final) answer for some post-processing.
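Outside the harness, the two-turn flow described above can be sketched as a plain loop; `generate` here is a stand-in for whatever prompt-to-completion call your setup uses, not an lm_eval API:

```python
def two_turn_answers(questions, generate):
    """For each question: get an initial answer, append it to the
    question, and query again for the final answer.

    `generate` is any function mapping a prompt string to a completion
    string (model call, API client, etc.).
    """
    results = []
    for q in questions:
        first = generate(q)
        second = generate(f"{q}\n{first}")
        results.append((first, second))
    return results
```

The two-pass recipe with `--predict_only` effectively splits this loop across two lm_eval runs: the first run produces `first`, and the second task's prompt template does the appending.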
After reading through the task_guide and new_task_guide I did not see anything directly related to my endeavor - is there a way to set up this workflow by modifying the relevant yaml config file?
Or is the preferred method to call the log parser from a custom filter function?
Thank you