EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

[Question] A way to run multiple evals on multiple models? #2145

Open tanaymeh opened 3 months ago

tanaymeh commented 3 months ago

Hi, I was wondering if there is a way, supported by the Python API, to run multiple eval benchmarks on multiple models (for example, by passing a list of models and their respective arguments via the model_args argument)?

For example:

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=["pretrained=microsoft/phi-2,trust_remote_code=True", "pretrained=microsoft/phi-3,trust_remote_code=True"],
    tasks=["hellaswag", "mmlu_abstract_algebra"],
    log_samples=True,
)

TIA!
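
A minimal workaround sketch, assuming lm_eval.simple_evaluate accepts only a single model_args string per call (as in the documented single-model usage), is to loop over the configurations yourself and collect the results per model; the model names and tasks below are just the ones from the example above:

import lm_eval

model_configs = [
    "pretrained=microsoft/phi-2,trust_remote_code=True",
    "pretrained=microsoft/phi-3,trust_remote_code=True",
]
tasks = ["hellaswag", "mmlu_abstract_algebra"]

all_results = {}
for model_args in model_configs:
    # One simple_evaluate call per model; each call runs every task in `tasks`.
    all_results[model_args] = lm_eval.simple_evaluate(
        model="hf",
        model_args=model_args,
        tasks=tasks,
        log_samples=True,
    )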

guangyaodou commented 2 months ago

Hi,

Do you know how to run mmlu's flan_cot_zeroshot? How should we specify it in the tasks argument?

Thanks!
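
For reference, a sketch assuming the variant is registered as the task group mmlu_flan_cot_zeroshot (the exact name can be confirmed by listing the available tasks with `lm-eval --tasks list`); the model here is just the one from the example above:

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-2,trust_remote_code=True",
    # Assumed group name; verify against the output of `lm-eval --tasks list`.
    tasks=["mmlu_flan_cot_zeroshot"],
    log_samples=True,
)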