Open tanaymeh opened 3 months ago
Hi, I was wondering if there is a way, supported by the Python API, to run multiple eval benchmarks on multiple models (e.g., by passing a list of models and their respective arguments in the `model_args` argument)?
For example:
```python
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=[
        "pretrained=microsoft/phi-2,trust_remote_code=True",
        "pretrained=microsoft/phi-3,trust_remote_code=True",
    ],
    tasks=["hellaswag", "mmlu_abstract_algebra"],
    log_samples=True,
)
```
TIA!
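
For reference, a minimal workaround sketch while waiting for an answer: this is not a documented way to pass a list of `model_args` in one call, it simply assumes `simple_evaluate` accepts a single `model_args` string per call and loops over the configurations (model names and tasks are the ones from the question above):

```python
import lm_eval

# One model_args string per model to evaluate.
model_configs = [
    "pretrained=microsoft/phi-2,trust_remote_code=True",
    "pretrained=microsoft/phi-3,trust_remote_code=True",
]

all_results = {}
for model_args in model_configs:
    # Run the same benchmarks for each model, one simple_evaluate call per model.
    all_results[model_args] = lm_eval.simple_evaluate(
        model="hf",
        model_args=model_args,
        tasks=["hellaswag", "mmlu_abstract_algebra"],
        log_samples=True,
    )
```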
Hi,
Do you know how to run MMLU's flan_cot_zeroshot? How should we specify it in the tasks argument?
Thanks!
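
For reference, a minimal sketch assuming the variant is registered as the `mmlu_flan_cot_zeroshot` task group (check the task listing for your installed version if this name does not resolve):

```python
import lm_eval

# Assumes the chain-of-thought zero-shot MMLU variant is registered under
# the group name "mmlu_flan_cot_zeroshot"; verify the exact name against
# your installed version's task registry if this errors out.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-2,trust_remote_code=True",
    tasks=["mmlu_flan_cot_zeroshot"],
    log_samples=True,
)
```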