Add OpenAI batch API feature for cheaper LM judge implementation, along with a few tool scripts to manage batch jobs.
Upgrade lm-eval compatibility. Recent models, like Llama 3.2, are supported. --apply_chat_template option is available with generic lm-eval support.
Push template for Llama and Qwen model families. Note that --chat_template should be general if --apply_chat_template is enabled to prevent double templates.
lm-eval
compatibility. Recent models, like Llama 3.2, are supported.--apply_chat_template
option is available with genericlm-eval
support.--chat_template
should begeneral
if--apply_chat_template
is enabled to prevent double templates.