OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Inspect LAMBADA OpenAI for HF-converted checkpoints #103

Open: lllAlexanderlll opened this issue 10 months ago


The tasks

lambada_openai,lambada_openai_cloze,lambada_openai_mt_de,lambada_openai_mt_en,lambada_openai_mt_es,lambada_openai_mt_fr,lambada_openai_mt_it

yield zero or near-zero scores with the HF-converted 7B-EQ model, while the Megatron checkpoint reaches plausible scores of 33% to 53% on the same tasks.
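When inspecting this, it may help to recall how the LAMBADA metrics are scored. As far as I understand the harness, an example counts as correct only if every token of the target word is the model's greedy (argmax) prediction given the context, and perplexity is the exponential of the negative mean per-example log-likelihood of the target. The sketch below reproduces that logic on toy per-position log-probability tables. The function names `score_example` and `lambada_metrics` and the toy data are hypothetical and not part of the harness API. One possible (unconfirmed) cause of near-zero accuracy is a tokenizer difference after conversion, for example in how the space before the target word is tokenized: that changes the target token IDs, so the greedy check can fail on almost every example even if the weights were converted correctly.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation for one example: log_probs[i] is the model's
# log-probability distribution (over a toy vocabulary) at the position that
# predicts target token i; target_ids[i] is the ID of target token i.
LogProbTable = Sequence[Sequence[float]]


def score_example(log_probs: LogProbTable, target_ids: Sequence[int]) -> Tuple[float, bool]:
    """Return (summed target log-likelihood, whether greedy decoding reproduces the target)."""
    if len(log_probs) != len(target_ids):
        raise ValueError("need exactly one distribution per target token")
    loglik = 0.0
    is_greedy = True
    for dist, tok in zip(log_probs, target_ids):
        loglik += dist[tok]
        # The greedy check requires the target token to be the argmax at every position.
        best = max(range(len(dist)), key=lambda j: dist[j])
        if best != tok:
            is_greedy = False
    return loglik, is_greedy


def lambada_metrics(examples: List[Tuple[LogProbTable, Sequence[int]]]) -> Tuple[float, float]:
    """Return (accuracy, perplexity) over a list of (log_probs, target_ids) pairs."""
    scores = [score_example(lp, tgt) for lp, tgt in examples]
    accuracy = sum(g for _, g in scores) / len(scores)
    # Perplexity: exp of the negative mean per-example log-likelihood.
    perplexity = math.exp(-sum(ll for ll, _ in scores) / len(scores))
    return accuracy, perplexity


if __name__ == "__main__":
    # Toy data over a 3-token vocabulary (hypothetical numbers).
    examples = [
        # Single-token target; token 0 is the argmax, so this example counts as correct.
        ([[math.log(0.7), math.log(0.2), math.log(0.1)]], [0]),
        # Two-token target; token 1 is not the argmax at the first position, so it fails.
        (
            [
                [math.log(0.5), math.log(0.4), math.log(0.1)],
                [math.log(0.1), math.log(0.1), math.log(0.8)],
            ],
            [1, 2],
        ),
    ]
    acc, ppl = lambada_metrics(examples)
    print(f"accuracy={acc:.2f} perplexity={ppl:.3f}")
```

Comparing the target token IDs produced by the Megatron tokenizer and the HF tokenizer for a few LAMBADA examples would be a cheap way to test the tokenizer hypothesis before looking at the weights.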