bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

questions about inconsistent evaluation result #392

Open coorful opened 1 year ago

coorful commented 1 year ago

Hi, I have used the DeepSpeed framework to train a GPT-117M model. When I evaluate the model's performance on wikitext-103, there is a large gap in PPL between the result from tasks/eval_harness/evaluate.py and the result from first converting the checkpoint to Megatron format and then evaluating with tasks/main.py. May I ask what the reason for this discrepancy is? @mayank31398
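
For reference, I wonder whether the two scripts normalize perplexity the same way. Below is a hypothetical sketch (the numbers are made up and this is not repository code) showing how normalizing the same summed negative log-likelihood by sub-word tokens versus by raw words alone yields very different PPL values:

```python
import math

# Hypothetical illustration only: the same summed NLL over an evaluation set,
# normalized two different ways, gives very different perplexity numbers.
total_nll = 80_000.0            # summed negative log-likelihood (made-up value)
num_subword_tokens = 28_000     # e.g. BPE tokens actually scored by the model
num_words = 24_000              # e.g. whitespace-delimited words in the raw text

ppl_token_level = math.exp(total_nll / num_subword_tokens)
ppl_word_level = math.exp(total_nll / num_words)

print(f"token-level PPL: {ppl_token_level:.2f}")   # ~17.4
print(f"word-level  PPL: {ppl_word_level:.2f}")    # ~28.0
```

If the two evaluation paths differ in tokenization, document splitting, or this normalization step, could that explain the gap I am seeing?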