Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Why is the wikitext-2 ppl calculated in the code lower than the ppl by lm-evaluation-harness? #40
Open
Chocolife-96 opened 1 year ago
It is about 50% lower. What causes the difference? Is the ppl calculation method different?
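
One plausible source of the gap is the perplexity definition itself rather than the model. Repos in this style typically concatenate the whole wikitext-2 test split, cut it into fixed-length token chunks, and report token-level perplexity (exp of the mean negative log-likelihood per token), whereas lm-evaluation-harness reports word-level perplexity for wikitext, normalizing the same total log-likelihood by the number of words instead of tokens. Since there are more tokens than words, the token-level number comes out systematically lower for the same model. Below is a minimal sketch of the chunked token-level evaluation, assuming a HuggingFace causal LM; the model name and `seqlen` are illustrative placeholders, not values taken from this repo.

```python
# Minimal sketch: token-level perplexity over the concatenated wikitext-2 test set.
# Assumptions: HuggingFace transformers/datasets are installed; model_name and
# seqlen are hypothetical placeholders chosen for illustration.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # placeholder model
seqlen = 2048                     # assumed evaluation context length

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()

# Concatenate the whole test split and tokenize it as one long token stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
input_ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids.to(device)

nlls = []
n_chunks = input_ids.shape[1] // seqlen
with torch.no_grad():
    for i in range(n_chunks):
        chunk = input_ids[:, i * seqlen:(i + 1) * seqlen]
        # labels=chunk makes the model return the mean cross-entropy over the chunk.
        loss = model(chunk, labels=chunk).loss
        nlls.append(loss.float() * seqlen)

# Token-level perplexity: exp of the mean NLL per *token* over all chunks.
ppl = torch.exp(torch.stack(nlls).sum() / (n_chunks * seqlen))
print(f"token-level wikitext-2 ppl: {ppl.item():.2f}")
```

lm-evaluation-harness instead computes `word_perplexity = exp(total NLL / num_words)`; because `num_tokens > num_words`, that value is larger than the token-level figure above even when the underlying log-likelihoods are identical, which could plausibly account for a gap of the size reported here.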