Hello, I've been experimenting with GPTQ and trying to replicate your LAMBADA zero-shot results. But I have been getting significantly lower accuracy (10-15% lower for OPT specifically) compared to the paper, even for the FP16 baseline. I'm using your pipeline based on LM evaluation harness. I was wondering if you have seen this before?
Hello, I've been experimenting with GPTQ and trying to replicate your LAMBADA zero-shot results. But I have been getting significantly lower accuracy (10-15% lower for OPT specifically) compared to the paper, even for the FP16 baseline. I'm using your pipeline based on LM evaluation harness. I was wondering if you have seen this before?