moonlightian opened 1 year ago
Hi~ Thank you for your great work! It seems that GPTQ should lead to significant speedups for end-to-end inference. However, after quantizing BLOOM-7B to INT8 with GPTQ, I found it to be twice as slow as the FP16 model. How can I get the speedups reported in the paper?
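For context, here is a minimal sketch of the kind of end-to-end timing comparison I mean (the FP16 baseline loads through standard `transformers`; `load_quant` is a hypothetical stand-in for however the quantized checkpoint is loaded, since that part depends on this repo's own loader):

```python
# Latency-comparison sketch. Assumptions: CUDA GPU available, model name
# "bigscience/bloom-7b1", and a hypothetical load_quant() for the GPTQ model.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "bigscience/bloom-7b1"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")


def time_generate(model, n_tokens=128, warmup=2, runs=5):
    """Average wall-clock seconds to greedily generate n_tokens, after warmup."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model.generate(**inputs, max_new_tokens=n_tokens, do_sample=False)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model.generate(**inputs, max_new_tokens=n_tokens, do_sample=False)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs


# FP16 baseline.
fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16
).to("cuda")
print(f"FP16:      {time_generate(fp16):.2f} s / 128 tokens")

# GPTQ INT8 model -- load_quant is a placeholder for the repo's quantized loader.
# quant = load_quant(MODEL, "bloom7b-8bit.pt").to("cuda")
# print(f"GPTQ INT8: {time_generate(quant):.2f} s / 128 tokens")
```

With this kind of measurement, the INT8 model comes out roughly 2x slower than FP16 rather than faster.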