Hi,
to support CNN models, I modified the GPTQ code as follows:
1. added support for grouped convolutions;
2. switched to symmetric quantization without a zero-point parameter.
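For reference, the symmetric scheme I used is essentially the following (a minimal NumPy sketch of my change, not the actual code; the real implementation operates on PyTorch tensors inside the GPTQ quantizer):

```python
import numpy as np

def quantize_symmetric(w, bits=4):
    """Symmetric per-output-channel fake-quantization: scale only, zero point fixed at 0."""
    qmax = 2 ** (bits - 1) - 1                                   # 7 for signed 4-bit
    # One scale per output channel (row); clamp to avoid division by zero.
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)                # integer grid
    return q * scale                                             # dequantized weights
```

With no zero point, the grid is forced to be centered at 0, which may cost accuracy on layers with asymmetric weight distributions (e.g. depthwise convs in mobilenetv2).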
However, accuracy drops badly on mobilenetv2/mnasnet1_0 when quantizing to 4 bits. Here are my results:
| model | FP32 | GPTQ_W4 sym |
| --- | --- | --- |
| mbv2 | 71.88 | 60.84 (84.64%) |
| mnasnet1_0 | 73.47 | 64.71 (88.08%) |
I only saw resnet18/resnet50 quantization results in your paper. Have you tested GPTQ on mobilenetv2 or mnasnet1_0?
Looking forward to your reply...