Zhen-Dong / HAWQ

Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
MIT License

Scale Parameter with Gradient #12

Closed thuako closed 3 years ago

thuako commented 3 years ago

Hi, I want to combine your HAWQ-v3 with QNN methods that implement a custom gradient for the scale parameter, such as PACT, QIL, and LSQ.

I wonder why you didn't try learning those scale parameters with gradients.

Is there a problem with training, or something else?

I would appreciate your reply.

Zhen-Dong commented 3 years ago

Hi, thanks a lot for your interest. We use the standard quantizer without optimizing the quantization range (aka the scale parameter), because we think it is more general in terms of the algorithm. Otherwise, it's hard to tell whether the accuracy improvement comes from our method or from the clipping/learnable-quantizer methods. Since these methods are orthogonal, I think combining them would not cause problems. Though the gain from adding gradient-based clipping methods may not be significant, since we are using 4/8-bit, which is higher precision than binary/ternary, where a well-chosen quantization range is the key. Hope these are helpful.
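For reference, combining the two ideas would look roughly like the sketch below: an LSQ-style quantizer where the scale is a `torch.nn.Parameter` and the non-differentiable rounding step is bypassed with a straight-through estimator, so gradients reach the scale. This is a minimal illustration of the learnable-scale idea being discussed, not HAWQ's or LSQ's actual implementation; the class name and initialization are made up for the example.

```python
import torch

class LearnableScaleQuantizer(torch.nn.Module):
    """Sketch of an LSQ-style symmetric quantizer: the scale is a trainable
    parameter, and gradients flow to it via a straight-through estimator."""

    def __init__(self, num_bits=4, init_scale=0.1):
        super().__init__()
        self.qmax = 2 ** (num_bits - 1) - 1  # e.g. 7 for 4-bit signed
        self.scale = torch.nn.Parameter(torch.tensor(init_scale))

    def forward(self, x):
        # Scale down and clamp to the integer grid.
        q = torch.clamp(x / self.scale, -self.qmax - 1, self.qmax)
        # Straight-through estimator: rounding acts as identity in the
        # backward pass, so d(loss)/d(scale) is well defined.
        q_rounded = q + (q.round() - q).detach()
        return q_rounded * self.scale

quant = LearnableScaleQuantizer(num_bits=4, init_scale=0.1)
x = torch.linspace(-1.0, 1.0, 16)
y = quant(x)
y.sum().backward()  # the scale parameter now has a gradient
```

In a fixed-scale setup like HAWQ-v3's, the scale would instead be a buffer set during calibration, with no gradient attached.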

thuako commented 3 years ago

Thank you for the reply :)

gihwan-kim commented 5 months ago

@Zhen-Dong

> Hi, thanks a lot for your interest. We use the standard quantizer without optimizing the quantization range (aka the scale parameter), because we think it is more general in terms of the algorithm. Otherwise, it's hard to tell whether the accuracy improvement comes from our method or from the clipping/learnable-quantizer methods. Since these methods are orthogonal, I think combining them would not cause problems. Though the gain from adding gradient-based clipping methods may not be significant, since we are using 4/8-bit, which is higher precision than binary/ternary, where a well-chosen quantization range is the key. Hope these are helpful.

Is it okay to use the same scaling factor even when the input data differs? I can't find any code in the quantized model that computes the scaling factor; it only uses fixed scaling factors.

As far as I can tell, your method uses a fixed scaling factor for the input data and for the inputs of each layer in the TVM Relay code. But if it uses a fixed scaling factor for the input data, I would expect that to hurt accuracy, or to produce the same inference result regardless of the input.