Jzz24 / pytorch_quantization

A PyTorch implementation of DoReFa quantization
MIT License
112 stars 11 forks

Not the same as the paper #3

Open bongjeong opened 4 years ago

bongjeong commented 4 years ago

The activation quantization is not the same as in the paper.

In the paper: x (real) is in the range [0, 1]: clamp(input, 0, 1), then quantize(x).

In your implementation: clamp(input * 0.1, 0, 1)
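
For concreteness, a minimal sketch of the two variants being compared, assuming DoReFa's k-bit quantizer quantize_k(r) = round((2^k - 1) * r) / (2^k - 1); the function names below are illustrative and not taken from this repo:

```python
import torch

def quantize_k(x, k):
    # DoReFa k-bit quantizer for inputs already in [0, 1]
    n = float(2 ** k - 1)
    return torch.round(x * n) / n

def activation_quant_paper(x, k):
    # Paper: clamp to [0, 1], then quantize
    return quantize_k(torch.clamp(x, 0, 1), k)

def activation_quant_repo(x, k):
    # This repo, as described above: scale by 0.1 before the clamp
    return quantize_k(torch.clamp(x * 0.1, 0, 1), k)
```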

Jzz24 commented 4 years ago

The DoReFa paper says: 'Here we assume the output of the previous layer has passed through a bounded activation function h, which ensures r ∈ [0, 1].' But the paper does not specify what the bounded activation h is. I think multiplying the activation by 0.1 reduces the dynamic range of the activations and makes the model perform better. I interned at Megvii, and they handled activation functions in this way.
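
One way to see the effect: with clamp(x, 0, 1) alone, everything above 1 is clipped to 1, whereas scaling by 0.1 first compresses a wider range of pre-activations into [0, 1] before the clamp. A toy illustration with synthetic values (not code from this repo):

```python
import torch

x = torch.randn(10000) * 3.0  # hypothetical unbounded pre-activations

frac_clipped_plain = (x > 1.0).float().mean()         # clipped by clamp(x, 0, 1)
frac_clipped_scaled = (x * 0.1 > 1.0).float().mean()  # clipped by clamp(0.1 * x, 0, 1)
print(frac_clipped_plain.item(), frac_clipped_scaled.item())
```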

bongjeong commented 4 years ago

I think DoReFa performs fully integer computation in the layers (except the first and last layers). Multiplying the activation by 0.1 is not in a quantized format; it requires floating-point computation on the feature map (my guess). What do you think about it?

Jzz24 commented 4 years ago

Yes, I think so. At training time we use simulated quantization, so the activation layer's input is the dequantized result; it's in float format.
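
For reference, a minimal sketch of what such simulated (fake) quantization looks like during training: the activation is clamped, quantized, and immediately dequantized, so downstream layers see float values lying on the k-bit grid, and a straight-through estimator lets gradients pass through the round. Names here are illustrative, not this repo's API:

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round with a straight-through estimator so training gradients can flow."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient through unchanged (straight-through estimator)
        return grad_output

def fake_quantize_activation(x, k=4):
    # Simulated quantization: clamp to [0, 1], quantize to k bits, then
    # dequantize back to float. The result is a float tensor restricted
    # to the k-bit grid, which is what the next layer sees during training.
    n = float(2 ** k - 1)
    x = torch.clamp(x, 0, 1)
    return RoundSTE.apply(x * n) / n
```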