LaVieEnRoseSMZ / OQA


Question about H-swish activation and SE layer #2


bhpfelix commented 3 years ago

Hi,

Thanks for sharing the exciting work! Two quick questions about the H-swish activation and the SE layer.

H-swish activation is troublesome because of its skewed distribution. Specifically for LSQ, we either use an unsigned quantizer and clip all negative values to zero or use a signed quantizer and waste some of the negative quantization grid. Judging from your implementation of LSQ, you took the former route. Did you observe any impact on performance from clipping the activation?
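For reference, here is a minimal sketch of the two routes described above, applied to fake-quantizing H-swish outputs. This is not the repository's code; `lsq_quantize`, the 4-bit setting, and the step initialization are illustrative assumptions, and the straight-through estimator is omitted for brevity.

```python
import torch

def lsq_quantize(x, step, n_bits=4, signed=False):
    """LSQ-style fake quantization (illustrative sketch).

    Unsigned: grid [0, 2**n_bits - 1]            -> negative inputs clip to 0.
    Signed:   grid [-2**(n_bits-1), 2**(n_bits-1) - 1] -> keeps the negative tail,
              but much of the negative range goes unused for H-swish.
    """
    if signed:
        qn, qp = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        qn, qp = 0, 2 ** n_bits - 1
    # scale to the integer grid, clamp, round, then rescale
    q = torch.clamp(x / step, qn, qp).round()
    return q * step

# H-swish has a skewed distribution with a small negative tail (minimum ~ -0.375)
x = torch.randn(10_000)
h_swish = x * torch.clamp(x + 3, 0, 6) / 6

step = 2 * h_swish.abs().mean() / (2 ** 4 - 1) ** 0.5  # rough LSQ-style step init
unsigned_out = lsq_quantize(h_swish, step, signed=False)  # negative tail clipped to 0
signed_out = lsq_quantize(h_swish, step, signed=True)     # tail kept, grid partly wasted
```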

For the SE layer, I see you left it at full precision. Have you tried to quantize it? If so, how much does it impact the performance?

Thank you!

LaVieEnRoseSMZ commented 3 years ago

Thanks for the kind words!

We have noticed that quantizing the H-swish activation to positive numbers only is not optimal, and papers such as LSQ+ investigate this problem. It can be addressed with an asymmetric quantization function for activations, which improves the final quantization accuracy. However, some previous quantization works follow the same positive-only setting, so we kept it for a fair comparison.
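As an illustration of the asymmetric option mentioned above, the sketch below adds a grid offset so the small negative tail of H-swish remains representable. This is only an assumption of how such a scheme could look, not the OQA or LSQ+ implementation; in LSQ+ the offset would be a learned parameter rather than the fixed value used here.

```python
import torch

def asymmetric_quantize(x, step, offset, n_bits=4):
    """Asymmetric (LSQ+-style) activation fake quantization (illustrative sketch).

    The offset shifts an unsigned grid so values below zero are no longer
    clipped away, at the cost of carrying one extra parameter per tensor.
    """
    qn, qp = 0, 2 ** n_bits - 1
    q = torch.clamp((x - offset) / step, qn, qp).round()
    return q * step + offset

x = torch.randn(10_000)
h_swish = x * torch.clamp(x + 3, 0, 6) / 6

offset = torch.tensor(-0.375)                    # H-swish minimum; learned in LSQ+
step = (h_swish.max() - offset) / (2 ** 4 - 1)   # span the observed range
x_q = asymmetric_quantize(h_swish, step, offset)
```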

We didn't quantize the SE layers either, again for a fair comparison. We believe quantizing them could decrease accuracy. However, the SE module is lightweight, so quantizing it would likely not affect inference time substantially.

bhpfelix commented 3 years ago

Thanks for the quick response! The clarification helps a lot. One minor follow-up: the point_linear layer in MBConv has a linear activation (no nonlinearity), which means the input to the next MobileInvertedResidualBlock is likely distributed fairly evenly around 0. Maybe I'm missing something, but in this case, do we still quantize it with an unsigned quantizer?
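To make the follow-up concrete, the small sketch below shows why that tensor tends to be roughly zero-centered. The 1x1 projection and the channel sizes are illustrative stand-ins, not taken from the OQA architecture.

```python
import torch
import torch.nn as nn

# point_linear (the 1x1 projection) has no nonlinearity after it, so its output,
# which feeds the next block, is roughly symmetric around zero.
proj = nn.Conv2d(96, 24, kernel_size=1, bias=False)  # channel sizes are illustrative
feat = torch.relu(torch.randn(8, 96, 14, 14))         # post-activation input to the projection
out = proj(feat)

neg_frac = (out < 0).float().mean().item()  # typically close to 0.5
print(f"fraction of negative values: {neg_frac:.2f}")
# An unsigned quantizer would clip this entire negative half to zero,
# which is why a signed quantizer seems more natural for this tensor.
```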