ZFTurbo / MobileNet-in-FPGA

Generator of verilog description for FPGA MobileNet implementation

Coefficient Reduction #1

Open rainyBJ opened 3 years ago

rainyBJ commented 3 years ago

[image attached] Hi there! Since every conv operation is followed by a ReLU1 function, which already guarantees that the input values to the next layer lie in the interval [0, 1], I wonder whether the coefficient reduction step is really necessary. Hoping to hear from you!

ZFTurbo commented 3 years ago

As I remember, I keep the coefficients "as is" in this code, but they overflow 7 bits above the 1.0 point.
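For illustration, here is a minimal sketch of why weights above 1.0 still need extra integer bits even though ReLU1 bounds the activations. The maximum weight magnitude and fractional width below are hypothetical example values, not numbers taken from the generator:

```python
import math

# Hypothetical example values, not taken from the actual generator.
max_abs_weight = 100.0   # weights can exceed 1.0, here up to ~2^7
frac_bits = 12           # fractional precision kept after quantization

# ReLU1 clamps activations to [0, 1], so activations need no integer bits,
# but weights larger than 1.0 need integer bits to avoid overflow.
int_bits = math.ceil(math.log2(max_abs_weight))   # 7 bits above the 1.0 point
total_weight_bits = 1 + int_bits + frac_bits      # sign + integer + fraction

print(int_bits, total_weight_bits)  # 7, 20
```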

One note: the quantization method used in this project is not really optimal. It's better to use "Symmetric Fine Grained Quantization", which can be found in the NVIDIA docs: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9659-inference-at-reduced-precision-on-gpus.pdf

rainyBJ commented 3 years ago

Thanks a lot for your reply! I've read the NVIDIA doc you mentioned. Symmetric Fine Grained Quantization seems to be applied at two different granularities: for activations (feature maps, the output of each layer) the scale factor is found per tensor, while for weight parameters the scale factor is found per channel.
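As a reference point, here is a minimal NumPy sketch of those two granularities (per-tensor scale for activations, per-channel scale for weights), assuming symmetric 8-bit quantization; the function names and weight layout are illustrative, not taken from this repo:

```python
import numpy as np

def per_tensor_scale(activations, n_bits=8):
    """Symmetric per-tensor scale: one scale for the whole activation tensor."""
    max_abs = np.max(np.abs(activations))
    return max_abs / (2 ** (n_bits - 1) - 1)   # map [-max, max] to [-127, 127]

def per_channel_scales(weights, n_bits=8):
    """Symmetric per-channel scales: one scale per output channel.
    Assumes weight layout (out_channels, in_channels, kh, kw)."""
    max_abs = np.max(np.abs(weights), axis=(1, 2, 3))
    return max_abs / (2 ** (n_bits - 1) - 1)

def quantize(x, scale):
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

# Example: quantize a conv layer's weights channel-wise
w = np.random.randn(8, 3, 3, 3).astype(np.float32)
scales = per_channel_scales(w)                   # shape (8,)
w_q = quantize(w, scales[:, None, None, None])   # int8 weights
```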

However, I'm afraid that this fine-grained granularity may complicate the hardware. The Verilog computation uses 8 conv modules, each corresponding to one output channel of the d-conv/conv. Since the scale factor differs per channel, it becomes hard to represent the multiply-and-accumulate (MAC) results in a uniform manner, and it may also require more control signals and more complicated control logic.

So I'm wondering how to balance the tradeoff between a shorter bit width and more complex control logic. Hoping to hear from you, best regards!

ZFTurbo commented 3 years ago

Actually you use the same conv operations. The only difference is that you need to requantize to the new scale after the layer calculation is complete, and that is just a single multiplication and a shift. With the current quantization method we weren't able to run the model with 8 bits, but with SFGQ it's possible almost without loss of accuracy. The current method uses 12-13 bits for activations and 19-20 bits for weights, which is rather expensive.
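A minimal fixed-point sketch of that requantization step (one integer multiply followed by a right shift), with illustrative scales and shift amount rather than values from the generator:

```python
# Requantize an int32 accumulator from the layer's MAC scale to the next
# layer's activation scale. The real-valued ratio (acc_scale / out_scale)
# is approximated by an integer multiplier and a right shift, so the
# hardware needs only one multiplication and one shift per output.

SHIFT = 16

def make_multiplier(acc_scale, out_scale, shift=SHIFT):
    return int(round(acc_scale / out_scale * (1 << shift)))

def requantize(acc, multiplier, shift=SHIFT, n_bits=8):
    out = (acc * multiplier) >> shift
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, out))      # clamp to the output bit width

# Example: per-channel scale folded into a single multiplier constant
m = make_multiplier(acc_scale=0.00012, out_scale=0.05)
print(requantize(12345, m))           # single multiply + shift, then clamp
```

With per-channel weight scales, each of the 8 conv modules would simply use its own multiplier constant, while the MAC datapath itself stays identical across channels.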