AlexeyAB / yolo2_light

Light version of the convolutional neural network Yolo v3 & v2 for object detection, with a minimum of dependencies (INT8 inference, BIT1-XNOR inference)
MIT License

Why use quantization for conv layers with leaky activation? #45

Open Thilanka97 opened 5 years ago

Thilanka97 commented 5 years ago

@AlexeyAB Hey, I have a small question: why do you use INT8 only for the conv layers that use leaky activation, and not for the conv layers with linear activation? Is there a specific reason, or is this just the configuration that gives the best accuracy? Also, you do not use INT8 for the region layer, right?

Thanks in advance!

AlexeyAB commented 5 years ago

@Thilanka97 Hi, This is just a coincidence :)

[region] and [yolo] layers are very sensitive to precision, so we use FP32
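
For context, here is a minimal sketch of the scheme being discussed, with illustrative names rather than the repo's exact code: the conv input and weights are quantized to INT8, the multiply-accumulate runs in INT32, and the result is dequantized to FP32 before the leaky activation, so the [region]/[yolo] layers (and any other layer) can stay entirely in FP32.

```c
#include <math.h>
#include <stdint.h>

/* Illustrative sketch, not the repo's exact code: quantize one
   activation to INT8 with a per-layer multiplier from calibration. */
static int8_t quantize_int8(float x, float mult) {
    float q = roundf(x * mult);
    if (q >  127.f) q =  127.f;   /* saturate to the INT8 range */
    if (q < -128.f) q = -128.f;
    return (int8_t)q;
}

/* One output value of an INT8 conv: multiply-accumulate in INT32,
   then dequantize to FP32 so any following FP32 layer works out of
   the box; the leaky activation runs in FP32 after dequantization. */
static float conv_output_fp32(const int8_t *in, const int8_t *w, int n,
                              float in_mult, float w_mult, float bias) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) acc += (int32_t)in[i] * (int32_t)w[i];
    float y = (float)acc / (in_mult * w_mult) + bias;  /* back to FP32 */
    return (y > 0.f) ? y : 0.1f * y;                   /* leaky ReLU */
}
```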

Thilanka97 commented 5 years ago

@AlexeyAB Do you think the accuracy can be improved if we use more images during the calibration process?

Also, why did you decide to convert the outputs of each layer to FP32 before feeding them to the next layer? Is it because converting an INT8 output directly to another INT8 (with another multiplier) is not easy to achieve? And is there any difference between your implementation and the NVIDIA implementation?

Thanks in advance!

AlexeyAB commented 5 years ago

@Thilanka97

Do you think the accuracy can be improved if we use more images during the calibration process?

In most cases, no.
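
For reference, the simplest calibration just tracks the maximum absolute activation per layer over the calibration set and derives the multiplier from it; that statistic saturates quickly as images are added, which is one way to see why more images rarely help. A hedged sketch (illustrative function, not necessarily this repo's calibration method):

```c
#include <math.h>

/* Sketch: derive a per-layer INT8 multiplier from calibration
   activations. Extra images only change the result if they raise
   max_abs, so accuracy usually stops improving at some point. */
float calibrate_multiplier(const float *acts, int n) {
    float max_abs = 0.f;
    for (int i = 0; i < n; ++i) {
        float a = fabsf(acts[i]);
        if (a > max_abs) max_abs = a;
    }
    /* map [-max_abs, +max_abs] onto the INT8 range [-127, 127] */
    return (max_abs > 0.f) ? 127.f / max_abs : 1.f;
}
```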

Also, why did you decide to convert the outputs of each layer to FP32 before feeding them to the next layer?

So that all the other FP32 layers can be used out of the box.

Is it because converting an INT8 output directly to another INT8 (with another multiplier) is not easy to achieve?

No.

And is there any difference between your implementation and the NVIDIA implementation?

I haven't seen the source code of the NVIDIA implementation. Is there open source code for TensorRT?

Thilanka97 commented 5 years ago

@AlexeyAB Thank you so much for the reply.

So that all the other FP32 layers can be used out of the box.

What do you mean? Do you mean the shortcut layers in YOLOv3? If I want to convert INT8 directly to INT8, without the FP32 conversion after each layer, would that be possible? (I am working with Tiny YOLOv2.)

I haven't seen the source code of the NVIDIA implementation. Is there open source code for TensorRT?

https://devblogs.nvidia.com/int8-inference-autonomous-vehicles-tensorrt/

Only this.

AlexeyAB commented 5 years ago

What do you mean? Do you mean the shortcut layers in YOLOv3?

Any layer that isn't yet implemented for INT8: shortcut, yolo, upsample, ... and any new layers.

If I want to convert INT8 directly to INT8, without the FP32 conversion after each layer, would that be possible? (I am working with Tiny YOLOv2.)

Yes, but you would have to implement it yourself in the source code.
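
One common way to do it (a sketch with illustrative names, not code from this repo): fold this layer's dequantization and the next layer's quantization into a single combined multiplier, and apply the leaky activation in the requantized scale, which is valid because leaky ReLU commutes with any positive scale factor.

```c
#include <math.h>
#include <stdint.h>

/* Sketch of direct INT8->INT8 requantization, skipping the FP32
   round trip. The combined multiplier folds this layer's dequant
   and the next layer's quant:
       m = next_in_mult / (in_mult * w_mult)
   and the bias is pre-scaled: bias_scaled = bias * next_in_mult. */
static int8_t requantize_int8(int32_t acc, float m, float bias_scaled) {
    float y = (float)acc * m + bias_scaled; /* already in the next scale */
    y = (y > 0.f) ? y : 0.1f * y;           /* leaky ReLU, scale-invariant */
    float q = roundf(y);
    if (q >  127.f) q =  127.f;             /* saturate to INT8 */
    if (q < -128.f) q = -128.f;
    return (int8_t)q;
}
```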

Thilanka97 commented 5 years ago

@AlexeyAB What is the R_MULT value for, and why 32?

```c
#define R_MULT (32)  // 4 - 32
```

Also, you are using FP32 for maxpooling as well, right? Is there a specific reason, or do you use FP32 there because you already use FP32 between the conv layers?

Thanks in advance!
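
A general note on constants like R_MULT, offered as a guess at the pattern rather than a reading of this repo's source: in fixed-point pipelines, a float multiplier m is often approximated by an integer ratio m_fixed / R_MULT so the rescale can stay in integer arithmetic; a larger R_MULT approximates m more finely at the cost of larger intermediate values.

```c
#include <math.h>
#include <stdint.h>

#define R_MULT (32)  /* fixed-point denominator: larger = finer
                        approximation, larger intermediates */

/* Generic fixed-point rescale (a guess at the pattern, not this
   repo's code): approximate the float multiplier m by
   m_fixed / R_MULT and rescale in integer arithmetic. */
static int32_t rescale_fixed(int32_t acc, float m) {
    int32_t m_fixed = (int32_t)roundf(m * (float)R_MULT);
    return (int32_t)(((int64_t)acc * m_fixed) / R_MULT);
}
```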

Thilanka97 commented 5 years ago

@AlexeyAB I tried to make direct INT8-to-INT8 conversion work, without converting to FP32 in the middle. The code does not give any errors, but it does not draw any predictions (detection boxes) on the output image, and it does not print any class probabilities in the terminal. Do you have any idea why this could happen? Please help me.

Thanks in advance!
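
A hedged guess at a diagnostic for this symptom: if the combined requantization multiplier is wrong at any layer, activations tend to saturate at ±127 or collapse to zero long before the detection head, so the FP32 region layer sees garbage and nothing clears the detection threshold. A quick check (illustrative helper, not part of the repo):

```c
#include <stdint.h>
#include <stdio.h>

/* Diagnostic sketch: report what fraction of a layer's INT8 output
   is saturated or exactly zero. Values near 100% in either column
   point at a bad requantization multiplier somewhere upstream. */
void check_int8_output(const int8_t *out, int n, int layer_idx) {
    int sat = 0, zero = 0;
    for (int i = 0; i < n; ++i) {
        if (out[i] == 127 || out[i] == -128) ++sat;
        else if (out[i] == 0) ++zero;
    }
    printf("layer %d: %.1f%% saturated, %.1f%% zero\n",
           layer_idx, 100.0 * sat / n, 100.0 * zero / n);
}
```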