TexasInstruments / jacinto-ai-devkit

This repository has been moved. The new location is https://github.com/TexasInstruments/edgeai-tensorlab (see also https://github.com/TexasInstruments/edgeai).

QuantizeDequantizeG's backward() function is not executed!!!! #3

Open wuzhiyang2016 opened 4 years ago

wuzhiyang2016 commented 4 years ago

hello, when I train a quantized model with pytorch-jacinto-ai-devkit, there is a class called QuantizeDequantizeG, but when doing loss.backward(), its backward() function is not used. The code is in xnn/layers/funtion.py

Looking forward to your reply, sincerely ~

mathmanu commented 4 years ago

Sorry, I couldn't understand. Please post the exact error that you are getting and describe the situation in further detail.

mathmanu commented 4 years ago

Hi wuzhiyang2016, after reading your comment more carefully, I understand it better now.

What you are saying is that the backward method of QuantizeDequantizeG is not called during back-propagation (loss.backward()). This is a good question and it shows that you have tried to analyze and understand what is happening. Let me answer in detail:

This is the crux of Straight Through Estimation (STE): the backward pass does not involve any quantization - it goes straight through, as though no quantization had happened in the forward pass.
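To illustrate the idea (this is only a rough sketch, not the exact code in this repository; the names quantize_dequantize and ste_quantize are placeholders), STE is commonly implemented so that the quantized value is used in the forward pass while the gradient bypasses the quantization entirely:

    import torch

    def quantize_dequantize(x, scale, num_bits=8):
        # illustrative uniform fake-quantization (details are assumptions)
        qmax = 2 ** num_bits - 1
        return torch.clamp(torch.round(x / scale), 0, qmax) * scale

    def ste_quantize(x, scale):
        # forward uses the quantized value, but the detached term carries no
        # gradient, so backward flows through x as if no quantization happened
        xq = quantize_dequantize(x, scale)
        return x + (xq - x).detach()

If the repository follows a similar pattern for STE, that is exactly why you do not see the backward of QuantizeDequantizeG being executed in this mode.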

We support three kinds of quant estimation methods, which you can see defined in quant_base_module.py:

    class QuantEstimationType:
        QUANTIZED_THROUGH_ESTIMATION = 0
        STRAIGHT_THROUGH_ESTIMATION = 1
        ALPHA_BLENDING_ESTIMATION = 2

STRAIGHT_THROUGH_ESTIMATION is the default. You can see this being set in quant_train_module.py:

    self.quantized_estimation_type = QuantEstimationType.STRAIGHT_THROUGH_ESTIMATION

If you change the above to QUANTIZED_THROUGH_ESTIMATION, you can see that the backward of QuantizeDequantizeG will be called.
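Concretely, the change in quant_train_module.py would be (only the relevant line is shown):

    # switch from the default STE so that QuantizeDequantizeG.backward() is exercised
    self.quantized_estimation_type = QuantEstimationType.QUANTIZED_THROUGH_ESTIMATION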

I would like to highlight a couple of limitations:

(1) ONNX export may not work if you do the above change - due to what seems like a change in the handling of custom/symbolic functions in PyTorch. You can disable ONNX export if you try the above.

(2) If you try ALPHA_BLENDING_ESTIMATION and face an assertion, a small fix is required - in the forward function of QuantTrainPAct2, you can change the relevant lines to:

    elif (self.quantized_estimation_type == QuantEstimationType.ALPHA_BLENDING_ESTIMATION):
        if self.training:
            # TODO: vary the alpha blending factor over the epochs
            y = y * (1.0-self.alpha_blending_estimation_factor) + yq * self.alpha_blending_estimation_factor
        else:
            y = yq
        #
    elif (self.quantized_estimation_type == QuantEstimationType.QUANTIZED_THROUGH_ESTIMATION):
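In other words, with ALPHA_BLENDING_ESTIMATION the training output is a weighted blend of the float output y and the quantized output yq, controlled by alpha_blending_estimation_factor, while evaluation uses the fully quantized output yq.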

I hope this helps. Best regards,

wuzhiyang2016 commented 4 years ago

Thanks very much, I have just seen your reply. Let me take some time to understand your great reply~

wuzhiyang2016 commented 4 years ago

hello, I have checked the backward code of the class QuantizeDequantizeG. dx is the same as the paper's formula (8), but the scale derivative is not the same as the paper's formula (6). Could you give me some advice? The paper I mean is "Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks".

mathmanu commented 4 years ago

Hi,

The backward of QuantizeDequantizeG uses a numerical gradient (https://en.wikipedia.org/wiki/Numerical_differentiation); it's not the one from that paper.
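To make the distinction concrete, here is a rough sketch (not the actual implementation in this repository; the class name and details are assumptions, and a single per-tensor scale is assumed) of a quantize-dequantize autograd Function whose scale gradient is a central-difference numerical estimate rather than the analytic derivative from that paper:

    import torch

    class QuantizeDequantizeSketch(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, scale, num_bits=8):
            ctx.save_for_backward(x, scale)
            ctx.num_bits = num_bits
            qmax = 2 ** num_bits - 1
            return torch.clamp(torch.round(x / scale), 0, qmax) * scale

        @staticmethod
        def backward(ctx, grad_output):
            x, scale = ctx.saved_tensors
            qmax = 2 ** ctx.num_bits - 1

            def qdq(s):
                return torch.clamp(torch.round(x / s), 0, qmax) * s

            # dL/dx: pass the gradient through inside the representable range
            inside = (x / scale >= 0) & (x / scale <= qmax)
            grad_x = grad_output * inside.to(grad_output.dtype)

            # dL/dscale: central-difference numerical derivative w.r.t. the scale
            eps = scale * 1e-3
            dy_dscale = (qdq(scale + eps) - qdq(scale - eps)) / (2 * eps)
            grad_scale = (grad_output * dy_dscale).sum().reshape(scale.shape)

            return grad_x, grad_scale, None

The key point is only that the scale gradient is numerical, not the formula (6) derivative from the TQT paper.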

Also our recommended quantized_estimation_type is STE, in which case this gradient is not used at all.

I hope it is clear.

wuzhiyang2016 commented 4 years ago

I get it, it's really clear! Thanks very much!!!

mathmanu commented 4 years ago

Keeping this open as an FAQ item, so that others can also benefit.