Straight through estimator

eladhoffer / quantized.pytorch

MIT License


Open michaelklachko opened 6 years ago

michaelklachko commented 6 years ago

I noticed that you don't cancel the gradient for large values when using the straight-through estimator here.

The QNN paper claims that "Not cancelling the gradient when r is too large significantly worsens performance".

Does this only matter for low-precision quantization (e.g. binary)?
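
To make the question concrete, here is a minimal sketch of the two backward rules being compared, for the binary (sign) case. It is not this repository's code: the function names, the example values, and the threshold of 1 are assumptions for illustration, with the cancellation rule written as g_r = g_q * 1(|r| <= 1). Plain Python lists are used so it runs without any framework.

```python
def binarize(r):
    """Forward pass: deterministic sign quantization to {-1, +1}."""
    return [1.0 if x >= 0 else -1.0 for x in r]


def ste_backward_plain(r, grad_q):
    """Plain straight-through estimator: pass the upstream gradient unchanged."""
    return list(grad_q)


def ste_backward_cancelled(r, grad_q, threshold=1.0):
    """STE with cancellation: zero the gradient wherever |r| > threshold.

    threshold=1.0 is an assumed value matching the binary case.
    """
    return [g if abs(x) <= threshold else 0.0 for x, g in zip(r, grad_q)]


# Hypothetical pre-quantization values r and upstream gradients g_q.
r = [-2.5, -0.3, 0.0, 0.7, 1.8]
grad_q = [0.1, -0.2, 0.3, 0.4, -0.5]

print(binarize(r))                        # [-1.0, -1.0, 1.0, 1.0, 1.0]
print(ste_backward_plain(r, grad_q))      # [0.1, -0.2, 0.3, 0.4, -0.5]
print(ste_backward_cancelled(r, grad_q))  # [0.0, -0.2, 0.3, 0.4, 0.0]
```

The only difference is at the entries where |r| > 1 (the first and last here): the plain version keeps updating them, while the cancelled version stops.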