Open xxxyyyzzz12345 opened 1 year ago
The oneflow.nn.FakeQuantization module should be used together with oneflow.nn.MinMaxObserver or oneflow.nn.MovingAverageMinMaxObserver.
For example:
import oneflow as flow

quantization_formula = 'google'
quantization_bit = 1
quantization_scheme = 'affine'
# 'symmetric': quantize to signed integer, can only be used when quantization_bit >= 2
# 'affine': quantize to unsigned integer

input1 = flow.rand(20, dtype=flow.float64, requires_grad=True)

min_max_observer = flow.nn.MinMaxObserver(
    quantization_formula=quantization_formula,
    quantization_bit=quantization_bit,
    quantization_scheme=quantization_scheme,
)
(scale, zero_point) = min_max_observer(input1)

mod = flow.nn.FakeQuantization(quantization_formula, quantization_bit, quantization_scheme)
output = mod(input1, scale, zero_point)
output.sum().backward()

print(output)
print(input1.grad)
If you want to do 1-bit quantization-aware training, you should set quantization_scheme='affine' so that values are quantized to unsigned integers.
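To make the affine scheme concrete, here is a minimal pure-Python sketch of per-tensor affine (asymmetric) quantization parameters and the quantize-dequantize round trip. This illustrates the general 'google'-style affine formula, not OneFlow's actual implementation; the function names are hypothetical.

```python
def affine_params(x_min, x_max, bit):
    """Per-tensor affine (asymmetric) parameters for unsigned-int quantization.
    Sketch only; real implementations handle degenerate ranges and rounding modes."""
    quant_max = 2 ** bit - 1                          # e.g. 1 for 1-bit, 255 for 8-bit
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)   # range must contain zero
    scale = (x_max - x_min) / quant_max
    zero_point = round(-x_min / scale)
    return scale, zero_point

def affine_fake_quant(x, scale, zero_point, bit):
    """Quantize then dequantize: the value the network sees during QAT."""
    quant_max = 2 ** bit - 1
    q = min(max(round(x / scale) + zero_point, 0), quant_max)  # clamp to [0, quant_max]
    return (q - zero_point) * scale
```

With 1 bit there are only two representable values, which is why anything outside that pair is clamped hard to the range boundary.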
Also, the quantization-aware training modules in OneFlow were actually designed for 8-bit quantization and are not fully tested for lower bit widths.
As for the gradient issue (why it is always one): we followed the idea of the paper "Quantizing deep convolutional networks for efficient inference: A whitepaper"; see Section 2.4 for details.
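The straight-through estimator described in Section 2.4 of that whitepaper passes the incoming gradient through unchanged wherever the quantized value lands inside [quant_min, quant_max], and zeroes it where clamping occurred. A minimal sketch of that rule (a hypothetical helper, not OneFlow code):

```python
def ste_grad(x, scale, zero_point, quant_min, quant_max):
    """Gradient of fake quantization w.r.t. x under the straight-through
    estimator: rounding and dequantization are treated as identity, so the
    gradient is 1 inside the representable range and 0 where clamping kicked in."""
    q = round(x / scale) + zero_point
    return 1.0 if quant_min <= q <= quant_max else 0.0
```

Under this rule, a gradient of one for every element would only be expected when no input fell outside the representable range.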
And as I said, the quantization-aware training functionality of OneFlow is experimental for now; you are welcome to try it in actual training tasks.
We Appreciate Your Feedback 😄
But when the input is outside the range [quant_min, quant_max], shouldn't the gradient be 0.0 instead of 1.0? The following code snippet sets both quant_min and quant_max to 0 and defines the input as a 1-D tensor ranging from -20 to 20 with a step size of 1:
Both the outputs (-1, ..., -1, 0, ..., 0) and the gradients (1., 1., 1., ...) appear to be incorrect.
Originally posted by @xxxyyyzzz12345 in https://github.com/Oneflow-Inc/oneflow/issues/8649#issuecomment-1187825063
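For reference, the behavior the question expects can be checked with a small pure-Python sketch of fake quantization under the straight-through estimator. Since the original snippet is not shown, scale = 1 and zero_point = 0 are assumptions here: with quant_min = quant_max = 0, every input clamps to 0, so the output should be 0 everywhere and the gradient should be 1 only where the input rounds to 0.

```python
def fake_quant_with_ste(x, scale=1.0, zero_point=0, quant_min=0, quant_max=0):
    """Return (output, grad) of fake quantization at x under the
    straight-through estimator (sketch; parameters are assumed, not
    taken from the original snippet)."""
    q = round(x / scale) + zero_point
    clamped = min(max(q, quant_min), quant_max)
    out = (clamped - zero_point) * scale
    grad = 1.0 if quant_min <= q <= quant_max else 0.0
    return out, grad

# The 1-D input from -20 to 20 with a step size of 1.
results = [fake_quant_with_ste(float(x)) for x in range(-20, 21)]
```

Under these assumptions the outputs are all zero and exactly one element (the input 0) gets a nonzero gradient, which is consistent with the questioner's expectation rather than the all-ones gradient observed.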