Open xxxyyyzzz12345 opened 1 year ago
The oneflow.nn.FakeQuantization module should be used together with oneflow.nn.MinMaxObserver or oneflow.nn.MovingAverageMinMaxObserver.
For example:
import oneflow as flow

quantization_formula = 'google'
quantization_bit = 1
quantization_scheme = 'affine'
# 'symmetric': quantize to signed integer, can only be used when quantization_bit >= 2
# 'affine': quantize to unsigned integer

input1 = flow.rand(20, dtype=flow.float64, requires_grad=True)

min_max_observer = flow.nn.MinMaxObserver(
    quantization_formula=quantization_formula,
    quantization_bit=quantization_bit,
    quantization_scheme=quantization_scheme,
)
(scale, zero_point) = min_max_observer(input1)

mod = flow.nn.FakeQuantization(quantization_formula, quantization_bit, quantization_scheme)
output = mod(input1, scale, zero_point)
output.sum().backward()

print(output)
print(input1.grad)
If you want to do 1-bit quantization-aware training, you should set quantization_scheme='affine' so that values are quantized to unsigned integers.
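To make the affine scheme concrete, here is a minimal pure-Python sketch of per-tensor affine (asymmetric) quantization parameters and the quantize-dequantize round trip. This illustrates the general 'google'-style affine formula, not OneFlow's actual implementation; the function names are hypothetical.

```python
def affine_params(x_min, x_max, bit):
    """Per-tensor affine (asymmetric) parameters for unsigned-int quantization.
    Sketch only; real implementations handle degenerate ranges and rounding modes."""
    quant_max = 2 ** bit - 1                          # e.g. 1 for 1-bit, 255 for 8-bit
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)   # range must contain zero
    scale = (x_max - x_min) / quant_max
    zero_point = round(-x_min / scale)
    return scale, zero_point

def affine_fake_quant(x, scale, zero_point, bit):
    """Quantize then dequantize: the value the network sees during QAT."""
    quant_max = 2 ** bit - 1
    q = min(max(round(x / scale) + zero_point, 0), quant_max)  # clamp to [0, quant_max]
    return (q - zero_point) * scale
```

With 1 bit there are only two representable values, which is why anything outside that pair is clamped hard to the range boundary.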
Also, the quantization-aware training modules in OneFlow were actually designed for 8-bit quantization and are not fully tested for lower bit widths.
As for the gradient issue (why it is always one): we followed the idea of the paper "Quantizing deep convolutional networks for efficient inference: A whitepaper"; see Section 2.4 for details.
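The straight-through estimator described in Section 2.4 of that whitepaper passes the incoming gradient through unchanged wherever the quantized value lands inside [quant_min, quant_max], and zeroes it where clamping occurred. A minimal sketch of that rule (a hypothetical helper, not OneFlow code):

```python
def ste_grad(x, scale, zero_point, quant_min, quant_max):
    """Gradient of fake quantization w.r.t. x under the straight-through
    estimator: rounding and dequantization are treated as identity, so the
    gradient is 1 inside the representable range and 0 where clamping kicked in."""
    q = round(x / scale) + zero_point
    return 1.0 if quant_min <= q <= quant_max else 0.0
```

Under this rule, a gradient of one for every element would only be expected when no input fell outside the representable range.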
And as I said, the quantization-aware training functionality of OneFlow is experimental for now; you are welcome to try it in actual training tasks.
We Appreciate Your Feedback 😄
But when the input is outside the range [quant_min, quant_max], shouldn't the gradient be 0.0 instead of 1.0? The following code snippet sets both quant_min and quant_max to 0 and defines the input as a 1-D tensor ranging from -20 to 20 with a step size of 1:
Both the outputs (-1, ..., -1, 0, ..., 0) and the gradients (1., 1., 1., ...) appear to be incorrect.
Originally posted by @xxxyyyzzz12345 in https://github.com/Oneflow-Inc/oneflow/issues/8649#issuecomment-1187825063
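For reference, the behavior the question expects can be checked with a small pure-Python sketch of fake quantization under the straight-through estimator. Since the original snippet is not shown, scale = 1 and zero_point = 0 are assumptions here: with quant_min = quant_max = 0, every input clamps to 0, so the output should be 0 everywhere and the gradient should be 1 only where the input rounds to 0.

```python
def fake_quant_with_ste(x, scale=1.0, zero_point=0, quant_min=0, quant_max=0):
    """Return (output, grad) of fake quantization at x under the
    straight-through estimator (sketch; parameters are assumed, not
    taken from the original snippet)."""
    q = round(x / scale) + zero_point
    clamped = min(max(q, quant_min), quant_max)
    out = (clamped - zero_point) * scale
    grad = 1.0 if quant_min <= q <= quant_max else 0.0
    return out, grad

# The 1-D input from -20 to 20 with a step size of 1.
results = [fake_quant_with_ste(float(x)) for x in range(-20, 21)]
```

Under these assumptions the outputs are all zero and exactly one element (the input 0) gets a nonzero gradient, which is consistent with the questioner's expectation rather than the all-ones gradient observed.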