bytedance / MRECG

Why is the loss function value so high? Is it an expected result? #6

Open padeirocarlos opened 9 months ago

padeirocarlos commented 9 months ago

I tried running your code with pre-trained ResNet50 and MobileNetV2 models, and I got very high loss function values for the output and pred losses:

rec_loss = lp_loss(pred, tgt, p=self.p)

for layer in self.subgraph.modules():
    if isinstance(layer, _ADAROUND_SUPPORT_TYPE):
        round_vals = layer.weight_fake_quant.rectified_sigmoid()
        round_loss += self.weight * (1 - ((round_vals - .5).abs() * 2).pow(b)).sum()

:param pred: output from quantized model
:param tgt: output from FP model
:return: total loss function

https://github.com/bytedance/MRECG/blob/main/MRECG.py#L215C10-L225
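
For reference, here is a minimal, self-contained sketch of how the two quoted pieces appear to combine into a single total loss. The lp_loss definition, the constant values, and the round_vals_list argument are illustrative assumptions rather than the repo's exact code:

import torch

def lp_loss(pred, tgt, p=2.0):
    # L_p distance between the quantized block output and the FP block output
    return (pred - tgt).abs().pow(p).sum(1).mean()

def total_loss(pred, tgt, round_vals_list, weight=0.01, b=20.0, p=2.0):
    # reconstruction term (the rec_loss line quoted above)
    rec_loss = lp_loss(pred, tgt, p=p)
    # rounding regularizer (the round_loss loop quoted above); round_vals_list
    # stands in for the per-layer rectified-sigmoid rounding variables in [0, 1]
    round_loss = 0.0
    for round_vals in round_vals_list:
        round_loss += weight * (1 - ((round_vals - 0.5).abs() * 2).pow(b)).sum()
    return rec_loss + round_loss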

Are there additional settings I missed?

BobxmuMa commented 8 months ago

When optimizing the reconstruction module, the rounding loss is large at the beginning and gradually converges to 0 as the optimization process proceeds.

Furthermore, as the module gets deeper, the final value of the rounding loss gradually increases. If the problem is not resolved, you can provide a specific loss distribution.
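
As a quick illustration of that convergence (a toy check, not the repo's code): the per-element rounding regularizer 1 - (|v - 0.5| * 2)^b equals 1 while a rounding variable v sits near 0.5 and drops to 0 once v saturates to 0 or 1, so the summed rounding loss starts large and vanishes as the block converges.

import torch

b = 20.0
v_start = torch.full((1000,), 0.5)             # rounding variables early in optimization
v_end = torch.randint(0, 2, (1000,)).float()   # fully saturated to {0, 1}
for name, v in [("start", v_start), ("end", v_end)]:
    round_loss = (1 - ((v - 0.5).abs() * 2).pow(b)).sum()
    print(name, round_loss.item())             # ~1000.0 at the start, 0.0 at the end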

padeirocarlos commented 8 months ago

Sorry for this question! I did not clearly follow this statement:

"Furthermore, as the module gets deeper, the final value of the rounding loss gradually increases. If the problem is not resolved, you can provide a specific loss distribution"

Could you please elaborate? What do you mean here?

BobxmuMa commented 8 months ago

Since the model is optimized by blockwise reconstruction, each block of the model has its own optimization process and convergence loss. In the earlier blocks of the model, the convergence loss of the block optimization tends to 0. In the deeper blocks, the optimized convergence loss remains at a larger value. Under low-bit quantization, the convergence loss in the deeper blocks is even larger. This phenomenon is normal on the ImageNet dataset.
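
A rough way to see why deeper blocks end with a larger residual loss (a toy illustration with hypothetical linear blocks, not the repo's reconstruction code): each quantized block receives inputs that already carry the quantization error of the earlier quantized blocks, so the mismatch it has to close, and hence its converged loss, tends to grow with depth.

import torch
import torch.nn as nn

torch.manual_seed(0)
fp_blocks = [nn.Linear(16, 16) for _ in range(4)]

# crude stand-in for quantized blocks: copies with small weight perturbations
q_blocks = []
for blk in fp_blocks:
    q = nn.Linear(16, 16)
    q.load_state_dict(blk.state_dict())
    with torch.no_grad():
        q.weight.add_(0.02 * torch.randn_like(q.weight))
    q_blocks.append(q)

x_fp = x_q = torch.randn(8, 16)
for i, (fp, q) in enumerate(zip(fp_blocks, q_blocks)):
    x_fp = fp(x_fp)
    x_q = q(x_q)   # the input to block i already carries the earlier blocks' error
    block_loss = (x_q - x_fp).abs().pow(2).sum(1).mean()
    print(f"block {i}: residual loss {block_loss.item():.4f}")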

padeirocarlos commented 8 months ago

That is true. I also noticed it on my dataset and on ImageNet! I think one of the reasons is the accumulation of quantization error, which becomes larger in deeper blocks or layers! So can providing a specific loss distribution solve this issue?! Have you tried that? I will try it!