alibaba / TinyNeuralNetwork

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Model with "concatenate" layer does not export correctly with int8 #354

Closed · spacycoder closed this 3 weeks ago

spacycoder commented 3 weeks ago

Hi, I'm having some issues converting a model when using "int8" as the target type. This is the error I get when I run the converted model with the TFLite runtime:

    _main()
  File ".../test_concat_inference.py", line 9, in _main
    interpreter.allocate_tensors()
  File ".../lib/python3.11/site-packages/tflite_runtime/interpreter.py", line 531, in allocate_tensors
    return self._interpreter.AllocateTensors()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: /tensorflow/tensorflow/lite/kernels/concatenation.cc:183 t->params.scale != output->params.scale (0 != -1875126784)Node number 0 (CONCATENATION) failed to prepare.

I can reproduce the issue with this code:

import torch.nn as nn
import torch
from tinynn.graph.quantization.quantizer import PostQuantizer
from tinynn.converter import TFLiteConverter

class ConcatModel(nn.Module):

    def forward(self, x0, x1):
        return torch.cat([x0, x1], dim=1)

def _main():
    dummy_input0 = torch.rand(1, 3, 224, 224)
    dummy_input1 = torch.rand(1, 3, 224, 224)

    model = ConcatModel()

    qat_config = {"backend": "qnnpack"}
    quantizer = PostQuantizer(
        model, (dummy_input0, dummy_input1), work_dir="concat_model", config=qat_config
    )

    ptq_coarse_matcher = quantizer.quantize()
    ptq_coarse_matcher(dummy_input0, dummy_input1)

    with torch.no_grad():
        ptq_coarse_matcher.eval()
        ptq_coarse_matcher.cpu()

        ptq_coarse_matcher = quantizer.convert(ptq_coarse_matcher)
        torch.backends.quantized.engine = quantizer.backend
        converter = TFLiteConverter(
            ptq_coarse_matcher,
            (dummy_input0, dummy_input1),
            "concat_model.tflite",
            fuse_quant_dequant=True,
            quantize_target_type="int8"
        )
        converter.convert()

if __name__ == '__main__':
    _main()

and running the model like this:

    import tflite_runtime.interpreter as tflite

    interpreter = tflite.Interpreter(model_path="concat_model.tflite")
    interpreter.allocate_tensors()
peterjc123 commented 3 weeks ago

@spacycoder You'll need to add the configuration item "disable_requantization_for_cat": True to qat_config when using quantize_target_type="int8" in TFLiteConverter (and when qat_config has "per_tensor": True, which is the default). This option is not documented yet, but I'm not sure where to put it.
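
For reference, a minimal sketch of the change applied to the reproduction script above; the only difference is the extra key in qat_config. The note about matching scales is my reading of the option name and the concatenation.cc check, not documented behavior:

    # Same reproduction script as above; only qat_config changes.
    # "disable_requantization_for_cat" presumably keeps the torch.cat inputs and
    # output on shared quantization parameters, so the CONCATENATION op's
    # scale check passes when exporting with quantize_target_type="int8".
    qat_config = {
        "backend": "qnnpack",
        "disable_requantization_for_cat": True,
    }
    quantizer = PostQuantizer(
        model, (dummy_input0, dummy_input1), work_dir="concat_model", config=qat_config
    )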

spacycoder commented 3 weeks ago

Ok, thanks!