google-ai-edge / LiteRT

LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.
https://ai.google.dev/edge/litert
Apache License 2.0

Full Integer Quantization Issue with Multiple Signatures in TensorFlow Lite #80

Open gaikwadrahul8 opened 2 days ago

gaikwadrahul8 commented 2 days ago

1. System information

2. Code

To help reproduce this issue, I am providing a link to a custom Colab notebook: Full Integer Quantization Issue with Multiple Signatures in TensorFlow Lite

3. Failure after conversion

During dynamic range quantization, TensorFlow Lite appears to treat multiple signatures that reference the same computational graph (including aliased ones) as a single entity. This is evidenced by the TFLite ModelAnalyzer report, which shows two subgraphs of identical size while the overall model size corresponds roughly to a single subgraph. Specifically:

 TFLite ModelAnalyzer Output of Dynamic Range Quantization:

              Model size:      56528 bytes
    Non-data buffer size:       3384 bytes (05.99 %)
  Total data buffer size:      53144 bytes (94.01 %)
          - Subgraph#0  :      53040 bytes (93.83 %)
          - Subgraph#1  :      53040 bytes (93.83 %)
    (Zero value buffers):          0 bytes (00.00 %)

The total model size is not the sum of the two subgraphs, suggesting that the same subgraph is counted twice but only stored once.
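For reference, a minimal standalone sketch that reproduces this setup. This is not the linked notebook; the model, tensor shapes, and signature names (`serving_default`, `alias`) are illustrative assumptions. It exports one concrete function under two signature keys and applies dynamic range quantization:

```python
import tempfile

import tensorflow as tf


class TwoSigModule(tf.Module):
    """A tiny model whose single tf.function is exported under two signature keys."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([100, 128]), name="w")

    @tf.function(input_signature=[tf.TensorSpec([1, 100], tf.float32)])
    def infer(self, x):
        return {"out": tf.matmul(x, self.w)}


module = TwoSigModule()
concrete = module.infer.get_concrete_function()

# Export the SAME concrete function under two signature keys (an aliased signature).
saved_dir = tempfile.mkdtemp()
tf.saved_model.save(
    module, saved_dir,
    signatures={"serving_default": concrete, "alias": concrete},
)

# Dynamic range quantization: weights -> int8, no representative dataset required.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Print the per-subgraph size breakdown discussed above.
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)
```

With this conversion path, the resulting model exposes both signature keys, and the analyzer report lists one subgraph per signature.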

The situation is markedly different under full integer quantization. Here, conversion produces two subgraphs of significantly different sizes, which contrasts with the dynamic range case and suggests that the aliased signatures are interpreted as distinct computational entities and quantized separately. The detailed output is as follows:

 TFLite ModelAnalyzer Output of Full Integer Quantization:

              Model size:     259144 bytes
    Non-data buffer size:       4360 bytes (01.68 %)
  Total data buffer size:     254784 bytes (98.32 %)
          - Subgraph#0  :      51120 bytes (19.73 %)
          - Subgraph#1  :     203568 bytes (78.55 %)
    (Zero value buffers):          0 bytes (00.00 %)

Here, the total model size reflects the sum of two distinctly sized subgraphs, highlighting a genuine differentiation in how each subgraph is quantized and stored.

This discrepancy between dynamic range and full integer quantization processes raises questions about the underlying mechanisms TensorFlow Lite employs for handling multiple signatures, especially when they reference the same computational graph segment. The difference in subgraph sizes under full integer quantization suggests that the process may inadvertently treat aliased signatures or multiple references as distinct computational entities, potentially leading to inefficiencies in model size and performance.
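A corresponding full integer quantization sketch, again illustrative: the model, shapes, and signature names are assumptions, and the representative dataset is a plain generator matching the single input spec shared by both signatures. It differs from the dynamic range path only in supplying calibration data and restricting ops to int8 builtins:

```python
import tempfile

import numpy as np
import tensorflow as tf


class TwoSigModule(tf.Module):
    """A tiny model whose single tf.function is exported under two signature keys."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([100, 128]), name="w")

    @tf.function(input_signature=[tf.TensorSpec([1, 100], tf.float32)])
    def infer(self, x):
        return {"out": tf.matmul(x, self.w)}


module = TwoSigModule()
concrete = module.infer.get_concrete_function()
saved_dir = tempfile.mkdtemp()
tf.saved_model.save(
    module, saved_dir,
    signatures={"serving_default": concrete, "alias": concrete},
)


def representative_dataset():
    # Calibration samples matching the signature's [1, 100] float32 input.
    for _ in range(10):
        yield [np.random.rand(1, 100).astype(np.float32)]


converter = tf.lite.TFLiteConverter.from_saved_model(saved_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only builtin ops; model I/O tensors stay float by default.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()

# Compare the per-subgraph sizes against the dynamic range report.
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)
```

Comparing the analyzer output of this conversion with the dynamic range one is what surfaces the size discrepancy described in this report.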

gaikwadrahul8 commented 22 hours ago

This issue originally reported by @lxzheng has been moved to this dedicated repository for LiteRT to enhance issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.