google-ai-edge / LiteRT

LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.
https://ai.google.dev/edge/litert
Apache License 2.0

Full Integer Quantization Issue with Multiple Signatures in TensorFlow Lite #80

Open gaikwadrahul8 opened 2 days ago

gaikwadrahul8 commented 2 days ago

1. System information

2. Code

To help reproduce this issue, I am providing a link to a custom Colab notebook: Full Integer Quantization Issue with Multiple Signatures in TensorFlow Lite

3. Failure after conversion

During dynamic range quantization, TensorFlow Lite appears to treat multiple signatures that reference the same computational graph (including aliased ones) as a single entity. This is evidenced by the TFLite ModelAnalyzer report, which shows two subgraphs of identical size while the overall model size corresponds roughly to a single subgraph. Specifically:

 TFLite ModelAnalyzer Output of Dynamic Range Quantization:

              Model size:      56528 bytes
    Non-data buffer size:       3384 bytes (05.99 %)
  Total data buffer size:      53144 bytes (94.01 %)
          - Subgraph#0  :      53040 bytes (93.83 %)
          - Subgraph#1  :      53040 bytes (93.83 %)
    (Zero value buffers):          0 bytes (00.00 %)

The total model size is not the sum of the two subgraphs, suggesting that the same subgraph is counted twice but only stored once.
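For reference, a minimal standalone sketch that reproduces this setup. This is not the linked notebook; the model, tensor shapes, and signature names (`serving_default`, `alias`) are illustrative assumptions. It exports one concrete function under two signature keys and applies dynamic range quantization:

```python
import tempfile

import tensorflow as tf


class TwoSigModule(tf.Module):
    """A tiny model whose single tf.function is exported under two signature keys."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([100, 128]), name="w")

    @tf.function(input_signature=[tf.TensorSpec([1, 100], tf.float32)])
    def infer(self, x):
        return {"out": tf.matmul(x, self.w)}


module = TwoSigModule()
concrete = module.infer.get_concrete_function()

# Export the SAME concrete function under two signature keys (an aliased signature).
saved_dir = tempfile.mkdtemp()
tf.saved_model.save(
    module, saved_dir,
    signatures={"serving_default": concrete, "alias": concrete},
)

# Dynamic range quantization: weights -> int8, no representative dataset required.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Print the per-subgraph size breakdown discussed above.
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)
```

With this conversion path, the resulting model exposes both signature keys, and the analyzer report lists one subgraph per signature.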

The situation is markedly different under full integer quantization. Here, conversion produces two subgraphs of significantly different sizes, which contrasts with the dynamic range case and suggests that the aliased signatures are interpreted as distinct computational entities and quantized separately. The detailed output is as follows:

 TFLite ModelAnalyzer Output of Full Integer Quantization:

              Model size:     259144 bytes
    Non-data buffer size:       4360 bytes (01.68 %)
  Total data buffer size:     254784 bytes (98.32 %)
          - Subgraph#0  :      51120 bytes (19.73 %)
          - Subgraph#1  :     203568 bytes (78.55 %)
    (Zero value buffers):          0 bytes (00.00 %)

Here, the total model size reflects the sum of two distinctly sized subgraphs, highlighting a genuine differentiation in how each subgraph is quantized and stored.

This discrepancy between dynamic range and full integer quantization processes raises questions about the underlying mechanisms TensorFlow Lite employs for handling multiple signatures, especially when they reference the same computational graph segment. The difference in subgraph sizes under full integer quantization suggests that the process may inadvertently treat aliased signatures or multiple references as distinct computational entities, potentially leading to inefficiencies in model size and performance.
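A corresponding full integer quantization sketch, again illustrative: the model, shapes, and signature names are assumptions, and the representative dataset is a plain generator matching the single input spec shared by both signatures. It differs from the dynamic range path only in supplying calibration data and restricting ops to int8 builtins:

```python
import tempfile

import numpy as np
import tensorflow as tf


class TwoSigModule(tf.Module):
    """A tiny model whose single tf.function is exported under two signature keys."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.random.normal([100, 128]), name="w")

    @tf.function(input_signature=[tf.TensorSpec([1, 100], tf.float32)])
    def infer(self, x):
        return {"out": tf.matmul(x, self.w)}


module = TwoSigModule()
concrete = module.infer.get_concrete_function()
saved_dir = tempfile.mkdtemp()
tf.saved_model.save(
    module, saved_dir,
    signatures={"serving_default": concrete, "alias": concrete},
)


def representative_dataset():
    # Calibration samples matching the signature's [1, 100] float32 input.
    for _ in range(10):
        yield [np.random.rand(1, 100).astype(np.float32)]


converter = tf.lite.TFLiteConverter.from_saved_model(saved_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only builtin ops; model I/O tensors stay float by default.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert()

# Compare the per-subgraph sizes against the dynamic range report.
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)
```

Comparing the analyzer output of this conversion with the dynamic range one is what surfaces the size discrepancy described in this report.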

gaikwadrahul8 commented 22 hours ago

This issue originally reported by @lxzheng has been moved to this dedicated repository for LiteRT to enhance issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.