ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

Fully connected bad shape inference, transformers #687

Closed kylesayrs closed 1 year ago

kylesayrs commented 2 years ago

Hi there. I'm trying to run inference with a BERT model whose batch size is 1, and I run into a shape inference error. I have no problem running the same model with the XNNPACK delegate.

This error does not occur when the model is generated with a batch size of None, but that leads to another error mentioned here.

I know that this issue mentions that layer norm nodes are not supported by Arm NN; however, I don't believe this model graph has any unsupported nodes (or at least this error doesn't make that evident).

Download model (this model is quantized, whereas the script below produces a non-quantized one; both give the same error): https://drive.google.com/file/d/1kF2nynDj9exSU9Ensg0fsp2NaFzD8s2-/view?usp=sharing

import tensorflow as tf
import tensorflow_hub as hub

batch_size = 1
sequence_length = 32
download_url = "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-12_H-768_A-12/2"
save_path = "bert_mlm.tflite"

input_ids = tf.keras.layers.Input(
    (sequence_length,),
    batch_size=batch_size,
    dtype=tf.int32,
    name="input_ids",
)
token_type_ids = tf.keras.layers.Input(
    (sequence_length,),
    batch_size=batch_size,
    dtype=tf.int32,
    name="token_type_ids",
)
attention_mask = tf.keras.layers.Input(
    (sequence_length,),
    batch_size=batch_size,
    dtype=tf.int32,
    name="attention_mask",
)
inputs = {
    "input_word_ids": input_ids,
    "input_type_ids": token_type_ids,
    "input_mask": attention_mask,
}

# Wrap the TF Hub BERT encoder in a Keras model with the fixed input shapes
encoder = hub.KerasLayer(download_url, trainable=False)
outputs = encoder(inputs)
model = tf.keras.Model(inputs, outputs)

# Convert the Keras model to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open(save_path, "wb") as f:
    f.write(tflite_model)

Create inferencer

import os

import tflite_runtime.interpreter

model_path = "bert_mlm.tflite"
delegate_lib_path = os.path.expanduser("~/ArmNN-linux-aarch64/libarmnnDelegate.so")

# Load the Arm NN delegate and attach it to the interpreter
delegate = tflite_runtime.interpreter.load_delegate(
    library=delegate_lib_path,
    options={"backends": "CpuAcc,CpuRef", "logging-severity": "info"},
)
interpreter = tflite_runtime.interpreter.Interpreter(
    model_path=model_path, experimental_delegates=[delegate],
)

Shape inference error

Info: ArmNN v29.0.0
Info: Initialization time: 0.94 ms.
INFO: TfLiteArmnnDelegate: Created TfLite ArmNN delegate.
Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 133.79 ms
Info: Optimize ArmnnSubgraph time: 0.54 ms
Info: Load ArmnnSubgraph time: 77.01 ms
Info: Overall ArmnnSubgraph creation time: 211.57 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 0.03 ms
Info: Optimize ArmnnSubgraph time: 0.25 ms
Info: Load ArmnnSubgraph time: 0.06 ms
Info: Overall ArmnnSubgraph creation time: 0.47 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 9.97 ms
Info: Optimize ArmnnSubgraph time: 2.65 ms
Info: Load ArmnnSubgraph time: 8.65 ms
Info: Overall ArmnnSubgraph creation time: 21.49 ms

Info: ArmnnSubgraph creation
Info: Parse nodes to ArmNN time: 13.46 ms

Traceback (most recent call last):
        interpreter = tflite_runtime.interpreter.Interpreter(
    File "/home/ubuntu/arm-competitive-benchmarking/env/lib/python3.8/site-packages/tflite_runtime/interpreter.py", line 495, in __init__ 
        self._interpreter.ModifyGraphWithDelegate(
RuntimeError: TfLiteArmnnDelegate: Exception (FullyConnectedLayer: TensorShape set on OutputSlot[0] does not 
match the inferred shape. : [1,32,3072] != [32,3072]) caught from optimize.
Info: Shutdown time: 22.43 ms.
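
The two shapes in the error differ only by a leading dimension of size 1, so their total element counts are identical. A quick sanity check (plain Python, shapes copied from the error message) illustrates why an option that disregards singleton dimensions can reconcile them:

```python
import math

# Shapes from the delegate error: the shape set on the output slot
# vs. the shape Arm NN inferred for the FullyConnected output.
set_shape = [1, 32, 3072]
inferred_shape = [32, 3072]

# The ranks differ, so strict shape validation fails...
assert set_shape != inferred_shape

# ...but dropping size-1 dimensions makes the shapes identical, and the
# total number of elements matches either way.
squeezed = [d for d in set_shape if d != 1]
assert squeezed == inferred_shape
assert math.prod(set_shape) == math.prod(inferred_shape)  # 98304 elements each
```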

ArmRyan commented 2 years ago

Hi @kylesayrs ,

Could you try the following option?

Option key: "allow-expanded-dims"
Possible values: ["true"/"false"]
Description: If true, dimensions with a size of 1 are disregarded when validating tensor shapes, but tensor sizes must still match.
This is an experimental parameter that is incompatible with "infer-output-shape".
This parameter may be removed in a later update.

keidav01 commented 2 years ago

Hi @kylesayrs, did you try @ArmRyan's recommendation? Please let me know if you need further assistance; otherwise I will close this issue in the coming days. Thank you!

keidav01 commented 1 year ago

Hi @kylesayrs

I will close this on October 10th if I do not hear from you; please let me know if you need any more help.

Thank you, Keith