
Whisper tflite doesn't work #773

Closed federicoparra closed 2 weeks ago

federicoparra commented 1 month ago

I converted the OpenAI Whisper base model to tflite to use with Arm NN.

It works fine without Arm NN, but when I use the Arm NN delegate it raises an error.

To be clear: the converted tflite model I share here, which is based on https://huggingface.co/openai/whisper-base, is shared under the CC BY-NC-SA 4.0 license.

Model: https://1drv.ms/u/s!AkAYIvBr_1Bhx8QGR-SOryZ24rjmLA?e=KFzKj6

To reproduce, use this code.

First, without the Arm NN delegate (works fine):

```
pip install tensorflow==2.10.0
pip install transformers==4.41.0
pip install datasets
pip install tflite_runtime
```

```python
import tensorflow as tf

from datasets import load_dataset
from transformers import WhisperProcessor, WhisperFeatureExtractor, TFWhisperForConditionalGeneration, WhisperTokenizer

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base.en")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base.en", predict_timestamps=True)
processor = WhisperProcessor(feature_extractor, tokenizer)
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-base.en")

# Loading dataset
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

inputs = feature_extractor(ds[0]["audio"]["array"], sampling_rate=ds[0]["audio"]["sampling_rate"], return_tensors="tf")
input_features = inputs.input_features

import tflite_runtime.interpreter as tflite

tflite_model_path = 'whisper-base.tflite'
interpreter = tflite.Interpreter(tflite_model_path)

tflite_generate = interpreter.get_signature_runner()
generated_ids = tflite_generate(input_features=input_features)["sequences"]
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
transcription
```

That works. Now the same thing with the Arm NN delegate, which does not work:

```python
import tflite_runtime.interpreter as tflite

armnn_delegate = tflite.load_delegate(
    library="/home/federico/Documents/code/ARM/aarch64_build/delegate/libarmnnDelegate.so",
    options={"backends": "GpuAcc,CpuAcc", "logging-severity": "trace"},
)

# loaded model... now with generate!
tflite_model_path = 'whisper-base.tflite'
interpreter = tflite.Interpreter(tflite_model_path, experimental_delegates=[armnn_delegate])

tflite_generate = interpreter.get_signature_runner()
generated_ids = tflite_generate(input_features=input_features)["sequences"]
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
transcription
```
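For reference, a minimal fallback sketch of my own (not part of the run above, but roughly what the traceback below shows I was attempting), assuming the same `tflite_model_path` and `armnn_delegate`: if applying the delegate fails at construction time, retry without it so inference still runs on the default kernels.

```python
try:
    interpreter = tflite.Interpreter(tflite_model_path, experimental_delegates=[armnn_delegate])
    print("TFLite model loaded successfully with the Arm NN delegate.")
except RuntimeError as e:
    # Delegate application failed (e.g. kTfLiteApplicationError);
    # fall back to the default TfLite kernels without delegation.
    print(f"Failed to apply the Arm NN delegate, falling back: {e}")
    interpreter = tflite.Interpreter(tflite_model_path)
```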

I get the following error:

```
Failed to load TFLite model:
INFO: TfLiteArmnnDelegate: Added backend GpuAcc
INFO: TfLiteArmnnDelegate: Added backend CpuAcc

RuntimeError                              Traceback (most recent call last)
Cell In[53], line 15
     13 # interpreter = tflite.Interpreter(tflite_model_path, experimental_delegates=[armnn_delegate])
     14 try:
---> 15     interpreter = tflite.Interpreter(model_path=tflite_model_path, experimental_delegates=[armnn_delegate])
     16     print("TFLite model loaded successfully.")
     17 except Exception as e:

File ~/miniconda3/envs/captioning/lib/python3.9/site-packages/tflite_runtime/interpreter.py:513, in Interpreter.__init__(self, model_path, model_content, experimental_delegates, num_threads, experimental_op_resolver_type, experimental_preserve_all_tensors, experimental_disable_delegate_clustering)
    511 self._delegates = experimental_delegates
    512 for delegate in self._delegates:
--> 513     self._interpreter.ModifyGraphWithDelegate(
    514         delegate._get_native_delegate_pointer())  # pylint: disable=protected-access
    515 self._signature_defs = self.get_signature_list()
    517 self._metrics = metrics.TFLiteMetrics()

RuntimeError:
```

federicoparra commented 1 month ago

@Colm-in-Arm? @catcor01? Any ideas?

Colm-in-Arm commented 3 weeks ago

Hello Federico,

Sorry for the delay. We were busy getting 24.05 delivered.

I'm surprised you're not getting more detailed error messages. When I called ModifyGraphWithDelegate with that model I got a `kTfLiteApplicationError` result. This doesn't come from Arm NN but from the TfLite runtime itself. The TfLite documentation says:

> kTfLiteApplicationError: Delegation failed to be applied due to incompatibility with the TfLite runtime, e.g., the model graph is already immutable when applying the delegate. However, the interpreter could still be invoked.

Examining the model in Netron shows a pretty monstrous WHILE operator at the end. Arm NN does not support conditionals, and TfLite insists that an entire conditional subgraph must be delegated; otherwise it will not allow delegation.
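If you want to confirm this without Netron, a quick sketch (my suggestion, assuming a full TensorFlow install is available alongside tflite_runtime): TfLite's model analyzer prints every operator in the graph, including control-flow ops like WHILE and the subgraphs they reference.

```python
import tensorflow as tf

# Dumps the op-by-op structure of the model, including any WHILE/IF
# control-flow operators and their subgraphs.
tf.lite.experimental.Analyzer.analyze(model_path="whisper-base.tflite")
```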

In summary, this model cannot be run on Arm NN.

Colm.

federicoparra commented 2 weeks ago

With all due respect @Colm-in-Arm, that's not acceptable :(

We are talking about Whisper! You feel comfortable saying that one of the most important models around can't be supported by what is claimed to be the fastest and best-supported way to run neural networks on Arm?

It has trouble with LLMs, it can't do Whisper, so what can it do? Object detection? Is that what this library is meant for?

In the meantime, MLC-LLM, with its compilation techniques, achieves quite fast generative inference on the Mali GPU.

I'm sorry, but like me, many will discard Arm NN when they realize it doesn't support any of the modern networks we need to run these days :(

MatthewARM commented 2 weeks ago

I did a bit of digging; it seems the kTfLiteApplicationError is being raised at subgraph.cc:2255 in TensorFlow Lite, which checks whether the delegate supports dynamically shaped tensors. So the error has the same cause as the warning that is also raised: "WARNING: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#1936 is a dynamic-sized tensor)."
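To see which tensors are dynamic, a small sketch of my own, assuming the tflite_runtime API used above: a -1 in a tensor's `shape_signature` marks a dynamically sized dimension.

```python
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter("whisper-base.tflite")

# Tensors whose shape_signature contains -1 have at least one
# dynamically sized dimension (this lists the primary subgraph only).
for t in interpreter.get_tensor_details():
    if -1 in t["shape_signature"]:
        print(t["index"], t["name"], t["shape_signature"])
```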

For what it's worth, it looks like the built-in XNNPACK delegate also hits that path when TfLite tries to apply it by default, and then at interpreter.cc:381 in TensorFlow Lite that error is ignored and inference proceeds using the default kernels without delegation.
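As a side note, if anyone wants to compare behaviour with and without that silent XNNPACK fallback, a sketch (assuming the same model file as above): tflite_runtime exposes an op-resolver option that skips the default delegates entirely.

```python
import tflite_runtime.interpreter as tflite

# Build the interpreter without TfLite's default delegates (e.g. XNNPACK),
# so only explicitly requested delegates are ever applied.
interpreter = tflite.Interpreter(
    "whisper-base.tflite",
    experimental_op_resolver_type=tflite.OpResolverType.BUILTIN_WITHOUT_DEFAULT_DELEGATES,
)
```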

I haven't checked what happens with the built-in Lite GPU delegate.

So this isn't really an Arm NN issue as such; this model is hitting the cutting edge of what TensorFlow Lite can do in the default path, and TensorFlow Lite doesn't seem ready to handle delegation of this model yet.