Caldarie / flutter_tflite_audio

Audio classification Tflite package for flutter (iOS & Android). Can support Google Teachable Machine models
MIT License

Nil found error with Google Teachable Machine model #5

Open pierremotard opened 3 years ago

pierremotard commented 3 years ago

When I try to run the app with a GTM-built model on a device, it keeps triggering the following error once the buffer size is reached and inference should happen.

Failed to invoke the interpreter with error: Provided data count 376128 must match the required count 176128.
Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 283
2021-02-18 10:29:23.396879+0100 Runner[8368:1723178] Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 283

The problematic line is let scores = [Float32](unsafeData: outputTensor.data) ?? [], where outputTensor turns out to be nil.

Thanks for helping!

Caldarie commented 3 years ago

Hi @pierremotard,

Thanks for opening a new issue.

Looking at the error, I suspect that your recordingLength is set to 94032 (376,128 bytes / 4)?

If yes, unfortunately, Google's Teachable Machine will only accept a fixed recordingLength of 44032. That may be the cause of the crash.

If you need a longer recording time, I can suggest lowering your bufferSize. Make sure that it matches the recording length or divides evenly into it, for example 44032, 22016, 11008, or 5504.

#EDIT: Update for 0.1.6 - You can now use any value for bufferSize. It no longer needs to divide evenly into the recording length. For example, a bufferSize of 2500 will work with a recordingLength of 16000.

You can also increase numOfInferences to more than 1 to extend the recording length. However, this will give you multiple inferences.

pierremotard commented 3 years ago

It works with 44032 thank you!

pierremotard commented 3 years ago

Unfortunately I have now quite the same error when I try the custom model here: https://www.tensorflow.org/tutorials/audio/simple_audio

Here are the values for the decodedWav model:

final String model = 'assets/simple_audio_model.tflite';
final String label = 'assets/simple_labels.txt';
final String inputType = 'decodedWav';
final int sampleRate = 16000;
final int recordingLength = 16000;
final int bufferSize = 2000;
final int numOfInferences = 1;

And the error occurring is:

Failed to invoke the interpreter with error: Provided data count 64000 must match the required count 63984.
Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 339
2021-02-18 20:17:59.097065+0100 Runner[9359:1915425] Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 339

At line let scores = [Float32](unsafeData: outputTensor.data) ?? []

Caldarie commented 3 years ago

To calculate the recording length, divide the byte count by 4 (each Float32 sample is 4 bytes). For example, 64,000 / 4 = 16,000 (or in your case, 63,984 / 4 = 15,996).

pierremotard commented 3 years ago

I tried 15996 for the recording length but had the same issue. Should I leave the buffer size at 2000?

Caldarie commented 3 years ago

#EDIT: Update for 0.1.6 - You can now use any value for bufferSize. It no longer needs to divide evenly into the recording length. For example, a bufferSize of 2500 will work with a recordingLength of 16000.

For bufferSize, make sure that it divides evenly into the recording length.

For example, if the recordingLength is 15996, you can try the values 7998, 5332, 3999, 2666, or 1333 for bufferSize.

With iOS devices, the value for bufferSize doesn't have to divide evenly. Unfortunately, with Android, the values have to be fixed. I may provide an update in the future when I have time.

pierremotard commented 3 years ago

Thank you for answering, but it still triggers the same fatal error. I tried 7998 and 3999.

Caldarie commented 3 years ago

Is the error any different when you tried 15996 for the recording length and 7998 for bufferSize?

If yes, can you paste it?

pierremotard commented 3 years ago

Here is the log:

recordingBuffer length: 1999
recordingBuffer length: 3998
recordingBuffer length: 5997
recordingBuffer length: 7996
recordingBuffer length: 9995
recordingBuffer length: 11994
recordingBuffer length: 13993
recordingBuffer length: 15992
recordingBuffer length: 17991
reached threshold
Running model
nil
Failed to invoke the interpreter with error: Invalid tensor index 1, max index is 0.
Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 339
2021-02-19 00:12:17.499644+0100 Runner[20570:243496] Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 339

Still at the same line: let scores = [Float32](unsafeData: outputTensor.data) ?? []. I now think the error is not related to the buffer size after all, since it doesn't change with it.

Caldarie commented 3 years ago

It looks like your input works, but now I’m beginning to suspect that the output is incorrect.

Can you upload the model to Netron and tell me the array type (e.g. float32) and dimensions (e.g. [1, 16000]) for the input and output, like below:

pierremotard commented 3 years ago

My output is:

[Netron screenshot of the model's output tensor]
Caldarie commented 3 years ago

Did you download the model? Or did you create it? If it's not confidential, may I have the download link or a copy of your model?

pierremotard commented 3 years ago

Before even trying my own model, I started with this one https://www.tensorflow.org/tutorials/audio/simple_audio and only added this cell at the end to convert it to TFLite:

# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model.
with open('simple_audio_model.tflite', 'wb') as f:
  f.write(tflite_model)
Caldarie commented 3 years ago

@pierremotard sorry for the late reply.

Hmm, to be honest, I’m unsure why the output is returning null/nil.

The first possibility is that the model doesn't like the input data, and so is returning a null/nil value.

The second possibility is that something in your model has changed (for example preprocessing) which made it incompatible with this package.

Unfortunately, I don’t have much time to train the model using the tutorial and your code; however I can quickly troubleshoot the problem if you would be willing to share your model.

Let me know if you can do so.

pierremotard commented 3 years ago

model_and_labels.zip

No problem, thanks for your help. Here are both the tflite models and the txt label files.

Caldarie commented 3 years ago

Hi @pierremotard,

I think I may know the reason for the error. As you can see from the picture below, your model only accepts the following dimensions: [1,124,129,1]. These dimensions represent an image of a spectrogram.

[Netron screenshot of the model's input tensor]

Please take a look at the code below. This was taken from the tutorial. As you can see, the input shape for your model is based on a spectrogram.


for spectrogram, _ in spectrogram_ds.take(1):
  input_shape = spectrogram.shape # THIS IS THE CULPRIT!!! THE INPUT SHAPE COMES FROM A SPECTROGRAM

model = models.Sequential([
    layers.Input(shape=input_shape), 
    preprocessing.Resizing(32, 32),  
    norm_layer,
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_labels),
])

To fix this problem you can:

  1. Have the first input layer accept raw audio: layers.Input(shape=[16000,1], dtype=tf.float32, name='raw_input')
  2. Have the second input layer accept the sample rate: layers.Input(shape=[1], dtype=tf.int32, name='sample_rate')
  3. Add a preprocessing layer that converts your two inputs into a spectrogram. I'm not too sure how to do this off the top of my head, but you can take a look at this article for some reference:

https://archive.is/Y1dqu#selection-4265.5-4273.8

For a better representation, you can do this (a partial Keras sketch follows the list):

  1. A first input layer which accepts float32[16000,1] - this represents raw audio
  2. Convert the first input layer to an audio spectrogram
  3. A second input layer which accepts int32[1] - this represents the sample rate
  4. Use the audio spectrogram and sample rate to compute MFCCs
  5. Resize the MFCCs to feed into the model.
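
A minimal sketch of the single-input variant (steps 1 and 2 only, without the sample-rate/MFCC branch). The frame sizes follow the tutorial, the tutorial's norm_layer is omitted, and tf.signal.stft / tf.abs may still need Flex delegates or a manual workaround, as discussed further down:

import tensorflow as tf
from tensorflow.keras import layers, models

num_labels = 8  # as in the simple_audio tutorial's command set

# First layer accepts raw audio; the spectrogram is computed inside the
# model instead of in the tf.data pipeline.
raw_input = layers.Input(shape=(16000,), dtype=tf.float32, name='raw_input')

def to_spectrogram(waveform):
    # frame_length=255, frame_step=128 gives a [124, 129] spectrogram,
    # matching the tutorial's [1, 124, 129, 1] input.
    stft = tf.signal.stft(waveform, frame_length=255, frame_step=128)
    spectrogram = tf.abs(stft)
    return spectrogram[..., tf.newaxis]

x = layers.Lambda(to_spectrogram)(raw_input)
x = layers.experimental.preprocessing.Resizing(32, 32)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D()(x)
x = layers.Dropout(0.25)(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_labels)(x)

model = models.Model(raw_input, outputs)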

Unfortunately, I don't have the luxury of time to go much further than this. But should you come across an answer, it would be greatly appreciated if you could share it.

pierremotard commented 3 years ago

Thanks for the details, I will work on that and keep you up-to-date. I only have two questions in the meantime:

Caldarie commented 3 years ago

@pierremotard

Good questions. In that case:

  1. If you want a fixed sample rate in your model, you can simply change the input type to rawAudio in the package.

  2. For your Python code, simply adjust the input shape [16000, 1] to the desired length. For example, assuming you want 5 seconds of audio and your sample rate is 16000, your input shape should be [80000, 1] (see the sketch below). Please make sure you use the same parameters when feeding data into your model.
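
To illustrate point 2, a minimal sketch (assuming a 16 kHz sample rate and the rawAudio input type; variable names are just placeholders):

import tensorflow as tf
from tensorflow.keras import layers

sample_rate = 16000                 # Hz
seconds = 5
n_samples = sample_rate * seconds   # 80000 samples of raw audio

# The model's first layer must match the recordingLength fed in by the plugin.
raw_input = layers.Input(shape=(n_samples, 1), dtype=tf.float32, name='raw_input')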

pierremotard commented 3 years ago

For 2), my goal is to analyze recordings that could be anywhere from a few seconds to much longer; is this flexibility possible? I mean not just changing from one fixed size to another fixed size.

Caldarie commented 3 years ago

Hmm, in that case, you can adjust the parameter ‘numOfInferences’ in the package to lengthen your recording. However, that will give you multiple inferences.

Depending on the output, you can average out the results from the multiple inferences. This should be done in your app.

If you wish to shorten the recording, unfortunately that is not possible at the moment. However, I may add this feature in the future, depending on time and whether many people request it.

pierremotard commented 3 years ago

I built my TensorFlow model with 2 inputs but still can't find a way to pass it both a BatchDataset and an int when fitting it. In parallel, I want to try this model with the 'rawAudio' input type; my issue now comes from interpreter.invoke(), which triggers the following error:

TensorFlow Lite Error: tensorflow/lite/kernels/reshape.cc:58 stretch_dim != -1 (0 != -1)
TensorFlow Lite Error: Node number 26 (RESHAPE) failed to prepare.

Failed to invoke the interpreter with error: Must call allocateTensors().
Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 285
2021-02-21 17:41:33.638454+0100 Runner[5502:3779842] Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 285

I have seen that tf-nightly might help; do you have any ideas?

Caldarie commented 3 years ago

Try swapping the dimensions around from float32[16000, 1] to float32[1, 16000]
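
For reference, a rough note (assuming the model is built with Keras): the shape passed to Input excludes the batch dimension, which the TFLite converter typically pins to 1.

import tensorflow as tf

# Rough guide to how the Keras Input shape maps to the TFLite input tensor:
#   shape=(16000,)   ->  float32[1, 16000]
#   shape=(16000, 1) ->  float32[1, 16000, 1]
x = tf.keras.Input(shape=(16000,), dtype=tf.float32, name='raw_input')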

pierremotard commented 3 years ago

My model already has this dimension shape (I used the GTM settings, so 44032): [Netron screenshot of the model's input tensor]

It is true though that the Input looks like this in the code: x = Input(shape=(n_samples,), name='input', dtype='float32')

But I only get errors whenever I try to transpose or change that.

Caldarie commented 3 years ago

Hi @pierremotard,

Unfortunately, I'm not too knowledgeable about your problem. However, I think you may be missing something crucial like quantization, preprocessing, normalisation, or resizing your tensor input. Again, just an educated guess, as I'm not too familiar with Keras.

Apologies if my answer does not fully address your question. Though, if you do find an answer, it would be greatly appreciated if you could share it with others.

pierremotard commented 3 years ago

OK, so with a trivial model (only one hidden layer) it worked. I will let you know which layer caused the problem once I find it, but I think you guessed correctly; the normalization may have caused the issue.

Caldarie commented 3 years ago

Hi @pierremotard,

I would like to check up on your progress. Have you found any solutions yet?

If not, I have found some references which can be used to assist with your problem. You can view them here:

  1. Raw audio.
  2. Decoded wave - I believe this was the original
  3. Detailed tutorial here
pierremotard commented 3 years ago

Hi @Caldarie, thanks for the resources, I will try to build upon them. Until now I was able to import a model, but not one with the Short-Time Fourier Transform from TensorFlow (tf.signal.stft), which has some reshaping that causes issues; it is this layer that prevents the model from working. You can check out my issue here https://github.com/tensorflow/tflite-support/issues/352 for more details. I will try the STFT from SciPy as well tomorrow.

I now have this error:

TensorFlow Lite Error: Regular TensorFlow ops are not supported by this interpreter. Make sure you apply/link the Flex delegate before inference.
TensorFlow Lite Error: Node number 25 (FlexRFFT2D) failed to prepare.

Failed to invoke the interpreter with error: Must call allocateTensors().
Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 285
2021-03-01 00:01:57.767260+0100 Runner[54276:2945771] Fatal error: Unexpectedly found nil while implicitly unwrapping an Optional value: file tflite_audio/SwiftTfliteAudioPlugin.swift, line 285

I noticed the first error is one of the known issues listed in the README, but even steps 4 and 5 don't help with it.

Caldarie commented 3 years ago

@pierremotard Did you get this error by loading your own trained model? If so, the problem most likely lies with your model. Please take a look at this issue here.

Another likely possibility is that you didn't configure your settings correctly; please take a look at Issue #7.

You can also find out more about this error with a Google search.

pierremotard commented 3 years ago

@Caldarie Here's an update, because I was finally able to find a workaround for this issue. Two TensorFlow functions were causing problems, tf.signal.stft and tf.abs, so I had to implement the STFT using tf.signal.frame plus a manual FFT, and take a manual absolute value of the complex values to obtain the magnitude. It looks like the Flex delegate ops (FlexRFFT, FlexAbs, ...) from TensorFlow are not working in TFLite.
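
For anyone hitting the same FlexRFFT2D/FlexAbs errors, a rough sketch of this kind of workaround (frame sizes are hypothetical; it approximates the STFT with tf.signal.frame plus a precomputed DFT matrix, and computes the magnitude with builtin ops instead of tf.abs on complex values):

import numpy as np
import tensorflow as tf

frame_length, frame_step = 255, 128   # hypothetical values, matching the tutorial
fft_length = 256

# Precompute the real/imaginary parts of the DFT matrix as constants so that
# no complex ops or RFFT kernels are needed at inference time.
n = np.arange(frame_length)[:, None]            # time index within a frame
k = np.arange(fft_length // 2 + 1)[None, :]     # frequency bin index
dft_real = np.cos(2.0 * np.pi * n * k / fft_length).astype(np.float32)
dft_imag = -np.sin(2.0 * np.pi * n * k / fft_length).astype(np.float32)

def manual_stft_magnitude(waveform):
    # waveform: [batch, samples] -> frames: [batch, n_frames, frame_length]
    frames = tf.signal.frame(waveform, frame_length, frame_step)
    frames = frames * tf.signal.hann_window(frame_length)
    # Contract the frame axis against the DFT matrices (plain matmuls).
    real = tf.tensordot(frames, dft_real, axes=1)   # [batch, n_frames, bins]
    imag = tf.tensordot(frames, dft_imag, axes=1)
    # Magnitude via builtin sqrt instead of tf.abs on a complex tensor.
    return tf.sqrt(real * real + imag * imag + 1e-12)

Whether tf.signal.frame lowers entirely to TFLite builtins may still depend on the TensorFlow version, so it's worth checking the converted graph in Netron.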

Caldarie commented 3 years ago

@pierremotard many thanks for the update!!!

Also, I just updated the package, so now you don’t need a bufferSize that divides evenly into recordingLength.

Caldarie commented 3 years ago

Hi @pierremotard

There's a possibility that you cannot use the Flex delegates because you didn't include them in your conversion.

To convert with Flex delegates, you may consider doing the following:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
  tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]
tflite_model = converter.convert()
open("converted_model.tflite", "wb").write(tflite_model)

This was taken from the official TensorFlow documentation.

pierremotard commented 3 years ago

Unfortunately, I was already including these supported_ops, but I've seen no one manage to use these operations (RFFT, STFT, etc.) in TFLite, so I think they're not supported yet.

Thanks for the update about the buffer size!

Caldarie commented 3 years ago

Hi @pierremotard,

I may provide a future update where the plugin can use a spectrogram as an input type instead of raw audio or decoded wave.

Though this may take a while to develop. I will keep you posted.