Audio classification will produce different results on different runs

tabatinga0xffff commented 5 months ago

Environment (please complete the following information):

OS/OS Version: Windows 10
Unity Version: 22.3.3f1

 "com.github.asus4.tflite": "2.15.0",
 "com.github.asus4.tflite.common": "2.15.0",
 "com.github.asus4.mediapipe": "2.15.0"

Code

//model:
// input shape:     [1, 40]
// output shape:    [1, 1]

var inputSize = 40;

var bs = File.ReadAllBytes(Path.Combine(Application.dataPath, "Models", "m1.tflite"));
var options = new InterpreterOptions {
    threads = 2,
    useNNAPI = false,
};

double[,] mfccsTransposedArray = calculateMFCCAndTransform("file1.wav");

using (var interpreter = new Interpreter(bs, options)) {
     interpreter.AllocateTensors();
     interpreter.ResizeInputTensor(0, new int[] { 1, inputSize });

     try {
        //this will ALWAYS fail - "TensorFlow Lite operation failed",
        // hence "try"

        interpreter.SetInputTensorData(0, mfccsTransposedArray);
     } catch (Exception e) {
         //Debug.Log($"error; step: {i}");
     }

     interpreter.Invoke();

     var outputData = new float[1];
     interpreter.GetOutputTensorData(0, outputData);
     string recognizedSound = InterpretResults(outputData);
     Debug.Log(recognizedSound);
}

When I run this multiple times (that is, run the scene in the Unity editor and then stop it), it produces different results for the same audio file: 1, 0.98, 0.48, sometimes even 0.

What's the matter?

And why will this

interpreter.SetInputTensorData(0, mfccsTransposedArray);

always fail?

asus4 commented 5 months ago

@tabatinga0x00 Is it possible to share the TFLite model and the reproducible project?

In most cases, the exception at SetInputTensorData is caused by mismatching the data type (float, double, or short) or array length between the model and the C# code.
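As an illustration of the type and length mismatch described above: the snippet in this issue passes a `double[,]` to `SetInputTensorData`. Assuming the model's `[1, 40]` input is float32 (not confirmed in this thread), a minimal sketch of a conversion helper could look like the following. `TensorInput` and `ToFloatInput` are hypothetical names, not part of the library.

```csharp
using System;

public static class TensorInput
{
    // Flattens a [rows, cols] double matrix into a float[] in row-major order.
    // Assumption: the model's input tensor is float32 with `expectedLength` elements.
    public static float[] ToFloatInput(double[,] source, int expectedLength)
    {
        // source.Length is the total element count across both dimensions.
        if (source.Length != expectedLength)
        {
            throw new ArgumentException(
                $"Input has {source.Length} values, but the model expects {expectedLength}.");
        }

        var result = new float[expectedLength];
        int i = 0;
        // foreach over a multidimensional array visits elements in row-major order.
        foreach (double value in source)
        {
            result[i++] = (float)value;
        }
        return result;
    }
}
```

With such a helper, the call from the issue would become `interpreter.SetInputTensorData(0, TensorInput.ToFloatInput(mfccsTransposedArray, inputSize));`. Failing early on a length mismatch also avoids calling `Invoke()` on an input tensor that was never filled, which may be one reason the output changes between runs.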

asus4 commented 5 months ago

According to the TensorFlow API docs, the initialization order should be like this:

// first call resize input tensor
interpreter.ResizeInputTensor(0, new int[] { 1, inputSize });
// then allocate it.
interpreter.AllocateTensors();
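Combined with the rest of the snippet from the issue, the whole sequence might be sketched as follows. It uses only the `Interpreter` calls that already appear in this thread. The `float[]` input type, the `InferenceSketch` class, and the `RunOnce` method are assumptions made for illustration.

```csharp
using TensorFlowLite;

public static class InferenceSketch
{
    // Hypothetical helper: `modelBytes` is the content of the .tflite file and
    // `input` is assumed to be a float[] matching the [1, inputSize] input tensor.
    public static float RunOnce(byte[] modelBytes, float[] input)
    {
        var options = new InterpreterOptions { threads = 2 };
        using (var interpreter = new Interpreter(modelBytes, options))
        {
            // Resize first, then allocate.
            interpreter.ResizeInputTensor(0, new int[] { 1, input.Length });
            interpreter.AllocateTensors();

            // Deliberately not wrapped in try/catch: if setting the input fails,
            // the result of Invoke() would not reflect this input.
            interpreter.SetInputTensorData(0, input);
            interpreter.Invoke();

            var output = new float[1]; // output shape [1, 1]
            interpreter.GetOutputTensorData(0, output);
            return output[0];
        }
    }
}
```

Letting the exception propagate makes a bad input visible right away instead of producing a score that varies between runs.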

tabatinga0xffff commented 5 months ago

@asus4 I'll share it.

// first call resize input tensor
interpreter.ResizeInputTensor(0, new int[] { 1, inputSize });
// then allocate it.
interpreter.AllocateTensors();

That hasn't worked.

tabatinga0xffff commented 5 months ago

@asus4 have you seen my email? Should I post the code here?

asus4 commented 5 months ago

@tabatinga0x00 Please understand that I don't provide personal support by email. However, if you create a public PR (Pull Request) or a reproducible repository, I can take a look at it.

asus4 commented 3 months ago

Hi @tabatinga0xffff

I ported a YAMNet-based audio classification example from the TensorFlow Lite examples in this PR:

#348

The model architecture differs from your model but might help to solve your issue.

tabatinga0xffff commented 3 months ago

@asus4

I copied the AudioClassification directory into Assets and updated manifest.json according to the README.

This error then appeared:

Assets\AudioClassification\AudioClassification.cs(98,24): error CS1061: 'NativeSlice<AudioClassification.Label>' does not contain a definition for 'Sort' and no accessible extension method 'Sort' accepting a first argument of type 'NativeSlice<AudioClassification.Label>' could be found (are you missing a using directive or an assembly reference?)

tabatinga0xffff commented 3 months ago

Here is my model, in case you need it:

Models.zip

asus4 commented 2 months ago

@tabatinga0xffff This kind of error is not related to your custom model. Did you install "com.unity.collections"?

asus4 / tf-lite-unity-sample

Audio classification will produce different results on different runs #333
