Closed nazdream closed 2 years ago
Hi @nazdream,
Looking at your description, it looks like you're trying to build a Sound Event Detection model. Correct me if I'm wrong here.
As for "1 Result", I can check what's wrong if you're willing to share your label text file. Let me know if this is possible.
Sure, give me 5 minutes; I can upload the labels and the TFLite model to Google Drive and share access with this Gmail: michaeltamthiennguyen@gmail.com. Will you be able to access it from there?
@nazdream no problem. It should be fine.
I have sent you the invite, can you check if you received it, please?
Here is the code I used for testing the functionality I need:
import 'package:flutter/material.dart';
import 'package:tflite_audio/tflite_audio.dart';

class TestPage extends StatefulWidget {
  @override
  _TestPageState createState() => _TestPageState();
}

class _TestPageState extends State<TestPage> {
  bool _recording = false;
  int _results = 0;
  int _events = 0;
  Stream<Map<dynamic, dynamic>> _result;

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: ListView(
        children: [
          const SizedBox(height: 30),
          Center(
            child: Text('Audio'),
          ),
          const SizedBox(height: 30),
          Center(
            child: Container(
              width: 100,
              height: 100,
              decoration: BoxDecoration(
                borderRadius: BorderRadius.circular(100),
                color: _recording ? Colors.red : Colors.blue,
              ),
              child: Center(
                child: Text(_recording ? 'Recording...' : 'Idle'),
              ),
            ),
          ),
          const SizedBox(height: 30),
          Center(
            child: Text('Results: $_results'),
          ),
          Center(
            child: Text('Events: $_events'),
          ),
          const SizedBox(height: 30),
          Padding(
            padding: const EdgeInsets.symmetric(horizontal: 20),
            child: RaisedButton(
              onPressed: _recording ? _stop : _recorder,
              child: Text(_recording ? 'Stop' : 'Record'),
            ),
          ),
        ],
      ),
    );
  }

  void _recorder() {
    if (!_recording) {
      setState(() {
        _recording = true;
        _results = 0;
        _events = 0;
      });
      _result = TfliteAudio.startAudioRecognition(
        numOfInferences: 100,
        inputType: 'rawAudio',
        sampleRate: 44100,
        recordingLength: 44032,
        bufferSize: 22050,
        averageWindowDuration: 10,
        detectionThreshold: 0.6,
        suppressionTime: 10,
        minimumTimeBetweenSamples: 10,
      );
      _result.listen((event) {
        // Count every event; additionally count events matching the target label.
        setState(() {
          _events++;
        });
        if (event['recognitionResult'] == '1 Punch') {
          setState(() {
            _results++;
          });
        }
      }).onDone(() {
        setState(() {
          _recording = false;
        });
      });
    }
  }

  void _stop() {
    TfliteAudio.stopAudioRecognition();
  }
}
And in the app file I am initializing the model:
void _loadTFModel() async {
  String result = await TfliteAudio.loadModel(
    label: 'assets/ml/labels.txt',
    model: 'assets/ml/soundclassifier.tflite',
    numThreads: 2,
    isAsset: true,
  );
}
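For completeness, a typical place to call this is in a State's initState, so the model is loaded once before any recording starts (a sketch only; _loadTFModel is the function shown above):

@override
void initState() {
  super.initState();
  _loadTFModel(); // load the labels and TFLite model once, up front
}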
Ah, it seems you are using Google's Teachable Machine.
So, just to clarify a few more things, you want to reduce the recording length from 1000ms to around 50-100ms? Is that correct?
Yes
Or another option is to see how many occurrences of a specific sound are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect.
Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}
Yes
In that case, I think it may be possible to do so. However, I have yet to test whether it works.
If you don't mind doing the testing for me, I suggest reducing the recordingLength to perhaps a half, a quarter, or an eighth.
Let me know how it goes.
Or another option is to see how many occurrences of a specific sound are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect. Something like this:
{hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}
This may be possible, but I may need to change the source code around a bit to achieve this effect.
If you don't mind doing the testing for me, I suggest reducing the recordingLength to perhaps a half, a quarter, or an eighth.
I will test this in a moment and provide results here
I have tried reducing recordingLength by 2, 4, and 8 times, but the app crashes every time with the following error:
E/AndroidRuntime(25618): FATAL EXCEPTION: Thread-7
E/AndroidRuntime(25618): Process: , PID: 25618
E/AndroidRuntime(25618): java.lang.IllegalArgumentException: Internal error: Failed to run on the given Interpreter: tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
E/AndroidRuntime(25618):
E/AndroidRuntime(25618): Node number 42 (MAX_POOL_2D) failed to prepare.
E/AndroidRuntime(25618):
E/AndroidRuntime(25618):     at org.tensorflow.lite.NativeInterpreterWrapper.run(Native Method)
E/AndroidRuntime(25618):     at org.tensorflow.lite.NativeInterpreterWrapper.run(NativeInterpreterWrapper.java:204)
E/AndroidRuntime(25618):     at org.tensorflow.lite.Interpreter.runForMultipleInputsOutputs(Interpreter.java:374)
E/AndroidRuntime(25618):     at org.tensorflow.lite.Interpreter.run(Interpreter.java:332)
E/AndroidRuntime(25618):     at flutter.tflite_audio.TfliteAudioPlugin.rawAudioRecognize(TfliteAudioPlugin.java:508)
E/AndroidRuntime(25618):     at flutter.tflite_audio.TfliteAudioPlugin.access$300(TfliteAudioPlugin.java:54)
E/AndroidRuntime(25618):     at flutter.tflite_audio.TfliteAudioPlugin$4.run(TfliteAudioPlugin.java:452)
E/AndroidRuntime(25618):     at java.lang.Thread.run(Thread.java:919)
I am using Xiaomi Redmi 7 for testing
Can you tell me your bufferSize? You need to match it to, or have it lower than, the recordingLength.
I was using a bufferSize of 8000
_result = TfliteAudio.startAudioRecognition(
  numOfInferences: 100,
  inputType: 'rawAudio',
  sampleRate: 44100,
  recordingLength: 22016,
  bufferSize: 2000,
  detectionThreshold: 0.6,
);
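The constraint mentioned above (bufferSize must not exceed recordingLength) can be written down as a small sketch; safeBufferSize is a hypothetical helper, not part of the tflite_audio API:

// Hypothetical helper (not part of tflite_audio): picks a buffer size
// that never exceeds the recording length. Violating this constraint is
// what triggers the MAX_POOL_2D "failed to prepare" crash shown above.
int safeBufferSize(int requested, int recordingLength) =>
    requested <= recordingLength ? requested : recordingLength;

For example, safeBufferSize(22050, 22016) returns 22016, while safeBufferSize(2000, 22016) leaves the requested 2000 unchanged.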
Just tried some other combinations of recordingLength and bufferSize, but the app keeps crashing when I start recording audio with recordingLength != 44032. Let me know if you have any ideas on what the problem could be, please.
Hi @nazdream,
I will take a look into the source code when I find some spare time. I cannot guarantee a fix, but I will keep you updated.
Thanks!
Hi @nazdream,
After running some tests, I think an inference every 50-100ms (or 10 to 20 inferences per recording) will be extremely taxing for mobile devices. The best I can achieve is around 200ms for each inference, and this excludes latency and recording delays. Running an inference 10 to 20 times in quick succession will cause noticeable lag and can degrade the user experience, I think.
What I can do, however, is increase the sensitivity of bufferSize so that the delay for each inference is minimized; that said, this approach will not have much impact either.
Let me know your thoughts about this.
Hi @Caldarie! Thanks for the response. I think 200ms will not work for me, but I can try; maybe that will be enough. Ideally, I need a solution that can detect a sound around every 100 ms or less. Is it technically possible to achieve something like this, or does a typical device not have enough resources for such a high frequency?
@nazdream I think it's very difficult to achieve your requirements with GTM models, considering it's a simple audio classification model.
However, I think it's possible to detect a sound every 100ms if you build your own custom model, though this may require deep knowledge of machine learning. You will also need to modify this package to fit the custom model.
How would an inference every 50-100ms impact the device? I only need this sound detection feature to work for 30 seconds in a row at most, so maybe it will not impact the device that much.
Can you clarify whether I understood you correctly that it's the GTM model which makes processing slow/power-hungry, and that a custom model could be much faster? We can create a custom GTM model if needed; however, I want to understand whether this 200ms is predefined in the package code, or whether it depends on the model and the device's performance and is unique for every combination of device and TF model.
As I've already explained in comment 13, it's a matter of processing power.
As for building a custom model, on second thought, it may also not be possible depending on what you want. Do you want real-time results (an inference every 100ms?), or do you want 20 results to appear after the recording is finished?
I need real-time results
Ah, in that case, I think real-time results every 100ms may be next to impossible, even with custom models.
I may be wrong, however. You might want to consult the tflite developers to see if it's possible.
I will write to them and post their answer here when they reply.
@Caldarie Is it possible to detect the number of sounds in 1-second intervals, or is it only possible to count the number of sounds by splitting intervals into smaller ones and checking whether a specific sound is present in each of them?
Or another option is to see how many occurrences of a specific sound are in a 1-second interval. If the startAudioRecognition event included the number of occurrences of the sound in the 1-second interval, that would also be perfect. Something like this: {hasPermission: true, inferenceTime: 71, recognitionResult: 0 Name, occurrences: 5}
Apologies for the late reply.
Is it possible to detect the number of sounds in 1-second intervals
Yes, it's possible if you develop your own model. With GTM, it's impossible, considering it only outputs one result at a time.
or is it only possible to count the number of sounds by splitting intervals into smaller ones and checking whether a specific sound is present in each of them?
As explained before, this approach is possible but not advisable.
Do I understand correctly that the model output is defined by the model itself and doesn't depend on the package?
{hasPermission=true, inferenceTime=75, recognitionResult=1 Result}
@nazdream That is correct. You would want to train a model with multi-label classification. Just be aware that this will only output multiple labels, not the number of occurrences.
If you want to count the number of occurrences, you'll need to go deeper and train a Sound Event Detection model. That one requires much deeper knowledge and time to train.
As for the package, I have yet to adapt it for models with multiple outputs. However, I am more than happy to adapt it for you, if you're willing to share your model.
I am trying to count the number of specific sound occurrences in the audio
The problem I have is that when I call TfliteAudio.startAudioRecognition and listen to the event stream, I receive events every 1 second, and I can't find a way to increase the event frequency to receive events every 50-100 ms. Is it possible to decrease the interval duration to 50-100 ms?
Another problem I have is that event['recognitionResult'] always returns "1 Result":
{hasPermission=true, inferenceTime=75, recognitionResult=1 Result}
However, there is more than one repetition of the sound I am trying to count in each 1-second interval. Should it work like this, and what does the number "1" mean: is it the number of occurrences of the sound in a single audio interval, or something else? Is it possible to implement counting of a specific sound with this package, or should I look somewhere else? Any feedback would be helpful, thanks!
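One client-side way to approximate occurrence counting with the plugin as discussed in this thread is to run a stream of inferences and count how many events match the target label. This is only a sketch: countOccurrences is a hypothetical helper, the parameter values are illustrative, and per the discussion above this particular model only accepted a recordingLength of 44032:

import 'package:tflite_audio/tflite_audio.dart';

/// Hypothetical helper: runs a fixed number of inferences and counts
/// how many recognition events match [targetLabel].
Future<int> countOccurrences(String targetLabel) async {
  var count = 0;
  final stream = TfliteAudio.startAudioRecognition(
    numOfInferences: 30,     // illustrative: roughly 30 seconds at ~1s per inference
    inputType: 'rawAudio',
    sampleRate: 44100,
    recordingLength: 44032,  // the only length this model accepted above
    bufferSize: 22050,
    detectionThreshold: 0.6,
  );
  await for (final event in stream) {
    if (event['recognitionResult'] == targetLabel) {
      count++;               // at most one match per inference window
    }
  }
  return count;
}

Note the limitation this thread converges on: each inference still spans about one second, so this counts at most one occurrence per second; true sub-second counting would need the model-side changes (a Sound Event Detection model) discussed above.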