Change recording length using GTM models to allow audio inputs greater than 1 second

Caldarie / flutter_tflite_audio

Audio classification Tflite package for flutter (iOS & Android). Can support Google Teachable Machine models

MIT License

Change recording length using GTM models to allow audio inputs greater than 1 second #8

Open cmalbuquerque opened 3 years ago

cmalbuquerque commented 3 years ago

I am using a GTM model and I am trying to increase the recording length passed to the model. To analyse 1 second of audio, I am using the following configuration:

      numOfInferences: 1,
      inputType: 'rawAudio',
      sampleRate: 44100,
      recordingLength: 44032,
      bufferSize: 22016,

Instead of analysing just 1 second, I want to increase the audio input to 3-5 seconds. When I change recordingLength to 132096 (3 x 44032), sampleRate to 132300 (3 x 44100) and bufferSize to half of the recordingLength value, the inference crashes.

Is there any way to record a longer audio clip and send it to the model, given that the GTM model requires an input tensor of size 44032?
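
For reference, the duration of a clip is its sample count divided by the sample rate. The short Python sketch below (the helper name clip_seconds is made up for illustration and is not part of the package) works through the numbers quoted above. It suggests that tripling both recordingLength and sampleRate leaves the clip at roughly one second, while the sample count no longer matches the 44032-element input the model expects.

```python
def clip_seconds(recording_length: int, sample_rate: int) -> float:
    """Duration in seconds of `recording_length` samples captured at `sample_rate` Hz."""
    return recording_length / sample_rate


# Original configuration from the question.
original = clip_seconds(44032, 44100)

# Tripled configuration that crashed: both values are scaled by 3,
# so the ratio (and therefore the duration) is unchanged.
tripled = clip_seconds(3 * 44032, 3 * 44100)

print(f"original: {original:.3f} s, tripled: {tripled:.3f} s")
```

Both values come out just under one second, so scaling the two parameters together does not lengthen the recording.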

Caldarie commented 3 years ago

Hi @cmalbuquerque

Unfortunately, the recording length needs to be a fixed size. The good news, however, is that you can lengthen your recording by reducing your buffer size. For example:

For a very long recording time, try a recordingLength of 44032 and a bufferSize of 2000.
For a moderate recording time, try a recordingLength of 44032 and a bufferSize of 8000.
For a very short recording time, try a recordingLength of 44032 and a bufferSize of 22050.

You may want to experiment with different bufferSize values to get the length you want.

Just be aware that it is difficult to hit an exact number of seconds, as recording times may differ from device to device. Also, if you reduce the bufferSize to a very small value, it may adversely affect your inference accuracy.
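
As a rough illustration of the three suggestions above, the Python sketch below counts how many buffer reads it would take to collect 44032 samples. It assumes the recorder fills the recording in chunks of bufferSize samples, which this thread does not confirm, and the helper name reads_needed is hypothetical. It only shows that a smaller bufferSize means more reads per recording; it does not predict the recording time on any device.

```python
import math

RECORDING_LENGTH = 44032  # samples expected by the GTM model input


def reads_needed(buffer_size: int, recording_length: int = RECORDING_LENGTH) -> int:
    """Number of buffer-sized reads needed to collect `recording_length` samples.

    Assumption: the recorder appends one buffer per read until the
    recording is full. The plugin's real behaviour may differ.
    """
    return math.ceil(recording_length / buffer_size)


for buffer_size in (2000, 8000, 22050):
    print(f"bufferSize={buffer_size}: {reads_needed(buffer_size)} reads")
```

Under that assumption, the three buffer sizes need 23, 6 and 2 reads respectively.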

cmalbuquerque commented 3 years ago

@Caldarie nice, thanks!

I thought that I could only set the bufferSize value to half of the recordingLength value. I decreased the sample rate to 16 kHz and set bufferSize to 2000, and I got approximately 3 seconds of audio. A sample rate of 44.1 kHz improves the accuracy; however, I believe that if I build the model from good audio samples with very distinct classes and train it, it will still give accurate inferences.

Thanks again! 😁
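
One possible reading of this result is plain arithmetic: 44032 samples at 16 kHz span about 2.75 seconds, which is close to the 3 seconds observed. The Python sketch below (the helper names are made up for illustration) computes that duration and the sample rate that would, in principle, stretch a fixed 44032-sample recording to a target length. Whether a given device supports such rates, and how a model trained on 44.1 kHz audio behaves at them, is not something this thread establishes.

```python
RECORDING_LENGTH = 44032  # fixed input size of the GTM model, in samples


def clip_seconds(sample_rate: int, recording_length: int = RECORDING_LENGTH) -> float:
    """Duration in seconds of a fixed-length recording at `sample_rate` Hz."""
    return recording_length / sample_rate


def sample_rate_for(seconds: float, recording_length: int = RECORDING_LENGTH) -> int:
    """Sample rate (Hz, rounded) at which the recording spans `seconds`."""
    return round(recording_length / seconds)


print(f"16 kHz   -> {clip_seconds(16000):.3f} s")
print(f"44.1 kHz -> {clip_seconds(44100):.3f} s")
print(f"3 s needs about {sample_rate_for(3)} Hz")
print(f"5 s needs about {sample_rate_for(5)} Hz")
```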

Caldarie commented 3 years ago

@cmalbuquerque Glad I could be of assistance.

You’re absolutely correct. It won’t really matter too much if you have distinct classes. Furthermore, if you listen closely, there’s not much of a difference between a sample rate of 16 kHz and 44.1 kHz.

nazdream commented 3 years ago

@Caldarie Hi! I am trying to count the number of specific sounds in the audio stream.

1) When I call TfliteAudio.startAudioRecognition and listen to the event stream, I receive an event every 1 second. I can't find a way to increase the event frequency so that I receive events every 50-100 ms. Is it possible to decrease the interval to 50-100 ms?

2) Another problem is that event['recognitionResult'] always returns "1 Result": result: {hasPermission=true, inferenceTime=75, recognitionResult=1 Result} However, each interval contains more than one repetition of the sound I am trying to count. Is this the expected behaviour, and what does the number "1" mean? Is it the number of times the sound occurred in a single audio interval, or something else?

Is it possible to implement counting of a specific sound with this package, or should I look somewhere else? Any feedback would be helpful, thanks!