Caldarie / flutter_tflite_audio

Audio classification TFLite package for Flutter (iOS & Android). Supports Google Teachable Machine models.
MIT License

Increase Inference Frequency (Android) #48

Open bdytx5 opened 1 year ago

bdytx5 commented 1 year ago

Hey man, thanks for this awesome plugin. I was looking to increase the number of times per second I run my model. I was able to implement a sliding window in the Swift code, but I'm not super familiar with Java or Android development. I was wondering if you could provide some suggestions on how to accomplish this?

Thanks, Brett

Caldarie commented 1 year ago

Hi Brett,

Would it be possible to share the Swift code for the sliding window? Perhaps I can provide a suggestion once I get an idea of how it is implemented.

Michael

bdytx5 commented 1 year ago

Here is the code that implements a sliding window to read and store audio.

```swift
func startMicrophone() {
    print("start microphone")

    let recordingFrameBuffer = bufferSize / 2
    var recordingBuffer: [Int16] = []
    var inferenceCount: Int = 1
    let numOfInferences = self.numOfInferences
    let inputSize = self.inputSize
    let recordingFormat = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: Double(16000), channels: 1, interleaved: true)

    let inputNode = audioEngine.inputNode
    let inputFormat = inputNode.outputFormat(forBus: 0)
    guard let formatConverter = AVAudioConverter(from: inputFormat, to: recordingFormat!) else {
        return
    }

    // Sliding window state: raw samples plus the matching PCM buffers.
    var window = [Int16]()
    var pcmWindow = [AVAudioPCMBuffer]()

    // Remove any existing tap (removeTap does not throw, so no do/catch is needed).
    audioEngine.inputNode.removeTap(onBus: 0)

    // Install a tap on the input node and feed the converted frames into the window.
    audioEngine.inputNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(bufferSize), format: inputFormat) { (buffer, time) in

        self.conversionQueue.async {
            let pcmBuffer = AVAudioPCMBuffer(pcmFormat: recordingFormat!, frameCapacity: AVAudioFrameCount(self.bufferSize))
            var error: NSError? = nil
            let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
                outStatus.pointee = AVAudioConverterInputStatus.haveData
                return buffer
            }
            formatConverter.convert(to: pcmBuffer!, error: &error, withInputFrom: inputBlock)

            if error != nil {
                print(error!.localizedDescription)
            } else if let channelData = pcmBuffer!.int16ChannelData {

                let channelDataValue = channelData.pointee
                let channelDataValueArray = stride(from: 0, to: Int(pcmBuffer!.frameLength), by: buffer.stride).map { channelDataValue[$0] }

                if window.count < 16000 {
                    // Still filling the first one-second window (16000 samples at 16 kHz).
                    // Append the converted buffer so the window holds a single format.
                    window.append(contentsOf: channelDataValueArray)
                    pcmWindow.append(pcmBuffer!)
                } else {
                    window.append(contentsOf: channelDataValueArray)
                    pcmWindow.append(pcmBuffer!)

                    // Slide forward by 1600 samples (100 ms), so inference runs
                    // roughly ten times per second once the window is full.
                    window.removeFirst(1600)
                    pcmWindow.removeFirst()

                    // Run inference on the current window.
                    self.recognize(onBuffer: window, pcms: pcmWindow)
                }
            } // channelData
        } // conversion queue
    } // installTap

    audioEngine.prepare()

    do {
        try audioEngine.start()
    } catch {
        print(error.localizedDescription)
    }
}

```

Also, here is some Kotlin code I found that I modified to run inference at a higher frequency:

https://www.tensorflow.org/lite/android/tutorials/audio_classification

By modifying the `interval` in the tutorial's scheduling code:

```kotlin
executor = ScheduledThreadPoolExecutor(1)
executor.scheduleAtFixedRate(classifyRunnable, 0, interval, TimeUnit.MILLISECONDS)
```

I was able to increase the inference frequency without implementing a sliding window buffer. However, I was not able to figure out how to do the same in this plugin's code.
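
For what it's worth, here is a minimal sketch of that same fixed-rate pattern in plain Java (the language of this plugin's Android side). `classifyAudio()` is a hypothetical placeholder for the actual inference call, and the 100 ms interval is an assumed value:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FixedRateClassifier {

    // Hypothetical placeholder: run TFLite inference on the latest audio buffer.
    static void classifyAudio() {
        // ... model invocation would go here ...
    }

    public static void main(String[] args) {
        long intervalMs = 100; // smaller period => more inferences per second (assumed value)
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        // Schedule classification at a fixed rate; lowering intervalMs raises the frequency.
        executor.scheduleAtFixedRate(FixedRateClassifier::classifyAudio, 0, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```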

Caldarie commented 1 year ago

Oh I see how it is implemented.

Though compared with Swift or Kotlin, manipulating arrays in Java is such a pain. :laughing: To achieve a sliding window, modify the `public void splice()` function from the code here, replacing the commented section with your own sliding logic. For example:


```java
public void splice() {

    if (record.getState() != AudioRecord.STATE_INITIALIZED) {
        Log.e(LOG_TAG, "Audio Record can't initialize!");
        return;
    }

    while (shouldContinue) {

        short[] shortData = new short[bufferSize];
        record.read(shortData, 0, shortData.length);
        recordingBufferLock.lock();

        try {
            // Add your sliding window function here.
            // When the window fills up, call `recordingData.emit(<ADD ARRAY HERE>);`
            // to emit the data for recognition.
            // Clear out (or slide) the window once it has been emitted, and repeat.
            // If you wish to stop the recording, call the `stop()` function.
        } finally {
            recordingBufferLock.unlock();
        }
    }
}
```
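
To make that concrete, here is a minimal sketch of one way the window logic could look, assuming a 16000-sample model input and a 1600-sample hop to mirror the Swift code above (the constants and field names are illustrative, not part of the plugin):

```java
// Illustrative fields (assumed sizes, mirroring the Swift code above):
private static final int WINDOW_SIZE = 16000; // one second of audio at 16 kHz
private static final int HOP_SIZE = 1600;     // slide by 100 ms per inference
private final short[] window = new short[WINDOW_SIZE];
private int filled = 0;

// Inside the try block above, after record.read():
for (short sample : shortData) {
    window[filled++] = sample;
    if (filled == WINDOW_SIZE) {
        // Emit a copy so recognition sees a stable snapshot of the window.
        recordingData.emit(window.clone());
        // Slide forward by HOP_SIZE samples, keeping the overlap.
        System.arraycopy(window, HOP_SIZE, window, 0, WINDOW_SIZE - HOP_SIZE);
        filled = WINDOW_SIZE - HOP_SIZE;
    }
}
```

With these numbers the window emits roughly ten times per second once it is full, instead of once per second.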

Let me know if this helps.

bdytx5 commented 1 year ago

Ok, yeah, I'll see if I can piece something together. If not, I may use the Kotlin code to put together a new Android plugin. I'd be happy to share that if so.

Caldarie commented 1 year ago

No problem. If you do find a solution, the Flutter community and I would appreciate your contribution very much :)