iOS often misses the first word on the second and later calls to `.listen()` #511

vongrad commented 1 month ago

I have been struggling with an issue where iOS does not recognize the first word on the second and every later call to `.listen()`, i.e. listen (all good here) -> stop -> listen (missed first word) -> stop -> etc.

I have set up a native iOS app using your `SwiftSpeechToTextPlugin.swift` and tried to debug what is causing it. When I measure the time from `try self.audioEngine.start()` to the first buffer received in the callback of `inputNode?.installTap`, the first call takes approx. 175 ms, which is short enough to catch the first word. On the second and later calls to `listenForSpeech`, however, the same measurement comes to approx. 850 ms, which is long enough to miss the first word.
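A minimal sketch of how such a measurement could be taken. `FirstBufferTimer` and its `measure` method are hypothetical names used for illustration and are not part of the plugin; the buffer size of 1024 is also an assumption. It records the time just before `start()` and reports the elapsed milliseconds when the tap delivers its first buffer.

```swift
import AVFoundation

// Hypothetical helper (not part of the plugin): measures the delay between
// engine.start() and the first buffer delivered to the input node's tap.
final class FirstBufferTimer {
    private var startTime: CFAbsoluteTime = 0
    private var reported = false

    func measure(engine: AVAudioEngine,
                 onFirstBuffer: @escaping (Double) -> Void) throws {
        let inputNode = engine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        reported = false

        // Buffer size 1024 is an assumed value for this sketch.
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { [weak self] _, _ in
            // Only the first buffer after start() is of interest.
            guard let self = self, !self.reported else { return }
            self.reported = true
            let elapsedMs = (CFAbsoluteTimeGetCurrent() - self.startTime) * 1000
            onFirstBuffer(elapsedMs)
        }

        engine.prepare()
        startTime = CFAbsoluteTimeGetCurrent()
        try engine.start()
    }
}
```

Running this once against a fresh engine and again against an engine that was stopped and had its tap removed would give the two figures to compare. The tap callback runs off the main thread, so the closure should only log or forward the value.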

After experimenting a bit, I noticed that instantiating a new `audioEngine` (and with it a new `inputNode`) fixes this issue: we are back to circa 175 ms before receiving the first buffer on the second and later calls. I did not try to dig deeper into why reusing `audioEngine` produces such a delay, even though, looking at your code, all related resources seem to be deallocated properly.
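The workaround could look roughly like the sketch below, which creates a fresh `AVAudioEngine` for every listen call instead of reusing one. `ListenSession` is a hypothetical class for illustration, not the plugin's actual structure, and the buffer size is an assumed value.

```swift
import AVFoundation

// Hypothetical wrapper (not the plugin's code): every start() builds a new
// AVAudioEngine, so no engine instance is reused across listen calls.
final class ListenSession {
    private var audioEngine: AVAudioEngine?

    func start(onBuffer: @escaping (AVAudioPCMBuffer) -> Void) throws {
        stop() // tear down any previous engine first

        let engine = AVAudioEngine()
        let inputNode = engine.inputNode
        let format = inputNode.outputFormat(forBus: 0)

        // Buffer size 1024 is an assumed value for this sketch.
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            onBuffer(buffer)
        }

        engine.prepare()
        try engine.start()
        audioEngine = engine
    }

    func stop() {
        guard let engine = audioEngine else { return }
        engine.stop()
        engine.inputNode.removeTap(onBus: 0)
        audioEngine = nil // drop the engine so the next start() creates a new one
    }
}
```

In a speech recognition setup, `onBuffer` would typically append each buffer to the active recognition request. Calling `stop()` before any `start()` is a no-op.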

If you want, I can make a PR that implements the suggested fix - let me know if I should go for it.

sowens-csd commented 1 month ago

Yes please. A PR would be welcome. Thank you for diving into that problem. I won’t be able to review properly for a couple of weeks but I do appreciate the PR.

vongrad commented 3 weeks ago

I have made the promised PR: https://github.com/csdcorp/speech_to_text/pull/513
