MycroftAI / mycroft-precise

A lightweight, simple-to-use, RNN wake word listener
Apache License 2.0

Precise_runner failure on Pinephone (aarch64) #199

Closed: 1001111github closed this issue 3 years ago

1001111github commented 3 years ago

Hello, I posted this on the community forum and it was suggested I post it here. I am porting Kalliope, which uses a lot of Precise code (thank you!), to the Pinephone (aarch64). When using precise_runner, the process crashes while reading from the microphone because of the hard-coded sampling rate used when the stream is initialized.

    2020-12-22 17:02:49 :: kalliope-0.7.0 :: Say something!
    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
        self.run()
      File "/usr/lib/python3.9/threading.py", line 892, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib/python3.8/site-packages/precise_runner/runner.py", line 231, in _handle_predictions
        chunk = self.stream.read(self.chunk_size)
      File "/usr/lib/python3.8/site-packages/precise_runner/runner.py", line 186, in <lambda>
        stream.read = lambda x: pyaudio.Stream.read(stream, x // 2, False)
      File "/usr/lib/python3.9/site-packages/pyaudio.py", line 608, in read
        return pa.read_stream(self._stream, num_frames, exception_on_overflow)
    OSError: [Errno -9999] Unanticipated host error
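The lambda in the traceback halves its argument before calling `pyaudio.Stream.read`. A minimal sketch of why, assuming (as the `// 2` suggests) that precise_runner's `chunk_size` counts bytes while PyAudio's `read` counts frames, and that the audio is 16-bit (2-byte) mono. The helper names and the 2048-byte chunk are hypothetical, chosen only for illustration.

```python
def bytes_to_frames(num_bytes: int, sample_width: int = 2, channels: int = 1) -> int:
    """Convert a byte count to a frame count (one frame = one sample per channel)."""
    return num_bytes // (sample_width * channels)


def chunk_duration_ms(num_bytes: int, rate: int, sample_width: int = 2, channels: int = 1) -> float:
    """Milliseconds of audio held by a chunk of num_bytes at the given sample rate."""
    return 1000 * bytes_to_frames(num_bytes, sample_width, channels) / rate


if __name__ == "__main__":
    chunk = 2048  # hypothetical chunk size in bytes
    print(bytes_to_frames(chunk))           # 1024 frames: the x // 2 in the lambda
    print(chunk_duration_ms(chunk, 16000))  # 64.0 ms at 16 kHz
```

The same byte count covers less time at a higher rate (about 21 ms at 48 kHz), which matters if the stream is opened at a different rate than the model expects.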

Another Kalliope process returns "Invalid number of frames" from the exact same call. The problem also occurs when using the speech_recognition module directly, and even outside Python: the ALSA utility arecord, run with its default values, has the same issue.

In precise_runner/runner.py:

    class PreciseRunner:
        ...
        def start(self):
            """Start listening from stream"""
            if self.stream is None:
                from pyaudio import PyAudio, paInt16
                self.pa = PyAudio()
                # Positional args: rate=16000 (hard-coded), channels=1,
                # format=paInt16, input=True
                self.stream = self.pa.open(
                    16000, 1, paInt16, True,
                    frames_per_buffer=self.chunk_size
                )

Changing 16000 to 12000 or below, or to 24000 or above, prevents the problem. Of course, this creates another problem: all the training is done with 16 kHz samples (lol), so wake word recognition does not happen, but precise_runner works properly and does not crash.
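Trying rates by hand can be automated. A minimal sketch, assuming PyAudio is installed on the device: it opens the default input at each candidate rate and reads one buffer, since in the traceback the failure appeared on `read`, not on `open`. The candidate list, the 1024-frame buffer and both function names are hypothetical; the probe is passed in as a callable so the loop can be exercised without a microphone.

```python
from typing import Callable, Iterable, List

# Hypothetical list of common rates to try
CANDIDATE_RATES = (8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000)


def find_working_rates(try_rate: Callable[[int], None],
                       candidates: Iterable[int] = CANDIDATE_RATES) -> List[int]:
    """Return the candidate rates for which try_rate(rate) did not raise."""
    working = []
    for rate in candidates:
        try:
            try_rate(rate)
        except (OSError, ValueError):
            continue  # opening or reading failed at this rate
        working.append(rate)
    return working


def pyaudio_open_and_read(rate: int, frames: int = 1024) -> None:
    """Open the default input at `rate` (16-bit mono) and read one buffer."""
    from pyaudio import PyAudio, paInt16  # imported lazily; requires PyAudio
    pa = PyAudio()
    try:
        stream = pa.open(rate, 1, paInt16, True, frames_per_buffer=frames)
        try:
            # Read as well as open: the crash in the traceback happened on read
            stream.read(frames, exception_on_overflow=False)
        finally:
            stream.close()
    finally:
        pa.terminate()
```

On the phone, `print(find_working_rates(pyaudio_open_and_read))` would list the rates that survive one read; a single successful read does not guarantee a long-running stream stays healthy.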

The forum suggested resampling in PulseAudio, between the mic and precise_runner. As I understand it, I would have to load a module with a filter to do the resampling, but I have no idea which filter to use, how to configure it, or even how to find it. Another serious concern is the CPU cost of resampling inline.
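One way to gauge the cost of inline resampling is to do it in Python between PyAudio and the wake word engine. A minimal sketch, assuming the mic is opened at a hypothetical 48000 Hz (in the range reported not to crash) as 16-bit mono in native byte order: averaging each block of 3 samples yields 16000 Hz. The class name is hypothetical, and this box-filter decimation is a deliberately crude anti-aliasing choice made for low CPU cost, not what PulseAudio or Precise do.

```python
import array


class BoxDecimator:
    """Downsample 16-bit mono PCM by an integer factor (e.g. 48000 -> 16000, factor=3).

    Hypothetical helper, not part of precise_runner. Each output sample is the
    mean of one block of `factor` input samples (a box filter, then decimation).
    Samples left over at the end of a chunk are carried into the next call, so
    chunk boundaries drop no audio.
    """

    def __init__(self, factor: int = 3):
        if factor < 1:
            raise ValueError("factor must be >= 1")
        self.factor = factor
        self._pending = array.array("h")  # input samples not yet consumed

    def process(self, chunk: bytes) -> bytes:
        """Take native-endian int16 bytes at the input rate; return bytes at rate / factor."""
        incoming = array.array("h")
        incoming.frombytes(chunk)  # chunk length must be a multiple of 2 bytes
        samples = self._pending + incoming
        usable = len(samples) - len(samples) % self.factor
        out = array.array(
            "h",
            (sum(samples[i:i + self.factor]) // self.factor  # floor of the block mean
             for i in range(0, usable, self.factor)),
        )
        self._pending = samples[usable:]
        return out.tobytes()
```

The work is one pass over the input samples per chunk, so it scales linearly with the capture rate; in pure Python on a phone the overhead may still be noticeable and is worth measuring. A proper low-pass resampler will sound cleaner, and whether this quality is good enough for a 16 kHz Precise model is untested.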

Is resampling inline feasible, or would resampling the training data be a reasonable alternative? TIA
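Resampling the training data would instead be an offline batch job. A minimal stdlib-only sketch, assuming 16-bit mono WAV files and using plain linear interpolation (adequate for a quick upsample such as 16 kHz to 24 kHz, but not a high-quality resampler). The function names are hypothetical, and converting the files does not change whatever sample rate Precise itself assumes during training and feature extraction.

```python
import array
import io
import sys
import wave


def resample_linear(samples, src_rate, dst_rate):
    """Resample a sequence of ints from src_rate to dst_rate by linear interpolation."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate  # fractional index into the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]  # hold the last sample at the end
        out.append(int(round(samples[i] * (1 - frac) + nxt * frac)))
    return out


def resample_wav(src, dst, dst_rate):
    """Read a 16-bit mono WAV from src (path or file object) and write it to dst at dst_rate."""
    with wave.open(src, "rb") as r:
        if r.getsampwidth() != 2 or r.getnchannels() != 1:
            raise ValueError("expected a 16-bit mono WAV file")
        src_rate = r.getframerate()
        samples = array.array("h")
        samples.frombytes(r.readframes(r.getnframes()))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    out = array.array("h", resample_linear(samples, src_rate, dst_rate))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(dst_rate)
        w.writeframes(out.tobytes())


if __name__ == "__main__":
    # Round-trip a tiny in-memory 16 kHz WAV to 24 kHz
    src = io.BytesIO()
    with wave.open(src, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(array.array("h", [0, 300, 600, 900]).tobytes())
    src.seek(0)
    dst = io.BytesIO()
    resample_wav(src, dst, 24000)
    dst.seek(0)
    with wave.open(dst, "rb") as r:
        print(r.getframerate(), r.getnframes())  # 24000 6
```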

1001111github commented 3 years ago

After OS and Python updates on January 8th and 9th, everything from ALSA to speech_recognition and precise_runner records properly at 16 kHz. Moving from the amputee ward back to the bleeding edge and closing the issue.

krisgesling commented 3 years ago

Hey there, glad it's working now, but thanks for flagging it anyway. Resampling is something we may need to tackle at some point.