Writing the activated audio/bytes out to a file

rohandeo commented 4 years ago

Goal

I am trying to capture the audio segment which contains the wakeword from my raw input audio stream from microphone.

Background

I have trained my own model on a private dataset and everything works fine. I'm getting testing accuracy of over 99%. When I use precise-listen to test on live input streams, it gives decent results but I have to be very close to the mic when I speak. If I'm not close (within 1.5 meters and speaking in a normal, conversational volume and pitch), the model does not activate at all.

Initial steps

I have written a small script which creates a GUI and starts the runner when I press a "Start" button and stops it when I press "Stop". I then modified mycroft-precise/runner/precise_runner/runner.py to write out the bytes or "chunks" which were getting activated. I first wanted to check whether I was able to get the audio out or not so I initialized a bytes string called "self.record" and appended all the chunks (regardless of whether they were activated or not) and wrote them out to a file. I am pasting the modified functions inside class PreciseRunner so as to not clutter the issue. The changed lines have been marked.

Code for recording all bytes from mic input

def _handle_predictions(self):
        """Continuously check Precise process output"""
        while self.running:
            chunk = self.stream.read(self.chunk_size)
            # Changed: keep every chunk read from the mic, activated or not
            self.record += chunk
            if self.is_paused:
                continue

            prob = self.engine.get_prediction(chunk)
            print(prob)
            self.on_prediction(prob)
            # print(prob)
            if self.detector.update(prob, chunk):
                self.on_activation()
                # print("Yup works")

Code for writing the recorded bytes

def stop(self):
        """Stop listening and close stream"""
        # Changed: write the recorded bytes out as raw PCM (no WAV header)
        with open('/home/rohan/extra1', 'wb') as f:
            f.write(self.record)
        # End of change
        if self.thread:
            self.running = False
            if isinstance(self.stream, ReadWriteStream):
                self.stream.write(b'\0' * self.chunk_size)
            self.thread.join()
            self.thread = None

        self.engine.stop()

        if self.pa:
            self.pa.terminate()
            self.stream.stop_stream()
            self.stream = self.pa = None

Data

The live data is very lossy. I'm attaching a sample file which I recorded using this script (I converted it from bytes to WAV for your convenience).

live_stream_capture.zip
Transcript: "bed on karaoke similarity reality"

I'm using Ubuntu 18.04 (4 GB RAM, 4 CPUs) inside VirtualBox on a Windows 10 machine.
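
For anyone reproducing the bytes-to-WAV step mentioned above, here is a minimal sketch using only the standard library. It assumes the dump is 16 kHz, mono, 16-bit little-endian PCM; that is an assumption about the stream, not something stated in this post, and `raw_pcm_to_wav` is a hypothetical helper name.

```python
import wave


def raw_pcm_to_wav(raw_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap a headerless PCM dump in a WAV container and return the frame count.

    The defaults (16 kHz, mono, 2 bytes per sample) are assumptions; they must
    match how the stream was opened or the result will sound wrong.
    """
    with open(raw_path, 'rb') as f:
        pcm = f.read()
    with wave.open(wav_path, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    # One frame is one sample for every channel.
    return len(pcm) // (channels * sample_width)
```

Called as `raw_pcm_to_wav('/home/rohan/extra1', 'extra1.wav')`, it would produce a file that ordinary players can open, provided the assumed format is right.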

Issues

I have compared recordings from a default recording device on Linux and on Windows to the one which I captured using this script. Windows was the cleanest by far. I used 'arecord' with S16_LE as the data type to record on Linux. That too was (marginally) cleaner than the live capture. What is the reason for this?
Can I use a different technique to capture the activated wakeword? I do not want it to be so lossy. Also, is there a way to share the mic input between two threads? Thread1 will run Precise and listen for activations. Thread2 will keep recording the mic stream. Once Thread1 activates, it can notify Thread2 to return the latest 2 seconds of recording. Is this possible?
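
One possible shape for the "latest 2 seconds" idea is a rolling buffer that the thread already reading the mic appends to, so the mic itself would not need to be shared. This is only a sketch: `RollingAudioBuffer` is a hypothetical class, not part of Precise, and the defaults assume 16 kHz, mono, 16-bit audio.

```python
import threading


class RollingAudioBuffer:
    """Hold only the most recent `seconds` of raw PCM audio (hypothetical helper)."""

    def __init__(self, seconds=2.0, sample_rate=16000, sample_width=2, channels=1):
        # Capacity in bytes; the audio format defaults are assumptions.
        self._max_bytes = int(seconds * sample_rate) * sample_width * channels
        self._data = bytearray()
        self._lock = threading.Lock()

    def append(self, chunk):
        """Add a chunk and drop the oldest bytes beyond the capacity."""
        with self._lock:
            self._data += chunk
            overflow = len(self._data) - self._max_bytes
            if overflow > 0:
                del self._data[:overflow]

    def snapshot(self):
        """Return a copy of the buffered audio, oldest bytes first."""
        with self._lock:
            return bytes(self._data)
```

In the modified `_handle_predictions` above, `self.record += chunk` could become a call to `append(chunk)`, and the activation branch could call `snapshot()` to get the audio leading up to the activation. The lock only matters if another thread reads the buffer.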

el-tocino commented 4 years ago

You can turn on wakeword saving in the mycroft config if you just want to keep the recorded wake word audio. It's incredibly useful for training your model.

rohandeo commented 4 years ago

@el-tocino I'm using a source install as instructed on the mycroft-precise Github page. Could you tell me where I can find the mycroft config file?

el-tocino commented 4 years ago

If you installed mycroft locally, there's one in ~/.mycroft/mycroft.conf, or you can edit one in /etc/mycroft/mycroft.conf

rohandeo commented 4 years ago

I do not have a ~/.mycroft or /etc/mycroft directory. I'm just testing out my trained model for now. Could you elaborate more on your solution?

el-tocino commented 4 years ago

https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customizations/mycroft-conf https://mycroft-ai.gitbook.io/docs/using-mycroft-ai/customizations/wake-word

rohandeo commented 4 years ago

@el-tocino I have not installed mycroft-core and would like to know if there is a way to solve my issue without installing it. I am only interested in recognizing a word from live stream and I have other computations to run on that word once it is recognized.

el-tocino commented 4 years ago

Sorry, I only use it with mycroft.

MatthewScholefield commented 4 years ago

@rohandeo Just to ensure the recording is getting saved properly, try using the mechanism precise has for saving wake words. There are two ways:

precise-listen -s somefolder mymodel.net: This saves activations to somefolder
precise-collect: This saves audio to wav files in the same way precise gets data from the microphone.

Let me know if you can reproduce the results using either of those methods.

Matthew

rohandeo commented 4 years ago

@MatthewScholefield I don't think precise-listen has an option to save activations. This is the list of options that precise-listen supports which I found in the precise-listen script.

:model str
    Either Keras (.net) or TensorFlow (.pb) model to run

:-c --chunk-size int 2048
    Samples between inferences

:-l --trigger-level int 3
    Number of activated chunks to cause an activation

:-s --sensitivity float 0.5
    Network output required to be considered activated

:-b --basic-mode
    Report using . or ! rather than a visual representation

:-d --save-dir str -
    Folder to save false positives

:-p --save-prefix str -
    Prefix for saved filenames

Is there some other way to save activated audio?

precise-collect output:

precise-collect-output.zip Transcript: "bed on karaoke similarity reality"

Not much difference. Data is still lossy. Is the PyAudio module the reason behind the lossy data?

MatthewScholefield commented 4 years ago

Sorry, I used the wrong flag, it's the -d savedir flag as you can see in the info string.

rohandeo commented 4 years ago

Isn't that just for false positives though? @MatthewScholefield

MatthewScholefield commented 4 years ago

It saves any activation, since it doesn't know which one is a proper activation and which one is a false positive. I suppose this should be corrected in the docstring.

Matthew

rohandeo commented 4 years ago

@MatthewScholefield The precise-listen option worked like a charm. I think the reason behind that was the precise-listen script converts the bytes audio back into float32 (greater precision -> greater clarity). Thanks a lot for your help. P.S. Sorry for the late reply. Finals week.

MatthewScholefield commented 4 years ago

No problem, I would expect that there might have been some other subtle bug causing the issue with those audio files since they did sound awfully strange, but I'm glad it's resolved. And no worries, I've also been busy with finals.

dipendra77 commented 3 years ago

Hi @rohandeo, I have a similar application. I have a custom wakeword and it works well. I want to record the audio after the wakeword and save it to a file. I am very new to using Mycroft. Can you help me with the code? This is what I have tried in order to record audio after I detect the wakeword. The output I get is just full of noise.

def _handle_predictions(self):
    while self.running:

        chunk = self.stream.read(self.chunk_size)

        if self.is_paused:
            continue

        prob = self.engine.get_prediction(chunk)
        self.on_prediction(prob)
        if self.detector.update(prob):
            # self.on_activation()
            # record
            print("Activated")
            # chunk = self.stream.read(self.chunk_size)
            frames = []
            for i in range(0, int(16000 / self.chunk_size * 10)):
                chunk = self.stream.read(self.chunk_size)
                frames.append(chunk)
            wf = wave.open("test.wav", 'wb')
            wf.setnchannels(6)
            wf.setsampwidth(self.pa.get_sample_size(self.pa.get_format_from_width(2)))
            wf.setframerate(16000)
            wf.writeframes(b''.join(frames))
            wf.close()
            print("* done recording")

hoavt-54 commented 3 years ago

@Dipendra77 Your code works just fine for me, except that I use 1 channel and sample width 2.
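
To illustrate why the channel count matters, here is a small sketch that writes the same PCM bytes with two different WAV headers and reads back what a player would see. It assumes the captured audio is 16 kHz, mono, 16-bit; `describe_wav` is a hypothetical helper, not part of Precise.

```python
import io
import wave


def describe_wav(pcm, channels, sample_width=2, sample_rate=16000):
    """Write `pcm` with the given header fields; return (frames, duration in seconds)."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    buf.seek(0)
    with wave.open(buf, 'rb') as wf:
        frames = wf.getnframes()
        return frames, frames / wf.getframerate()


# Three seconds of mono 16-bit silence at 16 kHz (assumed capture format).
pcm = b'\x00\x00' * 48000
print(describe_wav(pcm, channels=1))  # header matches the data
print(describe_wav(pcm, channels=6))  # same bytes, six-channel header
```

With a six-channel header, every run of six consecutive mono samples is treated as one frame spread across six channels, so the file plays back at a fraction of its real length with the samples scattered over the channels. That would be consistent with the noise described above.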
