Kitt-AI / snowboy

Future versions with model training module will be maintained through a forked version here: https://github.com/seasalt-ai/snowboy

Save stream data as wav file? #109

Closed infernusdomoso closed 7 years ago

infernusdomoso commented 7 years ago

I'm pretty new to audio data. I'm trying to either have the software break and release the microphone when the wake word is heard, so the audio can be recorded and saved, or, maybe easier, start saving the stream directly. I've tried using the wave library to save the data as a WAV file, but it's all garbled noise, so I'm thinking the 'data' object that holds the stream audio is not in the right format?

Can someone point me in the right direction for saving the stream audio as a WAV file? It always seems to produce an 8k file that's just garbled noise. I'm clearly missing something.

chenguoguo commented 7 years ago

You can write whatever you send to the RunDetection function to a file; that is the raw audio. You can then use sox to turn it into a WAV file with a header, e.g., with the following command:

sox -r 16000 -c 1 -b 16 -e signed-integer -t raw audio.raw -t wav audio.wav

where audio.raw is what you write to the disk.
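
The same conversion can also be done without sox. The following is a minimal sketch using Python's standard wave module, assuming the raw file holds signed 16-bit little-endian mono samples at 16 kHz (the parameters passed to sox above). The function name raw_to_wav is made up for illustration and is not part of snowboy.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM data in a WAV container.

    Assumes signed 16-bit little-endian samples, which is what a WAV
    file with a 2-byte sample width is expected to contain.
    """
    with open(raw_path, "rb") as src:
        pcm = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)  # bytes per sample: 2 == 16 bit
        dst.setframerate(sample_rate)
        dst.writeframes(pcm)  # the header's frame count is filled in on close
```

Calling raw_to_wav("audio.raw", "audio.wav") should give a file that ordinary players can open. If the header parameters do not match the real format of the data, the result plays back as noise or at the wrong speed.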

infernusdomoso commented 7 years ago

Awesome, thanks. One more question - how do I extend the amount of data that is sent to RunDetection? I saved the data to disk and converted it with sox, but it's about half a second of my voice - I need to make the buffer larger. I've tried a few things and nothing seems to work.

infernusdomoso commented 7 years ago

Never mind, figured it out. I added a second ring buffer object that can grow to an unlimited size, and I set a recording flag so that it keeps the data. I sleep inside the callback function, then save the raw data to disk for processing. Thanks for the info on the sox command.
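
The poster's code is not shown in the thread, so the following is only a guess at the general shape of that approach: a buffer that grows without limit and keeps data only while a recording flag is set. The class name GrowingBuffer and its methods are hypothetical and not part of snowboy.

```python
import threading


class GrowingBuffer:
    """Accumulates raw audio bytes only while `recording` is True."""

    def __init__(self):
        self._chunks = []
        self._lock = threading.Lock()
        self.recording = False

    def extend(self, data):
        # Meant to be called from the audio callback for every incoming chunk.
        if self.recording:
            with self._lock:
                self._chunks.append(data)

    def dump(self, path):
        # Write everything captured so far to disk and reset the buffer.
        with self._lock:
            pcm = b"".join(self._chunks)
            self._chunks = []
        with open(path, "wb") as f:
            f.write(pcm)
        return len(pcm)
```

In this sketch the hotword callback would set recording to True, wait for the utterance to finish, set it back to False, and then call dump. The buffer has no size limit, so it keeps growing for as long as the flag stays set.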

tmzqvd commented 7 years ago

Sorry to bother you, infernusdomoso. I assume you are using Python. Is it possible to get a code snippet for saving raw audio to a file? I tried several solutions, but all I get is a quiet bang. Hotword detection is working, so it's not a problem with the mic.

One of my attempts was the following (I know it's silly to open the file for writing on every loop iteration). I'm editing snowboydecoder.py, and this is the loop where I added saving to a file:

    is_detected = False

    while True:
        if interrupt_check():
            logger.debug("detect voice break")
            break
        data = self.ring_buffer.get()
        if len(data) == 0:
            time.sleep(sleep_time)
            continue

        ans = self.detector.RunDetection(data)
        if ans == -1:
            logger.warning("Error initializing streams or reading audio data")
        elif ans > 0:
            message = "Keyword " + str(ans) + " detected at time: "
            message += time.strftime("%Y-%m-%d %H:%M:%S",
                                     time.localtime(time.time()))
            logger.info(message)
            with open('recorded_sound.raw', 'ab') as f:
                f.write(data)
            is_detected = True
            callback = detected_callback[ans-1]
            if callback is not None:
                callback()
        elif (ans == -2 and is_detected):
            # save file if silence is detected
            is_detected = False
            message = "Silence detected. Saving to recorded_sound.raw file."
            logger.info(message)
            with open('recorded_sound.raw', 'ab') as f:
                f.write(data)
            f.close()

tnx
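
One possible reading of the loop above is that it writes only two chunks: the one in which the hotword was detected and the first silent one afterwards. Chunks for which RunDetection returns 0 (voice without a hotword, as far as the return codes are understood here) are never written, which might explain the short "bang". The sketch below keeps every chunk between the hotword and the first silence. It works on a plain list of (ans, data) pairs so it can run on its own; collect_utterance is a made-up name, not a snowboy function.

```python
def collect_utterance(results):
    """Return the audio between a hotword and the next silence.

    `results` is an iterable of (ans, data) pairs, where `ans` follows
    the return codes used in the loop above: > 0 hotword, 0 voice,
    -2 silence.
    """
    recording = False
    captured = bytearray()
    for ans, data in results:
        if ans > 0:
            recording = True  # hotword heard: start keeping audio
        if recording:
            captured.extend(data)  # keep every chunk, not only the first and last
        if ans == -2 and recording:
            break  # the first silence after the hotword ends the take
    return bytes(captured)
```

In a real loop a silent chunk may arrive right after the hotword, before the speaker continues, so a practical version would probably wait for several consecutive silent chunks before stopping.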

GiovanniBalestrieri commented 6 years ago

Here is a code snippet that implements what infernusdomoso suggested. Add the second ring buffer in the init:

    self.ring_buffer_hot = RingBuffer(4096*8)  # or whatever size you want

Then in the audio_callback() add self.ring_buffer_hot.extend(in_data)

Finally in the start method, add the following lines in the if callback is not None condition:

data1 = self.ring_buffer_hot.get()
with open('recorded_sound.raw', 'ab') as f:
    f.write(data1)

Then try a hotword, and a file called recorded_sound.raw should be created. Convert it using the sox command provided by chenguoguo:

sox -r 16000 -c 1 -b 16 -e signed-integer -t raw recorded_sound.raw -t wav audio.wav
aplay audio.wav

Hope it helps
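
To pick a size for the second ring buffer, it may help to convert between bytes and seconds. This is a small sketch assuming the buffer capacity is counted in bytes and the audio is 16-bit mono at 16 kHz, as in the sox command; pcm_seconds and pcm_bytes are illustrative helper names.

```python
def pcm_seconds(num_bytes, sample_rate=16000, channels=1, sample_width=2):
    """Duration in seconds of headerless PCM audio of the given size."""
    return num_bytes / (sample_rate * channels * sample_width)


def pcm_bytes(seconds, sample_rate=16000, channels=1, sample_width=2):
    """Number of bytes needed to hold the given duration of PCM audio."""
    return int(seconds * sample_rate * channels * sample_width)


print(pcm_seconds(4096 * 8))  # 1.024
print(pcm_bytes(5))           # 160000
```

Under those assumptions, a buffer of 4096*8 bytes holds roughly one second of audio, so capturing a five-second utterance would need a capacity of about 160000 bytes.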

PS: Do you know of any Python command to convert the audio file to a 16-bit, 1-channel, 16 kHz WAV file?

EDIT: All right, instead of:

with open('recorded_sound.raw', 'ab') as f:
    f.write(data1)

use:

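import wave

try:
    # The output now has a WAV header, so a .wav name would describe it better.
    f = wave.open('recorded_sound.raw', 'wb')
    # (nchannels, sampwidth in bytes, framerate, nframes, comptype, compname)
    f.setparams((1, 2, 16000, 0, 'NONE', 'NONE'))
    f.writeframes(data1)
    f.close()
except IOError as e:
    print(e)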

works flawlessly

chenguoguo commented 6 years ago

@GiovanniBalestrieri Great! Would you like to submit a PR to https://github.com/Kitt-AI/snowboy/tree/master/examples/Python?