Picovoice / cobra

On-device voice activity detection (VAD) powered by deep learning
https://picovoice.ai/
Apache License 2.0

VAD and Recording Audio File simultaneously #137

Closed. Tryad0 closed this issue 1 year ago.

Tryad0 commented 1 year ago

Is it possible to use Cobra VAD while recording an audio file on Android and iOS? Many other speech recognizers have problems with this.

laves commented 1 year ago

Yes, this is possible. For instance, in our Android demo, you can see we call this code:

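// `buffer` is a short[] holding one frame (cobra.getFrameLength() samples) of
// 16-bit PCM, read from an AudioRecord configured for 16 kHz mono
if (audioRecord.read(buffer, 0, buffer.length) == buffer.length) {
    final float voiceProbability = cobra.process(buffer);
    // ... the same buffer can also be written to your audio file here
}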

If you write that buffer to an audio file, you will have recorded what Cobra processed. We are unable to provide support on how to read/write audio files as it's out of the scope of this repo, but there are many resources out there for it.
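A minimal Python sketch of the same idea, assuming frames arrive as lists of 16-bit samples (as `PvRecorder.read()` returns) and that `vad` is any callable returning a voice probability, such as `cobra.process` from pvcobra. The function name `record_and_detect` is hypothetical; the point is that one captured frame is both written to a WAV file and passed to the VAD, so only one component ever holds the microphone.

```python
import struct
import wave
from typing import Callable, Iterable, List, Sequence


def record_and_detect(
    frames: Iterable[Sequence[int]],
    vad: Callable[[Sequence[int]], float],
    path: str,
    sample_rate: int = 16000,
    threshold: float = 0.5,
) -> List[bool]:
    """Write each frame to a mono 16-bit WAV file and run the VAD on the same frame.

    Returns one voice/no-voice flag per frame.
    (Hypothetical helper for illustration, not part of any Picovoice SDK.)
    """
    flags: List[bool] = []
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)  # must match the rate the frames were captured at
        for frame in frames:
            # Pack as little-endian signed 16-bit integers, the WAV PCM layout
            wf.writeframes(struct.pack(f"<{len(frame)}h", *frame))
            # Feed the very same samples to the voice activity detector
            flags.append(vad(frame) > threshold)
    return flags
```

In a real app, `frames` would be a generator yielding `recorder.read()` until some stop condition, and `vad` would be `cobra.process`.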

Tryad0 commented 1 year ago

Could you give me a hint on how to access the buffer in iOS? Thanks a lot for your answers!

laves commented 1 year ago

Take a look at our iOS demo where we use one of our other libraries, ios-voice-processor, to get audio buffers. E.g.


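private func audioCallback(pcm: [Int16]) -> Void {
    // process(pcm:) throws, so it must be called inside do/catch in a non-throwing callback
    do {
        let voiceProbability: Float32 = try self.cobra!.process(pcm: pcm)
        // ... e.g. append `pcm` to your own [Int16] buffer here to save it later
    } catch {
        print("Cobra failed to process frame: \(error)")
    }
}

try VoiceProcessor.shared.start(
    frameLength: Cobra.frameLength,
    sampleRate: Cobra.sampleRate,
    audioCallback: self.audioCallback)

Tryad0 commented 1 year ago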

I have been searching for a solution on iOS for the last two days, but nothing has worked. I just don't know how to correctly save all the PCM from the audioCallback so I can use it in Flutter. I can't even get the PCM to play back correctly in Swift. I also started a question: https://stackoverflow.com/questions/75794943/flutter-swift-int16-to-audio-file?noredirect=1#comment133702093_75794943
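Raw `[Int16]` samples have no header, so most players cannot open them directly. Appending every frame from `audioCallback` to one array and prepending a standard 44-byte WAV header usually produces a playable file. The sketch below shows that byte layout in Python for consistency with the script later in this thread; it assumes mono, 16-bit, 16 kHz audio (Cobra's input format), and the helper names are hypothetical. The same `RIFF`/`fmt `/`data` layout could be written from Swift with `Data`.

```python
import struct
from typing import Sequence


def wav_header(num_samples: int, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Build the canonical 44-byte RIFF/WAVE header for 16-bit PCM data."""
    bits_per_sample = 16
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align
    data_size = num_samples * block_align  # num_samples counts frames (per channel)
    # fmt chunk: size 16, audio format 1 (integer PCM)
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,
    )


def pcm16_to_wav_bytes(samples: Sequence[int], sample_rate: int = 16000) -> bytes:
    """Wrap mono Int16 samples (e.g. all frames collected from audioCallback) as WAV."""
    body = struct.pack(f"<{len(samples)}h", *samples)  # little-endian signed 16-bit
    return wav_header(len(samples), sample_rate) + body
```

The resulting bytes can be written to a `.wav` file and handed to any player that understands WAV.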

Is it also possible to record audio in Flutter with the "record: ^4.4.4" package while Cobra VAD is listening, on both iOS and Android?

kenarsa commented 1 year ago

Closing due to inactivity. Re-open if needed.

ArtemBernatskyy commented 1 year ago

We are using Cobra and Porcupine (pvporcupine) together. We detect a hot word, then start recording what the user says, and when Cobra detects no voice activity for a couple of seconds, we save the file for speech-to-text processing.
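The endpointing rule described here (stop once voice activity has been absent for a while) can be expressed independently of any audio API. A minimal sketch, assuming one Cobra voice probability per frame and a frame duration of `cobra.frame_length / cobra.sample_rate` seconds; the function name `find_endpoint`, the 2-second default, and the 0.5 threshold are illustrative choices, not Cobra defaults.

```python
from typing import Iterable, Optional


def find_endpoint(
    probabilities: Iterable[float],
    frame_duration: float,
    silence_timeout: float = 2.0,
    min_duration: float = 0.0,
    threshold: float = 0.5,
) -> Optional[int]:
    """Return how many frames to keep before stopping, or None if the input ends first.

    Recording stops once at least `min_duration` seconds have elapsed and no frame
    has exceeded `threshold` for `silence_timeout` seconds.
    """
    last_voice_time = 0.0  # treat the start (just after the hot word) as the last voice
    for i, probability in enumerate(probabilities):
        now = (i + 1) * frame_duration  # elapsed time at the end of frame i
        if probability > threshold:
            last_voice_time = now
        if now >= min_duration and now - last_voice_time >= silence_timeout:
            return i + 1
    return None
```

The script posted later in this thread applies the same kind of rule with a 0.5 s silence timeout and a 2.5 s minimum duration.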

The only problem is that the resulting audio file is low quality, because Cobra and Porcupine use 16 kHz instead of 48 kHz or 44.1 kHz. Does anyone have a ready-made example for this situation, showing how to resample audio for Cobra and use a recorder other than PvRecorder? Thanks!
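Since 48 kHz is exactly three times 16 kHz, one way to feed Cobra from a high-quality recorder is to low-pass filter and keep every third sample. The sketch below swaps in a NumPy windowed-sinc FIR filter in place of the resampy call used in the script further down; the filter length (101 taps) and cutoff (90% of the new 8 kHz Nyquist frequency) are assumptions, and the function names are hypothetical. Because each chunk is filtered on its own, a few samples at chunk edges are slightly distorted; that is usually tolerable for VAD, while a streaming filter that keeps state across chunks would avoid it.

```python
import numpy as np


def lowpass_taps(cutoff: float, num_taps: int = 101) -> np.ndarray:
    """Windowed-sinc FIR low-pass filter; cutoff is in cycles per input sample (0..0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return taps / taps.sum()  # normalize to unity gain at DC


def decimate_48k_to_16k(pcm: np.ndarray) -> np.ndarray:
    """Downsample 16-bit PCM from 48 kHz to 16 kHz (integer factor 3)."""
    factor = 3
    # Remove content above ~90% of the new 8 kHz Nyquist frequency to limit aliasing
    taps = lowpass_taps(cutoff=0.9 * 0.5 / factor)
    filtered = np.convolve(pcm.astype(np.float64), taps, mode="same")
    decimated = filtered[::factor]  # keep every third sample
    return np.clip(np.round(decimated), -32768, 32767).astype(np.int16)
```

Reading 1536 samples per chunk at 48 kHz yields exactly 512 samples at 16 kHz, matching one Cobra frame.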

ArtemBernatskyy commented 1 year ago

No problem at all 😈. After two days of fighting with resampling on the fly, we got the solution:

P.S. If anyone ever needs help running the script below, contact me and I will provide support; there is no need to wait months for support from them (they of course get paid for that, and thus have the wrong incentives to provide proper documentation and/or guidance).

"""
how to detect hot word, then start recording what user says
and when there are no voice activity for couple of seconds (cobra VAD)
then save the file for later speech to text processing
"""
import wave
import time

import pyaudio
import resampy
import pvcobra
import numpy as np
import pvporcupine
from pvrecorder import PvRecorder

FORMAT = pyaudio.paInt16
CHANNELS = 1
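RATE = 48000  # record in high quality
TARGET_RATE = 16000  # Cobra (voice activity detection) requires 16 kHz input
# Cobra processes frames of 512 samples at 16 kHz (cobra.frame_length), so read
# (48000 / 16000) * 512 = 1536 samples at 48 kHz: after resampling that is exactly one Cobra frame
chunk_size = int((RATE / TARGET_RATE) * 512)
PICOVOICE_ACCESS_KEY = "oI/XXXXXXXXXXXXXXXXXXXXXXX"
keywords = ["porcupine"]  # built-in keyword (see pvporcupine.KEYWORDS); replace with your own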
keyword_paths = [pvporcupine.KEYWORD_PATHS[x] for x in keywords]
sensitivities = [0.5] * len(keyword_paths)
AUDIO_DEVICE_INDEX = -1  # default device

def record_until_voice_activity_stops():
    """Record for at least 2.5 seconds, then stop once no voice activity has been
    detected for 0.5 seconds. Uses the module-level `cobra` created in __main__."""
    audio_recorder = pyaudio.PyAudio()
    stream = audio_recorder.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        input_device_index=None,  # default input device
        frames_per_buffer=chunk_size,
    )
    output_filename = "output.wav"

    frames = []
    elapsed_time = 0
    minimal_duration = 2.5  # minimal recording duration in seconds after the hot word
    duration = minimal_duration

    start_time = time.time()

    print("Started recording. Will finish when no voice activity is detected...")
    try:
        while (elapsed_time < duration) or (elapsed_time < minimal_duration):
            chunk = stream.read(chunk_size, exception_on_overflow=False)
            frames.append(chunk)  # keep the original 48 kHz audio for the file
            elapsed_time = time.time() - start_time
            # Convert bytes to an int16 NumPy array, then to float for resampling
            chunk_float = np.frombuffer(chunk, dtype=np.int16).astype(np.float32)
            # Resample to TARGET_RATE: 1536 samples at 48 kHz -> 512 samples at 16 kHz
            resampled = resampy.resample(chunk_float, RATE, TARGET_RATE)
            # Cobra expects exactly cobra.frame_length 16-bit integer samples
            resampled_chunk_array = np.clip(np.round(resampled), -32768, 32767).astype(np.int16)
            # Detect voice activity probability
            voice_probability = cobra.process(resampled_chunk_array.tolist())
            if voice_probability > 0.5:
                # Keep recording until 0.5 s after the last voiced chunk
                duration = elapsed_time + 0.500
    finally:
        # Stop and close the stream, then terminate the PortAudio interface
        stream.stop_stream()
        stream.close()
        audio_recorder.terminate()

    with wave.open(output_filename, "wb") as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(pyaudio.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))
    print(f"Saved to file {output_filename}")

if __name__ == "__main__":
    # init pvporcupine
    porcupine = pvporcupine.create(
        access_key=PICOVOICE_ACCESS_KEY,
        keyword_paths=keyword_paths,
        sensitivities=sensitivities,
    )
    # init PvRecorder
    # For hot word detection we use PvRecorder (which captures at 16 kHz, as Porcupine needs)
    # instead of pyaudio + resampy, for efficiency
    pv_recorder = PvRecorder(device_index=AUDIO_DEVICE_INDEX, frame_length=porcupine.frame_length)

    # init cobra
    cobra = pvcobra.create(access_key=PICOVOICE_ACCESS_KEY)

    # start listening for hot word
    pv_recorder.start()
    print("Ready, waiting for hot word...")
    try:
        while True:
            pcm = pv_recorder.read()
            keyword_index = porcupine.process(pcm)  # -1 if no keyword was detected

            if keyword_index >= 0:
                print("Detected hot word...")
                # Stop pv_recorder so it doesn't compete with pyaudio for the microphone
                pv_recorder.stop()

                # run recording with voice activity
                record_until_voice_activity_stops()

                pv_recorder.start()
                print("Listening ... (press Ctrl+C to exit)")
    except KeyboardInterrupt:
        print("Stopping ...")
    finally:
        pv_recorder.delete()
        porcupine.delete()
        cobra.delete()
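To confirm that the saved file kept the full 48 kHz rate (rather than Cobra's 16 kHz), its header can be read back with Python's standard `wave` module. `describe_wav` is a hypothetical helper; for the script above, one would expect 1 channel, 2-byte samples, and a 48000 Hz sample rate.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the format fields of a WAV file, e.g. output.wav written by the script."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }
```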
