Closed Tryad0 closed 1 year ago
Yes, this is possible. For instance, in our Android demo, you can see we call this code:
if (audioRecord.read(buffer, 0, buffer.length) == buffer.length) {
    final float voiceProbability = cobra.process(buffer);
    // ...
}
If you write that buffer to an audio file, you will have recorded what Cobra processed. We are unable to provide support on how to read/write audio files as it's out of the scope of this repo, but there are many resources out there for it.
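As a rough illustration of the idea above, here is a minimal Python sketch (not taken from the Android demo) that collects each processed frame and writes the result as a 16-bit mono WAV file using the standard-library `wave` module. The 16 kHz rate matches what Cobra processes. The helper name `write_wav`, the output path, and the silent frames are hypothetical stand-ins; in a real app the frames would be the same buffers you pass to `cobra.process()`.

```python
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000  # Cobra processes 16 kHz, 16-bit, mono PCM


def write_wav(path, frames, sample_rate=SAMPLE_RATE):
    """Write a list of frames (each a list of int16 samples) to a mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        for frame in frames:
            # Pack the samples as little-endian signed 16-bit integers
            wf.writeframes(struct.pack(f"<{len(frame)}h", *frame))


# Hypothetical stand-in for the buffers handed to cobra.process():
# three frames of 512 silent samples each.
recorded_frames = [[0] * 512 for _ in range(3)]
output_path = os.path.join(tempfile.gettempdir(), "processed_audio.wav")
write_wav(output_path, recorded_frames)
```

Appending every buffer to the list before or after calling `cobra.process()` means the saved file contains exactly the audio that Cobra saw.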
Could you give me a hint on how to access the buffer on iOS? Thanks a lot for your answers!!
Take a look at our iOS demo where we use one of our other libraries, ios-voice-processor, to get audio buffers. E.g.
private func audioCallback(pcm: [Int16]) -> Void {
    let result:Float32 = try self.cobra!.process(pcm: pcm)
    // ...
}
try VoiceProcessor.shared.start(
    frameLength: Cobra.frameLength,
    sampleRate: Cobra.sampleRate,
    audioCallback: self.audioCallback)
I have been searching for an iOS solution for the last two days, but nothing has worked. I don't know how to correctly save all the PCM data from the audioCallback so that I can use it in Flutter. I can't even get the PCM to play back correctly in Swift. I also posted a question: https://stackoverflow.com/questions/75794943/flutter-swift-int16-to-audio-file?noredirect=1#comment133702093_75794943
Is it also possible to record audio in Flutter with "record: ^4.4.4" while the Cobra VAD is listening, on both iOS and Android?
Closing due to inactivity. Re-open if needed.
We are using cobra and pvporcupine together. We are trying to detect a hot word, then start recording what the user says, and when there has been no voice activity for a couple of seconds (Cobra VAD), save the file for speech-to-text processing.
The only problem is that the resulting audio file is of low quality (because cobra and porcupine use 16 kHz instead of 48 kHz or 44.1 kHz). Does anyone have a ready example for this kind of situation, showing how to "change the sampling" for cobra and use another recorder instead of PvRecorder? Thx!
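One way to picture the problem: capture at 48 kHz, keep those samples untouched for the saved file, and hand Cobra a downsampled 16 kHz copy of each chunk. The sketch below is a standard-library-only illustration that averages every 3 samples, which is a crude low-pass filter plus decimation and not a high-quality resampler. The frame length of 512 is an assumption (check `cobra.frame_length`), and `downsample_by_3` and `handle_chunk` are hypothetical names.

```python
FACTOR = 3  # 48000 Hz / 16000 Hz
COBRA_FRAME = 512  # assumed Cobra frame length; check cobra.frame_length
CHUNK_48K = FACTOR * COBRA_FRAME  # 1536 samples to read per iteration at 48 kHz


def downsample_by_3(chunk_48k):
    """Average each group of 3 samples (crude low-pass filter plus decimation)."""
    if len(chunk_48k) % FACTOR != 0:
        raise ValueError("chunk length must be a multiple of 3")
    return [
        (chunk_48k[i] + chunk_48k[i + 1] + chunk_48k[i + 2]) // FACTOR
        for i in range(0, len(chunk_48k), FACTOR)
    ]


high_quality_frames = []  # untouched 48 kHz audio, kept for the output file


def handle_chunk(chunk_48k):
    """Store the original chunk and return the 16 kHz frame for the VAD."""
    high_quality_frames.append(chunk_48k)
    return downsample_by_3(chunk_48k)  # this is what would go to cobra.process()


# Synthetic 48 kHz chunk standing in for one microphone read
chunk = [300, 600, 900] * COBRA_FRAME
frame_16k = handle_chunk(chunk)
```

Reading `CHUNK_48K` samples at a time guarantees that each downsampled frame has exactly the length the VAD expects, so no buffering across reads is needed.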
No problem at all 😈, after 2 days of fighting with on-the-fly resampling we got a solution
P.S. if anyone ever needs help running the script below, contact me and I will provide support, no need to wait months for support from them (they ofc get paid for that and thus have the wrong incentives to provide proper documentation and/or guidance)
"""
How to detect a hot word, then start recording what the user says,
and when there is no voice activity for a short while (Cobra VAD),
save the file for later speech-to-text processing.
"""
import wave
import time
import pyaudio
import resampy
import pvcobra
import numpy as np
import pvporcupine
from pvrecorder import PvRecorder

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 48000  # recording in high quality
TARGET_RATE = 16000  # passing to cobra (voice activity detection) in low quality
# Cobra processes frames of 512 samples at 16 kHz (see cobra.frame_length), so each
# read at 48 kHz needs 3x as many samples to resample down to exactly one frame
chunk_size = int((RATE / TARGET_RATE) * 512)
PICOVOICE_ACCESS_KEY = "oI/XXXXXXXXXXXXXXXXXXXXXXX"
# placeholder hot word; replace it with a key of pvporcupine.KEYWORD_PATHS
keywords = ["example"]
keyword_paths = [pvporcupine.KEYWORD_PATHS[x] for x in keywords]
sensitivities = [0.5] * len(keyword_paths)
AUDIO_DEVICE_INDEX = -1  # default device


def record_until_voice_activity_stops():
    """Record for at least 2.5 seconds, then stop once no voice has been detected for 0.5 second."""
    audio_recorder = pyaudio.PyAudio()
    stream = audio_recorder.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        input_device_index=None,
        frames_per_buffer=chunk_size,
    )
    output_filename = "output.wav"
    frames = []
    elapsed_time = 0
    minimal_duration = 2.5  # minimal recording duration after the hot word, in seconds
    duration = minimal_duration
    start_time = time.time()
    print("Started recording. Will finish when no voice activity is detected...")
    while (elapsed_time < duration) or (elapsed_time < minimal_duration):
        chunk = stream.read(chunk_size, exception_on_overflow=False)
        frames.append(chunk)
        elapsed_time = time.time() - start_time
        # Convert bytes to int16 NumPy array
        chunk_int16 = np.frombuffer(chunk, dtype=np.int16)
        # Resample to TARGET_RATE
        resampled_chunk_array = resampy.resample(chunk_int16, RATE, TARGET_RATE)
        # cobra.process expects 16-bit PCM; depending on the resampy version,
        # the resampled array may be floating point, so cast it back to int16
        resampled_chunk_array = resampled_chunk_array.astype(np.int16)
        # detect voice activity probability
        voice_probability = cobra.process(resampled_chunk_array)
        if voice_probability > 0.5:
            # voice detected: keep recording for another 0.5 second
            duration = elapsed_time + 0.500
    # Save the original 48 kHz audio, not the resampled copy
    wf = wave.open(output_filename, "wb")
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(audio_recorder.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
    wf.close()
    print(f"Saved to file {output_filename}")
    # Stop and close the stream
    stream.stop_stream()
    stream.close()
    # Terminate the PortAudio interface
    audio_recorder.terminate()


if __name__ == "__main__":
    # init pvporcupine
    porcupine = pvporcupine.create(
        access_key=PICOVOICE_ACCESS_KEY,
        keyword_paths=keyword_paths,
        sensitivities=sensitivities,
    )
    # init pv_recorder
    # P.S. for hot word detection we use PvRecorder instead of pyaudio + resampy for efficiency
    pv_recorder = PvRecorder(device_index=AUDIO_DEVICE_INDEX, frame_length=porcupine.frame_length)
    # init cobra
    cobra = pvcobra.create(access_key=PICOVOICE_ACCESS_KEY)
    # start listening for hot word
    pv_recorder.start()
    print("Ready, waiting for hot word...")
    try:
        while True:
            pcm = pv_recorder.read()
            is_hot_word = porcupine.process(pcm)
            if is_hot_word >= 0:
                print("Detected hot word...")
                # stop pv_recorder here so it won't "fight" with pyaudio for access to the microphone
                pv_recorder.stop()
                # run recording with voice activity
                record_until_voice_activity_stops()
                pv_recorder.start()
                print("Listening ... (press Ctrl+C to exit)")
    except KeyboardInterrupt:
        print("Stopping ...")
    finally:
        pv_recorder.delete()
        porcupine.delete()
        cobra.delete()
keywords for seo: pvcobra, pvporcupine and pvrecorder together, how to run pvcobra, pvporcupine and pvrecorder together, how to detect hot word and record into file with voice activity detection
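The stopping rule used in the script above (record for at least 2.5 seconds, then stop once 0.5 second passes without voice) can be isolated and checked without a microphone. The sketch below is a simplified model of that rule, not the script itself: it counts elapsed time in frames instead of reading the wall clock, and assumes one frame is 512 samples at 16 kHz. The function name `frames_until_stop` is hypothetical.

```python
FRAME_SECONDS = 512 / 16000  # assumed: one 512-sample frame at 16 kHz = 0.032 s


def frames_until_stop(probabilities, min_duration=2.5, hangover=0.5, threshold=0.5):
    """Return how many frames get recorded before the stop rule fires.

    Recording continues while elapsed time is below the current deadline.
    The deadline starts at min_duration and is pushed to `hangover` seconds
    after every frame whose voice probability exceeds `threshold`.
    """
    deadline = min_duration
    elapsed = 0.0
    count = 0
    for probability in probabilities:
        if elapsed >= deadline:
            break
        count += 1
        elapsed = count * FRAME_SECONDS
        if probability > threshold:
            deadline = max(deadline, elapsed + hangover)
    return count


# Synthetic probability traces standing in for cobra.process() results
silence_only = frames_until_stop([0.0] * 200)
speech_then_silence = frames_until_stop([0.9] * 100 + [0.0] * 100)
```

Feeding recorded probability traces through a function like this makes it easy to tune the minimum duration, hangover, and threshold before testing with live audio.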
Is it possible to use Cobra VAD while recording an audio file on Android and iOS? Many other speech recognizers have problems with this.