Closed Nickil21 closed 4 years ago
Hello,
Are you using the latest version of auditok? If not, I'd recommend that you start using it because it's much faster and more user-friendly.
The latest version is currently not installable from PyPI, but you can install it from GitHub:
sudo pip install git+https://github.com/amsehili/auditok
There are two options: record the data and then split it, or split the data in real time as you read from the microphone.
from auditok import AudioRegion
# read 5 seconds of audio data from microphone
region = AudioRegion.load(input=None, max_read=5, sampling_rate=16000, sample_width=2, channels=1)
regions = region.split()
for r in regions:
    r.play(progress_bar=True)  # progress bar requires `tqdm`
You can play, save or plot a region. Please refer to the documentation of AudioRegion for more information about these methods. You might also want to use custom split parameters, especially energy_threshold, min_dur, max_dur and max_silence; see the documentation for split.
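To give a rough feel for how these four parameters interact, here is a simplified, standard-library-only sketch of energy-based splitting. It is not auditok's implementation: the function names (frame_energy_db, split_frames), the fixed frame duration and the choice to drop trailing silence from each event are all assumptions made for illustration.

```python
import math


def frame_energy_db(frame):
    """Log energy of one frame of samples: 10 * log10(mean(x^2))."""
    mean_sq = sum(s * s for s in frame) / len(frame)
    return 10 * math.log10(max(mean_sq, 1e-10))  # floor avoids log10(0)


def split_frames(energies, frame_dur=0.05, energy_threshold=50,
                 min_dur=0.2, max_dur=5.0, max_silence=0.3):
    """Return (start, end) times in seconds of detected audio events.

    Hypothetical helper for illustration, not part of auditok.
    `energies` holds one log-energy value per analysis frame.
    """
    min_len = round(min_dur / frame_dur)      # shortest event to keep, in frames
    max_len = round(max_dur / frame_dur)      # longest allowed event, in frames
    max_sil = round(max_silence / frame_dur)  # silent frames tolerated inside an event
    events, start, silence = [], None, 0

    def close(first, end):
        # Keep the event only if it is long enough.
        if end - first >= min_len:
            events.append((first * frame_dur, end * frame_dur))

    for i, energy in enumerate(energies):
        active = energy >= energy_threshold
        if start is None:
            if not active:
                continue                      # still in silence, nothing to do
            start, silence = i, 0             # first active frame opens an event
        else:
            silence = 0 if active else silence + 1
        if silence > max_sil or i + 1 - start >= max_len:
            close(start, i + 1 - silence)     # trailing silent frames are dropped
            start = None
    if start is not None:                     # stream ended inside an event
        close(start, len(energies) - silence)
    return events
```

With 0.1 s frames and the energies [30, 30, 60, 60, 60, 30, 60, 60, 30, 30, 30, 30, 60, 30, 30, 30, 30], this sketch bridges the single quiet frame at index 5 (shorter than max_silence) and discards the isolated active frame at index 12 (shorter than min_dur), leaving one event from 0.2 s to 0.8 s.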
from auditok import split, AudioReader, AudioRegion
# input=None because we want to read from the mic
reader = AudioReader(input=None, record=True, sr=16000, sw=2, ch=1, max_read=5)
for i, region in enumerate(split(reader)):
    region.play(progress_bar=True)
    region.save("{}.wav".format(i))
# save acquired data
reader.rewind()
region = AudioRegion.load(reader.data, sr=reader.sr, sw=reader.sw, ch=reader.channels)
region.save("main_stream.wav")
Thank you. Any idea how to integrate this so it runs on edge devices such as mobile phones? I plan to record via the phone microphone, use silence detection to decide when to stop, and send the recorded audio to the Google Speech API to get the transcript.
You should try Pydroid.
In the above real-time example, is there a way to get access to the bytes corresponding to a region? I would have expected region.data, but this doesn't exist.
edit: I found that there is the useful region._data. This is what I need to pass to my speech-to-text engine. Not sure why it's marked as an internal API.
Hi @Evidlo, you can use bytes(region) to get the raw bytes, or numpy.array(region) or region.samples to get a multidimensional numpy array containing one 1-D array per audio channel.
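For passing a region to a speech-to-text engine, the raw bytes usually need either to be unpacked into samples or wrapped in a WAV container. The sketch below does both with the standard library only. It assumes bytes(region) yields interleaved, little-endian, 16-bit signed PCM (matching the sw=2 setting used above); the helper names are hypothetical and not part of auditok.

```python
import io
import struct
import wave


def pcm16_to_channels(raw, channels=1):
    """De-interleave 16-bit little-endian PCM bytes into one tuple per channel.

    `raw` must contain a whole number of 2-byte samples.
    """
    samples = struct.unpack("<{}h".format(len(raw) // 2), raw)
    return [samples[c::channels] for c in range(channels)]


def pcm_to_wav_bytes(raw, sampling_rate=16000, sample_width=2, channels=1):
    """Wrap raw PCM bytes in an in-memory WAV file and return its bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sampling_rate)
        wav.writeframes(raw)
    return buffer.getvalue()
```

If those assumptions hold for your audio settings, something like pcm_to_wav_bytes(bytes(region)) should give a payload that can be written to disk or uploaded to a transcription service.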
I am using PyAudio to collect audio streams from the microphone input to detect end-of-speech. I was wondering if I can process the streams in real time using auditok. If so, how do I optimize it to run faster? I am using the following code for the time being:
Ultimately, my goal is to record the user's speech from the microphone of a mobile device and then run speech-to-text transcription on it using the Google API.