amsehili / auditok

An audio/acoustic activity detection and audio segmentation tool
MIT License

Real-Time Silence detection #23

Closed Nickil21 closed 4 years ago

Nickil21 commented 4 years ago

I am using PyAudio to collect audio streams from the microphone input to detect end-of-speech. I was wondering if I can process the streams in real time using auditok. If so, how can I optimize it to run faster?

I am using the following code for the time being:

from auditok import ADSFactory, AudioEnergyValidator, StreamTokenizer, player_for

# We set the `record` argument to True so that we can rewind the source
asource = ADSFactory.ads(max_time=10, record=True)

validator = AudioEnergyValidator(sample_width=asource.get_sample_width(), 
                                 energy_threshold=65)

tokenizer = StreamTokenizer(
    validator=validator,
    min_length=20,
    max_length=400,
    max_continuous_silence=30,
)

asource.open()
tokens = tokenizer.tokenize(asource)

# Play detected regions back
player = player_for(asource)

trimmed_signal = b""
# Each token is (frames, start_frame, end_frame); join the frames into raw bytes
for frames, start, end in tokens:
    trimmed_signal += b"".join(frames)
print("\n ** Playing trimmed signal...")
player.play(trimmed_signal)

asource.close()
player.stop()

Ultimately, my goal is to record the user's speech from the microphone of a mobile device and then run speech-to-text transcription on the recording using the Google API.

amsehili commented 4 years ago

Hello,

Are you using the latest version of auditok? If not, I'd recommend that you start using it because it's much faster and more user-friendly.

The latest version is currently not installable from PyPI, but you can install it from GitHub:

sudo pip install git+https://github.com/amsehili/auditok

You can either record the data and then split it, or split the data in real time as you read from the microphone.

1. Record then split

from auditok import AudioRegion

# read 5 seconds of audio data from microphone
region = AudioRegion.load(input=None, max_read=5, sampling_rate=16000, sample_width=2, channels=1)
regions = region.split()

for r in regions:
    r.play(progress_bar=True) # progress bar requires `tqdm`

You can play, save or plot a region. Please refer to the documentation of AudioRegion for more information about these methods. You might also want to use custom split parameters, especially energy_threshold, min_dur, max_dur and max_silence; see the documentation for split.
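
To give a feel for what these four parameters control, here is a rough pure-Python sketch of energy-based activity detection on 16-bit mono PCM. It is not auditok's implementation: the names `frame_energy_db`, `detect_events`, `tone` and `silence` are made up for this illustration, the energy formula (10 * log10 of the mean squared sample) is an assumption, and unlike `split` with its defaults this sketch always drops trailing silence.

```python
import array
import math

SAMPLE_WIDTH = 2  # bytes per sample (16-bit PCM), as with sample_width=2 above


def frame_energy_db(frame):
    """Log energy of one frame of 16-bit mono PCM, in dB.

    Assumed definition: 10 * log10(mean of squared samples).
    """
    samples = array.array("h", frame)
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def detect_events(pcm, sr=16000, frame_dur=0.01, energy_threshold=50,
                  min_dur=0.2, max_dur=5.0, max_silence=0.3):
    """Return a list of (start_sec, end_sec) pairs of detected activity."""
    frame_samples = round(sr * frame_dur)
    frame_bytes = frame_samples * SAMPLE_WIDTH
    min_frames = round(min_dur / frame_dur)
    max_frames = round(max_dur / frame_dur)
    max_sil_frames = round(max_silence / frame_dur)

    def to_sec(frame_index):
        return frame_index * frame_samples / sr

    events = []
    start = None  # index of the first frame of the event being built
    quiet = 0     # consecutive below-threshold frames at the end of the event
    n_frames = len(pcm) // frame_bytes  # a trailing partial frame is ignored
    for i in range(n_frames):
        frame = pcm[i * frame_bytes:(i + 1) * frame_bytes]
        active = frame_energy_db(frame) >= energy_threshold
        if start is None:
            if active:  # an event can only begin on an active frame
                start, quiet = i, 0
            continue
        quiet = 0 if active else quiet + 1
        length = i - start + 1
        # Close the event after too much silence or when it reaches max_dur.
        if quiet > max_sil_frames or length >= max_frames:
            end = i + 1 - quiet  # drop the trailing silent frames
            if end - start >= min_frames:  # discard events shorter than min_dur
                events.append((to_sec(start), to_sec(end)))
            start = None
    if start is not None:  # the stream ended while an event was still open
        end = n_frames - quiet
        if end - start >= min_frames:
            events.append((to_sec(start), to_sec(end)))
    return events


def tone(dur, sr=16000, freq=440.0, amp=10000):
    """Synthetic sine tone as 16-bit PCM bytes (test signal only)."""
    n = int(dur * sr)
    return array.array(
        "h", (int(amp * math.sin(2 * math.pi * freq * t / sr)) for t in range(n))
    ).tobytes()


def silence(dur, sr=16000):
    """Digital silence (all-zero samples) as 16-bit PCM bytes."""
    return bytes(int(dur * sr) * SAMPLE_WIDTH)


if __name__ == "__main__":
    pcm = silence(0.5) + tone(1.0) + silence(0.5)
    print(detect_events(pcm))
```

In this sketch, raising energy_threshold makes detection less sensitive to background noise, min_dur filters out short clicks, max_dur forces long utterances to be cut into several events, and max_silence sets how long a pause is tolerated inside one event.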

2. Real-time split

from auditok import split, AudioReader, AudioRegion

# input=None because we want to read from the mic
reader = AudioReader(input=None, record=True, sr=16000, sw=2, ch=1, max_read=5)
for (i, region) in enumerate(split(reader)):
    region.play(progress_bar=True)
    region.save("{}.wav".format(i))

# save acquired data
reader.rewind()
region = AudioRegion.load(reader.data, sr=reader.sr, sw=reader.sw, ch=reader.channels)
region.save("main_stream.wav")

Nickil21 commented 4 years ago

Thank you. Any idea how to integrate this to run on edge devices such as mobile phones? I plan to record from the phone's microphone and, once silence is detected, send the recorded audio to the Google Speech API to get the transcript.

amsehili commented 4 years ago

You should try Pydroid.

Evidlo commented 4 months ago

In the above real-time example, is there a way to get access to the bytes corresponding to a region? I would have expected region.data, but this doesn't exist.

edit: I found that there is the useful region._data. This is what I need to pass to my speech-to-text engine. Not sure why it's marked as an internal API.

amsehili commented 4 months ago

Hi @Evidlo, you can use bytes(region) to get the raw bytes, or numpy.array(region) or region.samples to get a multidimensional numpy array that contains one 1-D array per audio channel.
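
As a rough illustration of how raw bytes relate to per-channel samples, the sketch below de-interleaves 16-bit PCM using only the standard library. It assumes the bytes are interleaved 16-bit samples in the machine's native byte order (as with sw=2 in the examples above); `split_channels` is a hypothetical helper, not part of auditok, and it returns plain lists rather than a numpy array.

```python
import array


def split_channels(raw, channels=1):
    """De-interleave 16-bit PCM bytes into one list of samples per channel."""
    # "h" is a signed 16-bit integer in native byte order.
    samples = array.array("h", raw)
    # Channel c owns every `channels`-th sample, starting at offset c.
    return [list(samples[c::channels]) for c in range(channels)]


if __name__ == "__main__":
    # Three stereo frames: left = 1, 2, 3 and right = -1, -2, -3
    raw = array.array("h", [1, -1, 2, -2, 3, -3]).tobytes()
    print(split_channels(raw, channels=2))
```

A speech-to-text engine that expects raw PCM would normally take the bytes directly, together with the sampling rate, sample width and channel count; the per-channel view is mainly useful for analysis or plotting.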