amsehili / auditok

An audio/acoustic activity detection and audio segmentation tool
MIT License

Use auditok as API to detect pauses in speech #25

Closed bwang482 closed 4 years ago

bwang482 commented 4 years ago

Thanks for the great lib!!

I have a bunch of utterances extracted from conversations, and I want to detect the pauses in each utterance and measure how long they are, covering both short and long pauses (with, for example, 500 ms as the threshold between short and long).

Is it possible to use auditok as an API and call a pause-detection function from my existing data pipeline? Sorry if the question seems general, but I'd greatly appreciate any advice. Thank you again.

amsehili commented 4 years ago

Hello,

Audio regions returned by the split function (or by the AudioRegion.split or AudioRegion.split_and_plot methods) contain their start and end times in their metadata. A pause's onset and offset are simply the end of region i and the start of region i+1. You also need to deal with the corner cases, which, as the code below shows, are a pause before the first region and a pause after the last region.

So concretely you can do something like this:

from auditok import AudioRegion
from auditok.dataset import one_to_six_arabic_16000_mono_bc_noise
region = AudioRegion.load(one_to_six_arabic_16000_mono_bc_noise)
regions = region.split_and_plot(energy_threshold=65, drop_trailing_silence=True)

# print regions' start/end
for r in regions:
    print(r.meta.start, r.meta.end)

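# extract pauses
onset = 0
pauses = []
for r in regions:
    # a pause is the gap between the previous region's end and this region's start
    if onset < r.meta.start:
        pauses.append((onset, r.meta.start))
    # always move the onset to the end of the current region
    onset = r.meta.end

# and the last pause, if there is one
if onset < region.duration:
    pauses.append((onset, region.duration))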
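
To separate short pauses from long ones, as asked in the question, something like the following sketch might work. It is plain Python and does not call auditok; the helper name classify_pauses, the long_threshold parameter, and the example values are all made up for illustration. It assumes pause boundaries are in seconds, as in the pauses list above, and that a pause of exactly 500 ms counts as long.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def classify_pauses(
    pauses: Iterable[Interval], long_threshold: float = 0.5
) -> List[Tuple[float, float, float, str]]:
    """Label each (onset, offset) pause as "short" or "long" by its duration.

    Hypothetical helper, not part of auditok. Assumption: a pause whose
    duration equals the threshold is treated as long.
    """
    labelled = []
    for onset, offset in pauses:
        duration = offset - onset
        label = "long" if duration >= long_threshold else "short"
        labelled.append((onset, offset, duration, label))
    return labelled


# Made-up pause boundaries standing in for the `pauses` list built above.
example_pauses = [(0.0, 0.25), (1.5, 2.5), (4.0, 4.25)]
for onset, offset, duration, label in classify_pauses(example_pauses):
    print(f"{onset:.2f}-{offset:.2f} s  duration={duration:.2f} s  {label}")
```

In a pipeline you would probably pass the real pauses list instead of example_pauses, and call the non-plotting AudioRegion.split rather than split_and_plot so that no figure is opened for each utterance.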

Remarks