CPJKU / madmom

Python audio and music signal processing library
https://madmom.readthedocs.io

How to use madmom to detect onset of human speech in any audio file? #347

Closed by StanSilas 6 years ago

StanSilas commented 6 years ago

I'm trying to remove hold music, IVR prompts, dial tones and ringback tones from any given audio file (.wav/.mp3).

The easiest approach that occurred to me was to somehow "detect the onset of the first speech segment" and then delete whatever occurs before that point in time.

Is it possible to use madmom to detect sections of speech and sections of "non-speech/hold music" in any given arbitrary audio file?
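
The trimming half of this idea does not depend on how the speech onset is found. Here is a minimal sketch of it, assuming the onset time is already known in seconds and the input is an uncompressed PCM WAV file (the standard-library `wave` module does not read MP3, so such files would need decoding first). The function name `trim_before` is made up for illustration and is not part of madmom.

```python
import wave


def trim_before(src_path, dst_path, onset_seconds):
    """Copy src_path to dst_path, dropping all audio before onset_seconds.

    Hypothetical helper: the onset time must come from some separate
    speech detector, which is not shown here.
    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to a frame index and clamp it to the file length.
        start = min(max(int(round(onset_seconds * rate)), 0), total)
        src.setpos(start)
        frames = src.readframes(total - start)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and sample rate of the input;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return total - start
```

For example, `trim_before("call.wav", "call_trimmed.wav", 12.5)` would keep everything from 12.5 seconds onwards (both file names are placeholders).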

superbock commented 6 years ago

Not out of the box. There are several different approaches to detecting/separating speech from non-speech, but none of them are integrated in madmom (yet). You could have a look at the works of my (former) colleagues Jan Schlüter and Reinhard Sonnleitner presented at DAFx 2012. There are plenty of others of course, but these are the first that come to mind. The latter feature should be easy to implement; an (inefficient) implementation of the correlation part is already in features.onsets.
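
For illustration only, here is a rough sketch of what a frame-to-frame correlation over frequency shifts can look like. It is an assumption about the general idea, not the feature from the DAFx 2012 paper and not madmom's code in features.onsets. The function name `best_bin_shift` and its `max_shift` parameter are made up for this example; the inputs are two magnitude-spectrum frames given as plain lists of equal length.

```python
def best_bin_shift(prev_frame, cur_frame, max_shift=2):
    """Find the frequency-bin shift of prev_frame that best matches cur_frame.

    Hypothetical helper for illustration. Correlation is measured as a plain
    dot product over the bins that overlap after shifting.
    Returns (shift, score); a positive shift means the content moved upwards.
    """
    n = len(cur_frame)
    best_shift, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = 0.0
        for k in range(n):
            j = k - shift  # bin of prev_frame that lands on bin k after shifting
            if 0 <= j < n:
                score += prev_frame[j] * cur_frame[k]
        # Strict comparison: on ties the smallest shift tried first is kept.
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score
```

The nested loops are deliberately naive; a real implementation would vectorise this over the whole spectrogram.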

HTH