Closed StanSilas closed 6 years ago
Not out of the box. There are several approaches to detecting or separating speech from non-speech, but none of them is integrated in madmom (yet). You could have a look at the works of my (former) colleagues Jan Schlüter and Reinhard Sonnleitner presented at DAFx 2012. There are plenty of others of course, but these are the first that come to mind. The feature from the latter should be easy to implement; an (inefficient) implementation of the correlation part is already in features.onsets.
HTH
I'm trying to remove hold music, IVR prompts, dial tones and ringback tones from any given audio file (.wav/.mp3).
The easiest approach that occurred to me was to somehow "detect the onset of the first speech segment" and then delete whatever occurs before that point in time.
Is it possible to use madmom to detect sections of speech and sections of "non-speech/hold music" in any given arbitrary audio file?
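
To make the "delete whatever occurs before the first speech segment" idea concrete, here is a rough sketch of just the trimming step. It assumes some detector (not part of madmom, and not shown here) already produces one speech/non-speech decision per frame; `first_speech_time`, `trim_before`, the `fps` value and the `min_duration` threshold are hypothetical names and settings chosen for illustration only.

```python
import numpy as np


def first_speech_time(is_speech, fps, min_duration=0.5):
    """Return the start time (in seconds) of the first run of speech frames
    lasting at least `min_duration` seconds, or None if there is no such run.

    `is_speech` is a per-frame boolean sequence from a hypothetical detector;
    `fps` is the number of frames per second of that detector.
    """
    is_speech = np.asarray(is_speech, dtype=bool)
    min_frames = max(1, int(round(min_duration * fps)))
    run = 0  # length of the current run of consecutive speech frames
    for i, flag in enumerate(is_speech):
        run = run + 1 if flag else 0
        if run >= min_frames:
            # the run started `run - 1` frames before the current frame
            return (i - run + 1) / fps
    return None


def trim_before(samples, sample_rate, start_time):
    """Drop all samples that occur before `start_time` (in seconds)."""
    start = int(round(start_time * sample_rate))
    return samples[start:]


# Toy example: 3 s of non-speech (with a short false alarm), then 2 s of speech.
fps = 100
mask = np.zeros(500, dtype=bool)
mask[100:110] = True   # 0.1 s blip, too short to count as speech
mask[300:] = True      # sustained speech from 3.0 s onwards

sample_rate = 8000
audio = np.arange(5 * sample_rate)  # stand-in for real audio samples

start = first_speech_time(mask, fps)
trimmed = audio if start is None else trim_before(audio, sample_rate, start)
```

The minimum-duration check is only there so that short detections inside the hold music do not trigger the cut; the right threshold depends on the detector and the recordings.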