aalto-speech / speaker-diarization

Speaker diarization scripts, based on AaltoASR

separate `.wav` files per speaker #5

Closed aalavandhan closed 7 years ago

aalavandhan commented 7 years ago

I am looking to generate separate .wav files for each speaker after diarization.

Is this an existing feature, or does this need to be built?

I would like to build this feature, which would consume the output of spk-diarization2.py and generate a .wav file for each speaker.

antoniomo commented 7 years ago

Hi!

This feature doesn't exist :) If you need it, I think the easiest way would be to take the per-speaker turn start/end times from the recipe that spk-diarization2.py outputs, use sox trim to cut each turn into its own file, and then use sox file1 file2 ... outfile to merge those turns into a single file per speaker as needed. From the sox man page:

trim {position(+)}
              Cuts portions out of the audio.  Any number of positions may be given; audio is not sent to the
              output until the first position is reached.  The effect then alternates between copying and
              discarding audio at each position.  Using a value of 0 for the first position parameter allows
              copying from the beginning of the audio.

              For example,
                 sox infile outfile trim 0 10
              will copy the first ten seconds, while
                 play infile trim 12:34 =15:00 -2:00
              and
                 play infile trim 12:34 2:26 -2:00
              will both play from 12 minutes 34 seconds into the audio up to 15 minutes into the audio (i.e. 2
              minutes and 26 seconds long), then resume playing two minutes before the end of audio.
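
To make the trim-then-merge idea concrete, here is a rough Python sketch that only builds the sox command lines. It assumes each line of the output recipe carries whitespace-separated key=value fields including `start-time`, `end-time` and `speaker`; that layout is an assumption, so check it against your actual spk-diarization2.py output. The helper names (`parse_turns`, `sox_commands`) and the output file naming are made up for illustration.

```python
import os
from collections import defaultdict


def parse_turns(recipe_lines):
    """Group (start, end) times in seconds by speaker label.

    ASSUMPTION: each recipe line looks roughly like
    "audio=meeting.wav start-time=0.00 end-time=3.20 speaker=speaker_1".
    Lines missing any of the three fields are skipped.
    """
    turns = defaultdict(list)
    for line in recipe_lines:
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if not {"start-time", "end-time", "speaker"} <= fields.keys():
            continue
        span = (float(fields["start-time"]), float(fields["end-time"]))
        turns[fields["speaker"]].append(span)
    return dict(turns)


def sox_commands(audio, turns, out_dir="."):
    """Build sox argument lists: one trim per turn, then one merge per speaker."""
    commands = []
    for speaker, spans in sorted(turns.items()):
        parts = []
        for i, (start, end) in enumerate(sorted(spans)):
            # Hypothetical naming scheme for the intermediate per-turn files.
            part = os.path.join(out_dir, f"{speaker}_{i:04d}.wav")
            # "trim START LENGTH": copy LENGTH seconds beginning at START.
            commands.append(
                ["sox", audio, part, "trim", f"{start:.3f}", f"{end - start:.3f}"]
            )
            parts.append(part)
        # "sox file1 file2 ... outfile" concatenates the inputs in order.
        commands.append(["sox", *parts, os.path.join(out_dir, f"{speaker}.wav")])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once sox is installed and the output directory exists. Building the commands separately from running them makes it easy to print and inspect them first.
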

antoniomo commented 7 years ago

I'm gonna close this issue for now, let me know if you need something else and we can reopen it :)