facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Silent Parts of the Audio #287

Open m-pektas opened 6 months ago

m-pektas commented 6 months ago

First of all, thanks for sharing this great work as open source.

When I use SeamlessM4T with a 15-second audio clip, the translated output is only 5 seconds long. The silent parts are removed from the audio, but I want to translate while keeping the length of the original audio. Do you know how I can do that?

avidale commented 3 months ago

Hi! One potential solution would be the following:

  1. Detect the silence and voice in the source audio using some external voice activity detection model.
  2. Split the source audio into the voice-only and silence-only segments.
  3. Translate the voice-only segments with Seamless
  4. Concatenate the silence segments with the translated segments in the right order, to get the right duration.
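
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the Seamless API: `split_by_voice` and `translate_keeping_silence` are hypothetical helper names, the `voiced` list stands in for the output of whatever external VAD model is used, and `translate` stands in for a call into Seamless. It also assumes the voiced segments do not overlap and that the translated audio has the same sample rate as the source.

```python
from typing import Callable, List, Sequence, Tuple

# (start_sample, end_sample) with an exclusive end; assumed to come from an external VAD.
Segment = Tuple[int, int]


def split_by_voice(num_samples: int, voiced: Sequence[Segment]) -> List[Tuple[str, int, int]]:
    """Step 2: cover the whole audio with ordered ("voice" | "silence", start, end) pieces."""
    pieces: List[Tuple[str, int, int]] = []
    cursor = 0
    for start, end in sorted(voiced):
        if start > cursor:
            # Gap between the previous voiced segment and this one is silence.
            pieces.append(("silence", cursor, start))
        pieces.append(("voice", start, end))
        cursor = end
    if cursor < num_samples:
        # Trailing silence after the last voiced segment.
        pieces.append(("silence", cursor, num_samples))
    return pieces


def translate_keeping_silence(
    samples: Sequence[float],
    voiced: Sequence[Segment],
    translate: Callable[[List[float]], List[float]],
) -> List[float]:
    """Steps 3 and 4: translate voiced pieces, copy silent pieces, keep the original order."""
    out: List[float] = []
    for kind, start, end in split_by_voice(len(samples), voiced):
        chunk = list(samples[start:end])
        # `translate` is a placeholder for the real speech-to-speech call.
        out.extend(translate(chunk) if kind == "voice" else chunk)
    return out
```

The total duration only matches the source if every translated piece comes back with the same length as its input, which is not guaranteed in practice.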

m-pektas commented 3 months ago

Hi, @avidale. Thanks for your answer. I solved the problem with exactly the same approach, but I ran into another issue: the translated speech can sometimes have a different length than the original. If the translation is shorter, the fix is simple: we add extra silence to the silent parts. But if the translation is longer, unfortunately, we need a new solution :)
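
The two cases described here can be told apart with a small helper. The sketch below is an assumption for illustration only: `fit_to_duration` is a hypothetical name, audio is represented as a plain list of samples, and silence is modeled as zeros. It handles the shorter case by padding and only reports the overflow for the longer case, which the thread leaves unsolved.

```python
from typing import List, Tuple


def fit_to_duration(translated: List[float], target_len: int) -> Tuple[List[float], int]:
    """Pad a translated segment with silence up to target_len samples.

    Returns (samples, overflow). overflow is 0 when the segment fits, and the
    number of excess samples when the translation is longer than the original.
    """
    if len(translated) <= target_len:
        # Shorter (or equal): append zero-valued samples as extra silence.
        padding = [0.0] * (target_len - len(translated))
        return translated + padding, 0
    # Longer: leave the audio untouched and report how far it overruns.
    return translated, len(translated) - target_len
```

A caller could use the returned overflow to decide what to do next, for example to check whether the neighbouring silent segment is long enough to absorb it.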