facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

How to segment hate speech downloaded from the Mutox dataset tsv file #428

Closed dlion168 closed 6 months ago

dlion168 commented 7 months ago

Hi, thank you for your great work. After downloading the audio from the links in mutox.tsv, I would like to know how I can cut each file down to only the hate speech segment. I found two numbers in the same column as the URL, such as http://archives.kfuo.org/mp3/TSW/TSW_Jul_01_2019.mp3 1781664 1786782, but I cannot figure out what the numbers "1781664 1786782" mean. Could you please explain these two numbers?

zrthxn commented 7 months ago

Those look like the start index and end index of the segment

dlion168 commented 7 months ago

Those are not sample indices of the segment. I treated them as indices and cut the audio accordingly, but the outputs were too short. Take the following instance as an example: http://archives.kfuo.org/mp3/TSW/TSW_Jul_01_2019.mp3 1781664 1786782. The difference between the numbers is 1786782 - 1781664 = 5118, so at a 16 kHz sample rate the segment would be only 5118 / 16000 ≈ 0.32 seconds long. That is too short for a full spoken sentence.

zrthxn commented 6 months ago

Maybe @avidale can help?

avidale commented 6 months ago

These numbers are in milliseconds. Thus, the example above is 5.118 seconds long, which is just enough to pronounce the phrase given in the doc as the transcript: "their sword shall enter their own heart, and their bows shall be broken."
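
For anyone cutting the segments themselves, here is a minimal sketch of the millisecond-to-sample conversion. It is not code from this repository, and the function names are hypothetical. It assumes the audio has already been decoded into a flat sequence of samples at a known sample rate (16 kHz here, taken from the calculation earlier in this thread). Decoding the MP3 is not shown.

```python
from typing import Sequence, Tuple


def parse_audio_field(field: str) -> Tuple[str, int, int]:
    """Split a 'URL start_ms end_ms' string into its three parts.

    Hypothetical helper: assumes the field is whitespace-separated,
    as in the example quoted in this thread.
    """
    url, start_ms, end_ms = field.rsplit(maxsplit=2)
    return url, int(start_ms), int(end_ms)


def ms_to_sample_range(start_ms: int, end_ms: int, sample_rate: int) -> Tuple[int, int]:
    """Convert a millisecond interval into [start, end) sample indices."""
    start = start_ms * sample_rate // 1000
    end = end_ms * sample_rate // 1000
    return start, end


def cut_segment(samples: Sequence, start_ms: int, end_ms: int, sample_rate: int) -> Sequence:
    """Return the samples that fall inside the millisecond interval."""
    start, end = ms_to_sample_range(start_ms, end_ms, sample_rate)
    return samples[start:end]


field = "http://archives.kfuo.org/mp3/TSW/TSW_Jul_01_2019.mp3 1781664 1786782"
url, start_ms, end_ms = parse_audio_field(field)

# Duration when the numbers are read as milliseconds.
duration_s = (end_ms - start_ms) / 1000

# Sample indices at an assumed 16 kHz sample rate.
start_idx, end_idx = ms_to_sample_range(start_ms, end_ms, 16000)
print(url, duration_s, start_idx, end_idx)
```

If the decoded audio uses a different sample rate, pass that rate instead of 16000. The millisecond offsets themselves do not change.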

dlion168 commented 6 months ago

Thanks for the reply. That answers my question.