Closed (dlion168 closed this 6 months ago)
Those look like the start and end indices of the segment.
Those are not indices of the segment. I treated them as indices and segmented the audio accordingly, but the outputs are too short. Take the following instance as an example:
http://archives.kfuo.org/mp3/TSW/TSW_Jul_01_2019.mp3 1781664 1786782
The difference between the numbers is 1786782 - 1781664 = 5118, so the segment is only 5118 / 16000 ≈ 0.32 seconds long. That interval is too short for a full spoken sentence.
Maybe @avidale can help?
These numbers are in milliseconds. Thus, the example above is 5.118 seconds long, which is just enough to pronounce the phrase given in the doc as the transcript: "their sword shall enter their own heart, and their bows shall be broken."
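For anyone else cutting the segments, here is a minimal sketch of the conversion, assuming the field has the form `URL start_ms end_ms` as in the example above. The helper names (`parse_segment`, `ms_to_samples`, `ffmpeg_cut_command`) and the local file names are hypothetical and not part of the dataset tooling. The 16 kHz rate is only an assumption taken from the calculation earlier in the thread, so check the actual sample rate of the downloaded file.

```python
def parse_segment(field: str):
    """Split a 'URL start_ms end_ms' field into its three parts."""
    url, start_ms, end_ms = field.split()
    return url, int(start_ms), int(end_ms)


def ms_to_samples(ms: int, sample_rate: int = 16000) -> int:
    """Convert a millisecond offset to a sample index.

    The 16000 default is an assumption; use the real rate of your audio.
    """
    return ms * sample_rate // 1000


def ffmpeg_cut_command(src: str, start_ms: int, end_ms: int, dst: str):
    """Build (but do not run) an ffmpeg command that cuts [start_ms, end_ms]."""
    return [
        "ffmpeg", "-i", src,
        "-ss", f"{start_ms / 1000:.3f}",  # start position in seconds
        "-to", f"{end_ms / 1000:.3f}",    # end position in seconds
        dst,
    ]


field = "http://archives.kfuo.org/mp3/TSW/TSW_Jul_01_2019.mp3 1781664 1786782"
url, start_ms, end_ms = parse_segment(field)

duration_s = (end_ms - start_ms) / 1000
start_sample = ms_to_samples(start_ms)
end_sample = ms_to_samples(end_ms)

# "TSW_Jul_01_2019.mp3" and "segment.wav" are placeholder local file names.
cmd = ffmpeg_cut_command("TSW_Jul_01_2019.mp3", start_ms, end_ms, "segment.wav")
print(duration_s, start_sample, end_sample)
print(" ".join(cmd))
```

If the audio is already loaded as an array, slicing it with `start_sample:end_sample` gives the same segment. Otherwise the command list can be passed to `subprocess.run` on a machine where ffmpeg is installed.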
Thanks for the reply. That answers my question.
Hi, thank you for your great work. After downloading the audio from the links in mutox.tsv, I would like to know how to segment the audio so that it contains only the hate speech segment. I found two numbers in the same column as the URL, for example
http://archives.kfuo.org/mp3/TSW/TSW_Jul_01_2019.mp3 1781664 1786782
but I cannot figure out what the numbers "1781664 1786782" mean. Could you please explain these two numbers?