Open jihwanp opened 3 years ago
What if you remove the redundant overlapping part of the concatenated clips?
@jihwanp I noticed this as well. The textual overlap is already present in the ASR subtitles from YouTube. However, whether you see it depends on the subtitle format you download: VTT has the issue, while the TTML subtitles look cleaner. Note: in TTML the overlap is still there, but only in the timestamps. The text itself is clean.
I guess this comes from how the subtitle format has to produce the desired effect on screen: a line stays visible while the next one appears, so it gets repeated across cues. Those formats are meant for display, not for data storage.
Hope this helps.
Example (video qREX695vxKs):
VTT:
00:00:04.400 --> 00:00:06.309 align:start position:0%
hey there real woman of philadelphia and
00:00:06.309 --> 00:00:06.319 align:start position:0%
hey there real woman of philadelphia and
00:00:06.319 --> 00:00:06.950 align:start position:0%
hey there real woman of philadelphia and
paula dean
00:00:06.950 --> 00:00:06.960 align:start position:0%
paula dean
00:00:06.960 --> 00:00:09.030 align:start position:0%
paula dean
my name is peyton kaminski and i'm about
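If only the VTT file is available, the repeated lines in the cues above can be collapsed by keeping each caption line the first time it appears. The following is a minimal sketch, not part of the dataset tooling: the function name `dedupe_vtt` is hypothetical, and it assumes a line is a duplicate when it equals the previously kept line, which would also drop a phrase the speaker genuinely says twice in a row.

```python
import re

# Matches a cue timing line such as "00:00:04.400 --> 00:00:06.309 align:start position:0%"
CUE_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")
# Inline markup (word-level timestamps, <c> tags) that some VTT downloads contain
TAG = re.compile(r"<[^>]+>")


def dedupe_vtt(vtt_text):
    """Return (start_time, text) for each caption line the first time it appears."""
    kept = []
    start = None
    for raw in vtt_text.splitlines():
        match = CUE_TIME.match(raw.strip())
        if match:
            start = match.group(1)  # start time of the cue we are now inside
            continue
        if start is None:
            continue  # header lines before the first cue
        line = TAG.sub("", raw).strip()
        # Assumption: a line equal to the last kept line is a rolling repeat.
        if line and (not kept or kept[-1][1] != line):
            kept.append((start, line))
    return kept


sample = """00:00:04.400 --> 00:00:06.309 align:start position:0%
hey there real woman of philadelphia and
00:00:06.309 --> 00:00:06.319 align:start position:0%
hey there real woman of philadelphia and
00:00:06.319 --> 00:00:06.950 align:start position:0%
hey there real woman of philadelphia and
paula dean
00:00:06.950 --> 00:00:06.960 align:start position:0%
paula dean
00:00:06.960 --> 00:00:09.030 align:start position:0%
paula dean
my name is peyton kaminski and i'm about
"""

cues = dedupe_vtt(sample)
for start_time, text in cues:
    print(start_time, text)
```

On this sample the start time kept for each line is the start of the cue where the line first shows up.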
TTML:
<p begin="00:00:04.400" end="00:00:06.960" style="s2">hey there real woman of philadelphia and</p>
<p begin="00:00:06.319" end="00:00:09.040" style="s2">paula dean</p>
<p begin="00:00:06.960" end="00:00:10.400" style="s2">my name is peyton kaminski and i'm about</p>
<p begin="00:00:09.040" end="00:00:12.080" style="s2">to be moving to florida in a couple</p>
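Since the TTML text is already clean, a long-range caption can be built by reading the `<p>` elements and joining the text of those that start inside a chosen time window. This is a rough sketch under assumptions: the helper names (`parse_ttml`, `caption_between`) are made up for illustration, the wrapping `<tt><body><div>` elements in the sample are added only to make the fragment well-formed XML, and segments are selected by their `begin` time because the `end` times overlap.

```python
import xml.etree.ElementTree as ET


def to_seconds(timestamp):
    """Convert 'HH:MM:SS.mmm' to seconds as a float."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_ttml(ttml_text):
    """Return a list of (begin_sec, end_sec, text) for every <p> element."""
    root = ET.fromstring(ttml_text)
    segments = []
    for element in root.iter():
        # Accept both plain and namespaced <p> tags.
        if element.tag == "p" or element.tag.endswith("}p"):
            begin, end = element.get("begin"), element.get("end")
            text = " ".join("".join(element.itertext()).split())
            if begin and end and text:
                segments.append((to_seconds(begin), to_seconds(end), text))
    return segments


def caption_between(segments, t0, t1):
    """Join the text of segments whose begin time falls in [t0, t1)."""
    return " ".join(text for begin, _end, text in segments if t0 <= begin < t1)


sample = """<tt><body><div>
<p begin="00:00:04.400" end="00:00:06.960" style="s2">hey there real woman of philadelphia and</p>
<p begin="00:00:06.319" end="00:00:09.040" style="s2">paula dean</p>
<p begin="00:00:06.960" end="00:00:10.400" style="s2">my name is peyton kaminski and i'm about</p>
<p begin="00:00:09.040" end="00:00:12.080" style="s2">to be moving to florida in a couple</p>
</div></body></tt>"""

segments = parse_ttml(sample)
print(caption_between(segments, 0.0, 7.0))
```

Selecting by `begin` assigns each line to exactly one window, so concatenating adjacent windows does not repeat text even though the displayed intervals overlap.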
I noticed that the caption CSV file seems to have many overlaps between the captions of each clip. I want to look at long-range video-caption pairs, but I think simply concatenating the clip captions will be a problem because of the repeated text. Is there any way to get a clean video-caption pair, or do I have to run ASR on my own? By the way, thanks for sharing this nice work.