algolia / youtube-captions-scraper

Fetch youtube user submitted or fallback to auto-generated captions

The wrong "dur" #6

Open coolswood opened 5 years ago

coolswood commented 5 years ago

I tried to get subtitles for the video 9rWjb7t8cfo, and I got this:

{ start: '2.72', dur: '4.48', text: 'sorry but low-fat foods don\'t help you' },
{ start: '5.7', dur: '3.3', text: 'lose weight and the only reason you' },
{ start: '7.2', dur: '4.5', text: 'think they do is because of bad science' },
{ start: '9', dur: '5.04', text: 'and worst marketing that\'s ridiculous' },
...

'start' is absolutely right but, as you can see, 'dur' is not: each caption's start plus its dur runs past the start of the next caption. Please help, maybe I'm doing something wrong?
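
For readers following along, the mismatch can be checked with a little arithmetic. The snippet below is only a sketch: `findOverlaps` is a hypothetical helper, not part of youtube-captions-scraper, and the sample array is copied from the four entries quoted above. It adds each caption's `start` and `dur` and compares the result with the next caption's `start`.

```javascript
// Sample entries copied from the output quoted in this issue.
// start and dur are strings holding seconds.
const captions = [
  { start: '2.72', dur: '4.48', text: "sorry but low-fat foods don't help you" },
  { start: '5.7', dur: '3.3', text: 'lose weight and the only reason you' },
  { start: '7.2', dur: '4.5', text: 'think they do is because of bad science' },
  { start: '9', dur: '5.04', text: "and worst marketing that's ridiculous" },
];

// Hypothetical helper (an assumption, not a library function):
// reports every caption whose end time is later than the next caption's start.
function findOverlaps(items) {
  const overlaps = [];
  for (let i = 0; i < items.length - 1; i++) {
    const end = Number(items[i].start) + Number(items[i].dur);
    const nextStart = Number(items[i + 1].start);
    if (end > nextStart) {
      overlaps.push({ index: i, end, nextStart, overlap: end - nextStart });
    }
  }
  return overlaps;
}

console.log(findOverlaps(captions));
```

On this sample, every caption that has a successor overlaps it, which is what makes 'dur' look wrong if it is read as "time until the next caption".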

Haroenv commented 5 years ago

Is this different from the timings shown on YouTube?

coolswood commented 5 years ago

@Haroenv No, the duration of auto-generated subtitles is probably not reported correctly. I think you should add a marker saying that these subtitles are generated automatically, because in this case "dur" is useless.

mo3rfan commented 2 years ago

@coolswood I believe that for ASR (auto-generated) subtitles, dur is how long (in seconds) the caption stays visible in the YouTube player: even after the next caption has started to appear, the previous caption usually lingers a little longer.
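
If that reading is right and a non-overlapping duration is needed, one option is to cap each dur at the gap to the next caption's start. This is a minimal sketch under that assumption: `clampDurations` and the `effectiveDur` field are hypothetical names, not part of youtube-captions-scraper, and the sample array is copied from the output quoted earlier in this issue.

```javascript
// Sample entries copied from the output quoted in this issue.
const sample = [
  { start: '2.72', dur: '4.48', text: "sorry but low-fat foods don't help you" },
  { start: '5.7', dur: '3.3', text: 'lose weight and the only reason you' },
  { start: '7.2', dur: '4.5', text: 'think they do is because of bad science' },
  { start: '9', dur: '5.04', text: "and worst marketing that's ridiculous" },
];

// Hypothetical helper (an assumption, not a library function):
// adds effectiveDur, the reported dur capped at the time until the next caption starts.
// The last caption has no successor, so its dur is kept unchanged.
function clampDurations(items) {
  return items.map((item, i) => {
    const start = Number(item.start);
    const dur = Number(item.dur);
    const next = items[i + 1];
    const gap = next ? Number(next.start) - start : dur;
    return { ...item, effectiveDur: Math.max(0, Math.min(dur, gap)) };
  });
}

for (const c of clampDurations(sample)) {
  console.log(c.start, c.effectiveDur.toFixed(2), c.text);
}
```

The original dur is left in place so that the on-screen visibility time is still available to callers that want it.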

mo3rfan commented 2 years ago

That being said, I'm curious how YouTube does word-by-word timing for the ASR captions.