Updating the specs for WebVTT

Podcastindex-org / podcast-namespace

A wholistic rss namespace for podcasting

Creative Commons Zero v1.0 Universal


Updating the specs for WebVTT #624

tomrossi7 closed this issue 2 months ago

tomrossi7 commented 3 months ago

Buzzsprout will be rolling out our support for VTT, and I wanted to make sure the specs were up to date. I don't believe there are any character limits. I can't find any rhyme or reason to how Apple splits its transcripts into cues (other than at speaker changes).

@jamescridland does this look right?
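For reference, here is a minimal sketch of generating a WebVTT transcript in Python, assuming cue timings and speaker names are already known. The `Cue` class and the `format_timestamp`/`to_webvtt` helpers are hypothetical; the `WEBVTT` header, the `HH:MM:SS.mmm --> HH:MM:SS.mmm` timing line, and the `<v Speaker>` voice span come from the WebVTT format itself. The sketch imposes no per-cue character limit.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float   # cue start, in seconds
    end: float     # cue end, in seconds
    speaker: str   # name used in the <v> voice span
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialize cues to WebVTT, tagging each cue with its speaker."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        # The closing </v> may be omitted when the span covers the whole cue.
        lines.append(f"<v {cue.speaker}>{cue.text}")
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([
        Cue(0.0, 3.2, "Host", "Welcome to the show."),
        Cue(3.2, 5.0, "Guest", "Thanks for having me."),
    ]))
```

Starting a new cue at every speaker change falls out naturally here, since each `Cue` carries exactly one speaker.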

theDanielJLewis commented 3 months ago

FYI, it seems Apple's downloadable transcripts break at sentences instead of shorter segments. I'm surprised they don't even break at punctuation to help shorten lines from long sentences.

Maybe they've studied and found it's actually easier to read the transcript—if the full transcript is shown—when a full sentence is highlighted at a time instead of smaller sections. That's certainly easier than word-by-word highlighting (which looks cool but isn't actually very readable).

tomrossi7 commented 3 months ago


Yeah, the standard doesn't really provide any guidelines for how long a caption can be. I would think shorter segments are better, since they give you higher-fidelity timestamps.
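To illustrate the trade-off, here is a sketch that assumes word-level timestamps are already available (for example, from a speech-to-text engine). The `Word` type, the `segment` function, and the default `max_chars=42` are hypothetical choices, not anything WebVTT requires. It ends a cue after sentence-ending punctuation, or earlier if adding the next word would push the cue past `max_chars`, so long sentences still get finer-grained timings.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    start: float  # seconds
    end: float    # seconds


def segment(words: list[Word], max_chars: int = 42) -> list[tuple[float, float, str]]:
    """Group timed words into (start, end, text) cues, breaking after
    sentence-ending punctuation or before a cue would exceed max_chars."""
    cues: list[tuple[float, float, str]] = []
    current: list[Word] = []

    def flush() -> None:
        # Emit the words collected so far as one cue, then reset.
        if current:
            text = " ".join(w.text for w in current)
            cues.append((current[0].start, current[-1].end, text))
            current.clear()

    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            flush()  # the cue would get too long: start a new one
        current.append(word)
        if word.text.endswith((".", "?", "!")):
            flush()  # end of sentence: close the cue
    flush()
    return cues
```

Lowering `max_chars` yields more, shorter cues, each with its own start and end time.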


I thought they were doing word-by-word highlighting. I wonder if they use the VTT as input and then have another algorithm for the highlighting?
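As one hypothetical example of such a second algorithm: a player with only sentence-level cues could estimate per-word times by spreading each cue's duration across its words in proportion to their length. This is an illustrative heuristic under that assumption, not a description of what Apple actually does; `interpolate_word_times` is a made-up name.

```python
def interpolate_word_times(start: float, end: float, text: str) -> list[tuple[str, float, float]]:
    """Estimate (word, start, end) times inside one cue by dividing the
    cue's duration among its words in proportion to their length."""
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end - start
    result = []
    t = start
    for w in words:
        span = duration * len(w) / total_chars  # longer words get more time
        result.append((w, t, t + span))
        t += span
    return result
```

A real aligner would more likely match the text against the audio itself, which fits the heavier processing described later in this thread.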

theDanielJLewis commented 3 months ago


Oh, you're right. They do highlight word-by-word.

jamescridland commented 3 months ago

As far as I can see, they do the highlighting word by word (on the device, perhaps).

Their algorithm means that if you write "BUY BITCOIN TODAY!!" in a transcript, it only shows up if someone actually says something approximating "BUY BITCOIN TODAY!!"; otherwise, it simply doesn't appear.
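A toy illustration of that effect (emphatically not Apple's algorithm, which isn't public): align the supplied transcript's words against words recognized from the audio using Python's `difflib.SequenceMatcher`, and drop anything that doesn't align. Here the `recognized` string stands in for a speech-recognition result, and `keep_spoken_words` is a hypothetical helper.

```python
import difflib


def keep_spoken_words(supplied: str, recognized: str) -> str:
    """Keep only the supplied transcript words that align with words
    recognized from the audio; unmatched words are dropped."""
    def norm(w: str) -> str:
        # Compare case-insensitively, ignoring punctuation.
        return "".join(ch for ch in w.lower() if ch.isalnum())

    supplied_words = supplied.split()
    recognized_words = recognized.split()
    matcher = difflib.SequenceMatcher(
        a=[norm(w) for w in supplied_words],
        b=[norm(w) for w in recognized_words],
        autojunk=False,
    )
    kept: list[str] = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of aligned words; keep the original spelling.
        kept.extend(supplied_words[block.a:block.a + block.size])
    return " ".join(kept)


if __name__ == "__main__":
    print(keep_spoken_words(
        "Welcome to the show. BUY BITCOIN TODAY!! Let's begin.",
        "welcome to the show let's begin",
    ))
```

Inserted text that nobody actually said has nothing in the audio to align with, so it never survives.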

For some reason, yesterday's Podnews episode didn't use my VTT, and Apple used its own version instead.

tomrossi7 commented 2 months ago

@jamescridland it's interesting to see how much processing they are doing even when they've been supplied a VTT! You're probably right that it avoids spammy content that isn't really in the audio at all.

We will be putting VTTs in the RSS feed soon, but you can already see them by going directly to the URL. Here is your latest:

https://www.buzzsprout.com/1538779/14790784/transcript.vtt
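In the feed, a file like this would be referenced with the Podcast Namespace `<podcast:transcript>` tag. Below is a sketch using Python's standard `xml.etree.ElementTree`, assuming only the required `url` and `type` attributes; the exact markup Buzzsprout emits is an assumption, and `transcript_element` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Podcast Namespace URI; register it so the output uses the "podcast:" prefix.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"
ET.register_namespace("podcast", PODCAST_NS)


def transcript_element(url: str) -> ET.Element:
    """Build a <podcast:transcript> element pointing at a WebVTT file."""
    return ET.Element(
        f"{{{PODCAST_NS}}}transcript",
        {"url": url, "type": "text/vtt"},  # text/vtt is the WebVTT MIME type
    )


item = ET.Element("item")
item.append(transcript_element(
    "https://www.buzzsprout.com/1538779/14790784/transcript.vtt"))
xml = ET.tostring(item, encoding="unicode")
print(xml)
```

The namespace declaration (`xmlns:podcast=...`) normally lives on the feed's root `<rss>` element; ElementTree places it on whatever element is serialized, here `<item>`.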