<podcast:chapters> should accept webvtt as recommended

dascritch commented 2 years ago

In https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md#user-content-chapters, the description of the <podcast:chapters> tag states that browsers don't support ID3 chapter tags. This is true. But instead, the proposal uses a less standard solution: a brand-new JSON file format.

WebVTT is a W3C standard (https://www.w3.org/TR/webvtt1/). Its support in web browsers is nearly complete, it works smoothly in browsers covering 99% of market share, and its tracks are exposed to and used by accessibility tools. So why recommend a new file format with no native implementation instead of one that has been in wide use for 10 years (for subtitling, but it works perfectly well for chaptering too; I'm using it)?

I suggest changing this to recommend WebVTT as the preferred solution (MIME type text/vtt, documentation at https://www.w3.org/TR/webvtt1/), and to suggest application/json+chapters as an alternative.

saerdnaer commented 2 years ago

What would https://github.com/Podcastindex-org/podcast-namespace/blob/main/chapters/jsonChapters.md#more-complex-example look like in WebVTT?

dascritch commented 2 years ago

WebVTT needs an end time (endTime) for every cue, as it is mandatory for firing events. I think this file must be split into two different files: one with the chapter declarations, the other with the metadata, even if the advanced notation accepts including some elements (I'm not using it; I'm keeping to the simpler notation).

Each WebVTT track file, once loaded, will fire events on entering and leaving each cue. So chapters and metadata will not trigger the same handler functions.

Here is my version:

WEBVTT FILE

chapter-1
00:00:00.000 --> 00:02:48.000
Intro

chapter-2
00:02:48.000 --> 00:04:20.000
Hearing Aids

chapter-3
00:04:20.000 --> 00:06:50.000
Progress Report

chapter-4
00:06:50.000 --> 01:06:30.000
Namespace

chapter-5
01:06:30.000 --> 01:31:50.000
Just Break Up

chapter-6
01:31:50.000 --> 01:37:34.000
The Big Players

chapter-7
01:37:34.000 --> 01:41:29.000
Spread the Word

chapter-8
01:41:29.000 --> 01:43:29.000
Outro
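For clients that do not run in a browser, reading a chapters file like the one above means writing (or importing) a parser. The following is a minimal Python sketch, not a conforming implementation of the W3C parsing algorithm: it splits blocks on blank lines, skips the header and NOTE/STYLE/REGION blocks, treats a first line without "-->" as the cue identifier, and ignores cue settings. The names `Cue` and `parse_vtt` are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One timestamp: hours are optional in WebVTT ("mm:ss.ttt" or "hh:mm:ss.ttt").
_TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
# Timing line "start --> end"; any cue settings after it are ignored here.
_TIMING = re.compile(rf"^{_TS}[ \t]+-->[ \t]+{_TS}")


@dataclass
class Cue:
    identifier: Optional[str]
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: Optional[str], m: str, s: str, ms: str) -> float:
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def _is_non_cue_block(first_line: str) -> bool:
    # NOTE, STYLE and REGION blocks carry no cues; the keyword stands alone
    # or is followed by a space or tab.
    return any(first_line == kw or first_line.startswith((kw + " ", kw + "\t"))
               for kw in ("NOTE", "STYLE", "REGION"))


def parse_vtt(source: str) -> List[Cue]:
    """Simplified WebVTT cue parser (a sketch, not the spec's algorithm)."""
    blocks = re.split(r"\n[ \t]*\n", source.replace("\r\n", "\n").strip())
    if not blocks[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    cues = []
    for block in blocks[1:]:
        lines = block.strip("\n").split("\n")
        if _is_non_cue_block(lines[0]):
            continue
        identifier = None
        if "-->" not in lines[0]:
            identifier, lines = lines[0], lines[1:]
        match = _TIMING.match(lines[0]) if lines else None
        if match is None:
            continue  # malformed block: skip it rather than fail the whole file
        g = match.groups()
        cues.append(Cue(identifier, _seconds(*g[:4]), _seconds(*g[4:]), "\n".join(lines[1:])))
    return cues
```

The amount of code is modest, but it is still code each non-browser app would have to write and maintain alongside the JSON format.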

WebVTT files look like SRT files and accept similar HTML-like tags.

Some examples are on this MDN page: https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API

Also, you may want to check the WebVTT metadata notation format.

saerdnaer commented 2 years ago

Why should fewer features (no URLs, no chapter levels, no images) be the default/recommendation? For transcripts there are more MIME type examples, but I don't think the spec maintainers will change that for chapters... Then you would also have to mention the other ways of adding chapters, and clients would have to add yet another way to parse them, etc.

Also be aware that the term chapter is really misleading, cf. https://github.com/Podcastindex-org/podcast-namespace/issues/103 and https://github.com/Podcastindex-org/podcast-namespace/issues/104

dascritch commented 2 years ago

There is a transcript kind of <track>, namely captions (a bit different from subtitles), with better capabilities than chapters.

To be honest: I think that what you describe, i.e. transcriptions with hyperlinks, is more usable and accessible when put in the <content> tag of your feed, not in the chaptering module. Try it from a blind user's point of view, and you will easily understand that it is very problematic for a11y. I think this is the main reason chapters have a simplified structure. Standards like the W3C ones exist because a lot, really a lot, of people were scratching their heads over many problems to cope with.

Oh, and you can set not just one but many <track kind="chapters"> elements in an <audio> or <video> element. This is perfectly supported by browsers, even if there is no visible element for it (which is what I'm building with a home-cooked web component). And each WebVTT track may be named: "musics", "segment", "fine chapter", etc.

The metadata kind was created for cases where you need to build the UI component entirely yourself, instead of relying on a subtitles/captions role. Note that the examples use a JSON structure, which is easy to handle in the browser on each cue event, and which makes it easy to define extensible elements such as {"images": ["url1", "url2"], "links": ["url3"]}.
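A sketch of how a client might decode the text of a metadata cue, assuming the payload is a JSON object. The `images`/`links` keys follow the illustrative structure in the comment above and are not defined by any spec; `parse_metadata_payload` and `extract_links` are hypothetical helper names. A strict JSON parser rejects unquoted keys, so authoring tools have to emit valid JSON.

```python
import json
from typing import Any, Dict, List, Optional


def parse_metadata_payload(cue_text: str) -> Optional[Dict[str, Any]]:
    """Decode a metadata cue's text; return None unless it is a JSON object."""
    try:
        payload = json.loads(cue_text)
    except json.JSONDecodeError:
        return None  # e.g. unquoted keys or plain text
    return payload if isinstance(payload, dict) else None


def extract_links(payload: Dict[str, Any]) -> Dict[str, List[str]]:
    """Pull out the illustrative "images" and "links" keys, defaulting to empty lists."""
    return {key: list(payload.get(key, [])) for key in ("images", "links")}
```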

dellagustin commented 2 years ago

I feel divided. Using web standards understood by browsers is compelling. On the other hand, many aggregators do not run in a browser, and parsing the WebVTT format will be an additional effort for them, especially if we need two files, one for chapters and one for metadata. The current format is very easy to author manually and also to parse.

jamescridland commented 2 years ago

I don't have a dog in this race.

But it doesn't look complicated (nor a processing burden) to take SRT and programmatically produce a WebVTT version if required.

When I fiddled with this a while ago, the CORS issues with using SRTs hosted on other domains meant that you needed a local proxy anyway.

saerdnaer commented 2 years ago

Yes, if one wants to use subtitle files we should definitely recommend WebVTT over SRT, and maybe even drop SRT files from the spec.

As far as I have understood it, you simply add a WEBVTT line at the top of your SRT file ~~and you have valid WebVTT.~~ Update: Ah, you also have to change the millisecond separator from a comma to a full stop, so SRT 00:02:48,000 should be converted to WebVTT 00:02:48.000 :-( (cf. https://en.wikipedia.org/wiki/WebVTT#Main_differences_from_SubRip)
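A minimal Python sketch of that conversion: prepend the WEBVTT header and swap the comma for a full stop in timestamps. Assumptions: the input is plain SubRip with hh:mm:ss,ttt timestamps; only lines containing "-->" are touched, so commas in the cue text survive; SRT's numeric counters are kept and simply become WebVTT cue identifiers. SRT formatting tags are not converted.

```python
import re

# SRT timestamps use a comma before the milliseconds ("00:02:48,000");
# WebVTT requires a full stop ("00:02:48.000").
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SubRip text to WebVTT (header + timestamp separator only)."""
    out = ["WEBVTT", ""]
    for line in srt.replace("\r\n", "\n").split("\n"):
        if "-->" in line:  # only rewrite timing lines
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out)
```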

You might also add some metadata like language and kind, but they are optional:

https://jbilocalization.com/the-difference-between-srt-and-webvtt-in-captioning-subtitling/

dascritch commented 2 years ago

The cited article misses the fact that you can name sequences/chapters in WebVTT, and this is recommended for event handling, so that you can point to a chapter instead of a timecode.
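To illustrate pointing at a chapter by name rather than by timecode, here is a small sketch that indexes named cues and resolves a name such as "chapter-3" to a seek position. The tuple layout and function names are assumptions; since a name is only useful as a target if it is unambiguous, the sketch rejects duplicate identifiers.

```python
from typing import Dict, List, Tuple

Cue = Tuple[str, float, float, str]  # (identifier, start, end, text), times in seconds


def index_by_identifier(cues: List[Cue]) -> Dict[str, float]:
    """Map each named cue to its start time in seconds."""
    index: Dict[str, float] = {}
    for identifier, start, _end, _text in cues:
        if identifier in index:
            raise ValueError(f"duplicate cue identifier: {identifier}")
        index[identifier] = start
    return index


def seek_target(index: Dict[str, float], chapter_id: str) -> float:
    """Return where a player should seek for a chapter name."""
    try:
        return index[chapter_id]
    except KeyError:
        raise KeyError(f"unknown chapter: {chapter_id}") from None
```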

saerdnaer commented 2 years ago

Yeah, the examples in the spec are really interesting, especially

Someone™ should create a WebVTT version of https://github.com/Podcastindex-org/podcast-namespace/blob/main/chapters/exampleComplex.json – and add it as an example via PR.

The open questions are:

does every cue also need an end time (okay, you could use the start time of the next chapter)
how to deal with chapter images... (I personally would still prefer to have one file with all chapter images, instead of only referencing them via a URL)
should the structure/chapters also be included in a transcript .vtt file, or should they always be in separate files?
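On the first open question: if end times are derived from the next chapter's start, a converter also needs the episode duration for the last cue. In this sketch the caller supplies it (an assumption, e.g. read from the media file); `infer_end_times` is a hypothetical name.

```python
from typing import List, Tuple


def infer_end_times(starts: List[float], duration: float) -> List[Tuple[float, float]]:
    """Turn chapter start times into (start, end) pairs: each chapter ends
    where the next begins, and the last one ends at the episode duration."""
    ordered = sorted(starts)
    if ordered and ordered[-1] >= duration:
        raise ValueError("last chapter starts at or after the end of the episode")
    ends = ordered[1:] + [duration]
    return list(zip(ordered, ends))
```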

dascritch commented 2 years ago

Someone™ should create a WebVTT version of...

Haven't I done it above? https://github.com/Podcastindex-org/podcast-namespace/issues/315#issuecomment-987717696 I didn't push it in a commit because I don't know exactly what the contribution policies are here ;)

About images: you may use an embedded CSS background-image declaration, but I really think a metadata track is more suitable, and the same goes for hyperlinks. We need very good a11y expertise there so as not to confuse assistive tools. But maybe we can push a proposal to the W3C. The cost of an open HTTP/2 socket is really minimal, and if you use named chapters, an authoring tool will be much easier to maintain.

dellagustin commented 2 years ago

What would the podcast:chapters tag look like if we used WebVTT? We would need two url attributes, one for chapters and one for metadata. From an app developer's perspective, handling two files is an additional burden: the metadata timecodes are not necessarily aligned with the chapter timecodes (ideally they are, but not necessarily), and in case of misalignment the client has to decide what to show and how. With the JSON chapter format, this is not a problem. For captions/transcripts, I think WebVTT looks like the way to go, but for chapters I still fail to see the advantage. Similar to what @jamescridland mentioned for transcripts, it would be easy to produce a WebVTT file from the chapters format.

dascritch commented 2 years ago

Hi @dellagustin , you've got some good points.

But please note that, beyond building podcasts for your applications, podcast publishers also have websites, and WebVTT is the only format implemented there, perfectly supported in every current browser, even Safari. Everything from track selection to events, etc.: a very complete API is available and responds very well.

For images or external URLs, you can use the NOTE feature, as NOTE blocks are only comments for WebVTT developers. So, just as comments are used as decorators in some languages such as Python, PHP, and so on, we could extend the NOTE below any chapter to include free-form JSON metadata.

I'm only suggesting it, but we can ask an editor of the spec to be sure.

saerdnaer commented 2 years ago

Why an extra NOTE? Isn't it possible to combine metadata Cue's with text Cue's?

https://www.w3.org/TR/webvtt/#introduction-metadata

1
00:00:00.100 --> 00:00:07.342
Samurai Pizza Cats
{
 "type": "WikipediaPage",
 "url": "https://en.wikipedia.org/wiki/Samurai_Pizza_Cats"
}

3
00:11.441 --> 00:14.441
Foo bar Location
{
 "type": "LongLat",
 "lat" : "36.198269",
 "long": "137.2315355"
}

Haven't I done it above? #315 (comment) I didn't push it in a commit because I don't know exactly what the contribution policies are here ;)

No, this was only a comment. And I meant a full example, with the same content as the linked file but in a different representation.

dascritch commented 2 years ago

Does your notation proposal conform to the WebVTT standard?

As I read your proposal, any browser will take the whole text up to the next blank line as cue text. Rendering in any a11y service/device will be catastrophic. You cannot break things like that in any standardized function, especially when the impacted part of the standard is about accessibility. This is a complete no-go.

Since your JSON is non-normative, your proposal would be better suited to a separate block, marked as a NOTE:

1
00:00:00.100 --> 00:00:07.342
Samurai Pizza Cats

NOTE
{
 "type": "WikipediaPage",
 "url": "https://en.wikipedia.org/wiki/Samurai_Pizza_Cats"
}

3
00:11.441 --> 00:14.441
Foo bar Location

NOTE
{
 "type": "LongLat",
 "lat" : "36.198269",
 "long": "137.2315355"
}

Note that I didn't check how web browsers will interpret it. I suppose they will ignore it, so you cannot use it in any JavaScript library. It's better to separate data properly, just as we separate HTML, CSS, JS and MP3 into different files.

dellagustin commented 2 years ago

Regarding using NOTE to express metadata, section 4.1, WebVTT file structure, states "A WebVTT comment block is ignored by the parser", so I don't expect it would be made available through parsing APIs. I'd say this option is a no-go.

But please note that, beyond building podcasts for your applications, podcast publishers also have websites, and WebVTT is the only format implemented there, perfectly supported in every current browser, even Safari. Everything from track selection to events, etc.: a very complete API is available and responds very well.

A few comments on that:

I could not find a source for this, but I think most podcast listening is done in apps rather than directly on the podcast's website; I might be wrong, though
As far as I could tell, browsers do not do much at the moment with the chapter track except exposing it through the JavaScript API, so websites still have to build things around it to show and interact with the chapters
I assume most chapters are not authored directly in the WebVTT format but generated with tools, which could potentially generate both the WebVTT files and our JSON format
I still think that the fact that the WebVTT chapters track and metadata track do not enforce a 1:1 relationship between their cues could present a UX issue for apps that rely on this strong relationship
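If a client does receive separately authored chapters and metadata tracks, it can at least detect a missing 1:1 relationship before deciding how to present it. A minimal sketch that reports metadata cues whose (start, end) pair matches no chapter cue; the 1 ms tolerance and the function name are assumptions, and what to do with mismatches remains a UX decision.

```python
from typing import List, Tuple

Timing = Tuple[float, float]  # (start, end) in seconds


def misaligned_metadata(chapter_timings: List[Timing], metadata_timings: List[Timing],
                        tolerance: float = 0.001) -> List[Timing]:
    """Return metadata timings that match no chapter timing within `tolerance`."""
    def matches(a: Timing, b: Timing) -> bool:
        return abs(a[0] - b[0]) <= tolerance and abs(a[1] - b[1]) <= tolerance

    return [m for m in metadata_timings
            if not any(matches(m, c) for c in chapter_timings)]
```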

That said, I still feel divided.

PofMagicfingers commented 2 years ago

WebVTT is interesting for chapters, but there is nothing in the spec that allows merging files with different purposes. It's actually the opposite. When used in HTML, you have to specify the kind of data the WebVTT file contains:

kind How the text track is meant to be used. If omitted the default kind is subtitles. If the attribute contains an invalid value, it will use metadata (Versions of Chrome earlier than 52 treated an invalid value as subtitles). The following keywords are allowed:

subtitles Subtitles provide translation of content that cannot be understood by the viewer. For example speech or text that is not English in an English language film. Subtitles may contain additional content, usually extra background information. For example the text at the beginning of the Star Wars films, or the date, time, and location of a scene.

captions Closed captions provide a transcription and possibly a translation of audio. It may include important non-verbal information such as music cues or sound effects. It may indicate the cue's source (e.g. music, text, character). Suitable for users who are deaf or when the sound is muted.

descriptions Textual description of the video content. Suitable for users who are blind or where the video cannot be seen.

chapters Chapter titles are intended to be used when the user is navigating the media resource.

metadata Tracks used by scripts. Not visible to the user.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/track
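A non-browser app that accepts a `kind` value (for example, from a feed) and wants to mirror the browser defaults quoted above could normalize it as follows. The function name is hypothetical; keyword matching is case-insensitive, as for HTML enumerated attributes.

```python
from typing import Optional

# Keywords allowed by the HTML <track kind> attribute.
TRACK_KINDS = {"subtitles", "captions", "descriptions", "chapters", "metadata"}


def effective_kind(attribute: Optional[str]) -> str:
    """Missing attribute -> "subtitles"; unrecognized value -> "metadata"."""
    if attribute is None:
        return "subtitles"
    value = attribute.lower()  # keywords are matched case-insensitively
    return value if value in TRACK_KINDS else "metadata"
```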

I agree with @dascritch: we should use existing web standards when possible. Even if people listen to your content inside apps, ensuring ease of use and compatibility with web technologies is still a good point.

That said, @dellagustin is absolutely right: NOTE blocks inside WebVTT are ignored by parsers and should not be used.

Here is an example of my usage of VTT in a recent project. I have two files, subtitles.vtt with my subtitles:

WEBVTT

00:00:00.135 --> 00:00:03.108
Hello and welcome to podcast.tips,
the podCloud podcast that gives you all the

00:00:03.108 --> 00:00:04.722
little tips to
make better podcasts

And another one, data.vtt, with metadata:

WEBVTT

episode-title
00:00:00.000 --> 00:10:00.000
Choisir son micro

logo-placement
00:00:00.000 --> 00:00:34.000
top right

logo-placement
00:00:34.000 --> 00:00:38.500
top left

logo-placement
00:00:38.500 --> 00:10:00.000
top right
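In the browser, the cues of a file like data.vtt are exposed through the text track API as they become active. A client outside the browser would need the equivalent lookup. This sketch assumes the cues are already parsed into (identifier, start, end, text) tuples and uses the HTML notion of an active cue (start <= t < end); `active_texts` is a hypothetical name.

```python
from typing import List, Tuple

Cue = Tuple[str, float, float, str]  # (identifier, start, end, text), times in seconds


def active_texts(cues: List[Cue], t: float, identifier: str) -> List[str]:
    """Texts of the cues named `identifier` that are active at time t."""
    return [text for ident, start, end, text in cues
            if ident == identifier and start <= t < end]
```

For example, with the three logo-placement cues above, `active_texts(cues, 35.0, "logo-placement")` selects only the "top left" cue.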

If we use WebVTT, we SHOULD use separate files for separate kinds of content, as specified in the WebVTT spec.

Here is a guide on how to use chapters vtt : http://thenewcode.com/977/Create-Interactive-HTML5-Video-with-WebVTT-Chapters

felixfbecker commented 7 months ago

I know this is an old thread, so apologies for being late to the party. But I was also very surprised reading the spec that chapters (and captions) seem to prefer a custom JSON format over the existing open standard WebVTT.

The WebVTT standard is very extensive and handles/specifies a lot of details that the custom JSON formats gloss over, like how non-English text would work (text encoding, right-to-left languages, mixed languages, ...).

While it's true that no browser today renders chapter tracks in <video>/<audio> elements with the native controls attribute, those tracks still get loaded by the browser and exposed through the text track/cue API, which is time-synced to the media. Third-party player control UIs still make use of those APIs under the hood and rely on WebVTT to provide chapters. E.g. video.js (maybe the most popular video player) supports chapters; see this demo, which has a chapter selector. If you open the devtools network tab, you can see it loads this chapters.en.vtt file.

Even if the player is not browser-based, there are existing libraries in other programming languages for handling WebVTT and tools that export WebVTT (e.g. if you download a Zoom recording, it will give you the transcript as a .vtt).

If the goal is to allow supplying chapter art and links too, it makes a lot of sense to me to have one VTT of kind chapters containing just the text and one of kind metadata with supplemental JSON properties like img and url. This way there is progressive enhancement; any player that only handles text chapters can use only the simple chapters VTT, but any player that can display images and URLs can additionally use the metadata (which will be conveniently exposed through the same API in the browser). The spec could define that the timestamps between the chapters and metadata must be identical, which would be natural anyway if those are generated programmatically by a tool.
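A sketch of that progressive enhancement, assuming the spec required identical timestamps between the chapters and metadata tracks: chapters always come through with their titles, and metadata JSON is attached only where timings match exactly. The cue tuples, field names, and `merge_tracks` are illustrative; malformed or non-object metadata is skipped rather than breaking the chapter list.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start, end, text), times in seconds


def merge_tracks(chapters: List[Cue],
                 metadata: Optional[List[Cue]] = None) -> List[Dict[str, Any]]:
    """Attach metadata JSON to chapters with identical timings; titles always survive."""
    extras: Dict[Tuple[float, float], Dict[str, Any]] = {}
    for start, end, text in metadata or []:
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            continue  # malformed metadata must not break the chapter list
        if isinstance(payload, dict):
            extras[(start, end)] = payload
    merged = []
    for start, end, title in chapters:
        entry = dict(extras.get((start, end), {}))
        entry.update(start=start, end=end, title=title)  # core fields win over metadata keys
        merged.append(entry)
    return merged
```

A text-only player simply calls `merge_tracks(chapters)`; a richer player passes the metadata track as well and reads `img`/`url` when present.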
