IIIF / iiif-av

The International Image Interoperability Framework (IIIF) Audio/Visual (A/V) Technical Specification Group aims to extend to A/V the benefits of interoperability and the growing ecosystem of clients and servers that IIIF provides for images. This repository contains user stories and mockups for interoperable A/V content – contributions are welcome.
http://iiif.io/community/groups/av/
Apache License 2.0

determine text track kind #63

Open jronallo opened 7 years ago

jronallo commented 7 years ago

Description

Either WebVTT files or text annotations could be meant as captions, subtitles, or descriptions. How can this be made explicit in Presentation?

Variation(s)

The possible kinds an HTML5 track supports include subtitles, captions, descriptions, chapters, and metadata.

Proposed Solutions

Would it be possible to add a "kind" that maps to the HTML5 track element attribute?
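As a rough sketch of what such a mapping might look like on the client side: the snippet below assumes a hypothetical `kind` property on a text resource (no such property is defined in Presentation today) and normalizes it to one of the five HTML5 track kinds. The `TextResource` shape and both function names are illustrative only. The fallbacks are meant to mirror the defaults of the track element's `kind` attribute (subtitles when absent, metadata when unrecognized).

```typescript
// The five kinds the HTML5 track element supports, as listed above.
const TRACK_KINDS = ["subtitles", "captions", "descriptions", "chapters", "metadata"] as const;
type TrackKind = (typeof TRACK_KINDS)[number];

// Hypothetical shape: "kind" is NOT a defined IIIF Presentation property.
interface TextResource {
  id: string;        // URL of the WebVTT file
  label: string;     // human-readable label
  language?: string; // e.g. "en"
  kind?: string;     // assumed hint for the intended track kind
}

// Normalize a free-form hint to a valid track kind.
function toTrackKind(kind: string | undefined): TrackKind {
  if (kind === undefined) {
    return "subtitles"; // default when no kind is given
  }
  const normalized = kind.trim().toLowerCase();
  const match = TRACK_KINDS.find((k) => k === normalized);
  return match !== undefined ? match : "metadata"; // unrecognized values
}

// Build the attributes a client could set on a <track> element.
function toTrackAttributes(resource: TextResource): Record<string, string> {
  const attrs: Record<string, string> = {
    src: resource.id,
    label: resource.label,
    kind: toTrackKind(resource.kind),
  };
  if (resource.language !== undefined) {
    attrs.srclang = resource.language;
  }
  return attrs;
}
```

A client could then copy the returned attributes onto a track element it creates for the video.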

Additional Background

This is related to my work on video encoding that also outputs a canvas. See https://github.com/jronallo/abrizer/issues/5 and https://github.com/jronallo/abrizer/issues/6

azaroth42 commented 7 years ago

I think this is one step too deep for Presentation. It's the same distinction as a diplomatic transcription versus an edition of a manuscript -- important from a semantic perspective, but less so from a presentation perspective. The only distinction is subtitles vs captions, which, as we heard last week, is a North American concern not shared even in the UK. The other three map to metadata properties, not annotations.

zimeon commented 7 years ago

I agree with @azaroth42 that semantic indication is a metadata issue and (aside from linking) outside the scope of prezi. Presumably, if the use case is to show the choice of transcripts to a user, then that would be simply a Choice with appropriate labels, following the pattern of the language choice in https://github.com/IIIF/iiif-av/blob/master/source/api/av/examples/11a.json (no choiceHint client)?

jronallo commented 7 years ago

If you're delivering the content through a track element, then how do you select the kind of captions or subtitles to use for the track? They do have different definitions and use cases as far as the HTML5 specification goes. Subtitles are for the case where sound is available and heard, but the language of the dialogue is not understood. Captions include more than just dialogue and are meant for when sound is not available. These definitions go beyond the distinction between same-language subtitling and translation that is often made between the words "captions" and "subtitles." Captions will include text for the crash that happens off-screen, while subtitles, whether in the same language as the dialogue or not, do not have the same intention to fill in those accessibility gaps for someone without access to the sound.

https://www.w3.org/TR/html5/embedded-content-0.html#the-track-element

A "description" track is not a metadata description but a textual description of the video's visual content that is converted into an audio track for those who would benefit from it. With the speech synthesis API gaining greater adoption, it is now possible to deliver this kind of content as a lower-cost way to provide audio description than developing an additional audio track. It would be nice to encourage this kind of accessibility more. Is there any way to give a motivation to say that a text annotation should be "spoken" rather than painted? http://caniuse.com/#feat=speech-synthesis
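To make the "spoken rather than painted" idea concrete, here is a minimal sketch of how a client might route a cue once it knows the kind. The `Cue` and `Renderers` shapes and the `renderCue` function are hypothetical. The actual output functions are passed in, so the routing logic does not depend on a browser.

```typescript
// Only the kinds that produce user-facing text are considered here.
type CueKind = "subtitles" | "captions" | "descriptions";

interface Cue {
  kind: CueKind;
  text: string;
}

// Output functions supplied by the client (hypothetical interface).
interface Renderers {
  paint: (text: string) => void; // draw the text over the video
  speak: (text: string) => void; // read the text aloud
}

// Descriptions are spoken; subtitles and captions are painted.
function renderCue(cue: Cue, renderers: Renderers): "painted" | "spoken" {
  if (cue.kind === "descriptions") {
    renderers.speak(cue.text);
    return "spoken";
  }
  renderers.paint(cue.text);
  return "painted";
}

// In a browser, `speak` could be backed by the Web Speech API, e.g.
//   (text) => window.speechSynthesis.speak(new SpeechSynthesisUtterance(text))
```

Without some indication of kind, a client has no basis for making this choice, which is the gap the question above is pointing at.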

The other two kinds (chapters and metadata) could be handled in different ways in IIIF, but it would be nice to have examples of how to get similar functionality out of IIIF. It seems like chapters and the like ought to be rather common, though it might get into the same kind of granularity discussion that the newspaper group is having. There are chapters that are useful for higher-level navigation, but other annotations could be at a scene-by-scene level, which, in a film with lots of quick cuts, could involve lots of annotations that have a different purpose than chapter-level navigation.

Metadata tracks have had different uses, but a common one has been to show video preview thumbnails over the time rail. How would you accomplish preview thumbnails in IIIF? https://support.jwplayer.com/customer/portal/articles/1407439-adding-preview-thumbnails
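For reference, this is roughly what a player does with such a metadata track today. The sketch assumes each cue payload is an image URL with a spatial media fragment (`#xywh=x,y,w,h`) pointing into a sprite sheet. The type and function names are made up for illustration, and nothing here is IIIF-specific.

```typescript
interface ThumbnailRegion {
  url: string; // sprite image URL without the fragment
  x: number;
  y: number;
  width: number;
  height: number;
}

// A metadata cue whose payload is assumed to look like "sprite.jpg#xywh=160,0,160,90".
interface ThumbnailCue {
  start: number; // seconds
  end: number;   // seconds
  payload: string;
}

// Split a payload into the image URL and the pixel region to show.
// Returns null when the payload carries no xywh fragment.
function parseThumbnailCue(payload: string): ThumbnailRegion | null {
  const match = /^(.*)#xywh=(\d+),(\d+),(\d+),(\d+)$/.exec(payload.trim());
  if (match === null) {
    return null;
  }
  const [, url, x, y, width, height] = match;
  return { url, x: Number(x), y: Number(y), width: Number(width), height: Number(height) };
}

// Find the thumbnail to show for a given position on the time rail.
function thumbnailAt(cues: ThumbnailCue[], time: number): ThumbnailRegion | null {
  const active = cues.find((c) => time >= c.start && time < c.end);
  return active === undefined ? null : parseThumbnailCue(active.payload);
}
```

An IIIF equivalent would need some way to say that a set of time-based image annotations is meant for this purpose rather than for painting onto the canvas.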

BTW, part of the advantage of using text tracks for this kind of functionality is the ability to work off of the JavaScript API and built-in video timers. Even if the metadata is managed as annotations, inserting those into a track could have advantages for display. So a metadata track, for instance, would not display any controls by default, but could then take advantage of triggering events off of timers. But that is more an implementation piece than a reason to define "kind" for IIIF use. Still, if these tracks get used for their timers, it would be useful to know how to map different sets of annotations to different track kinds, as they are expected to behave differently.
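As an illustration of that mapping, the sketch below groups a flat list of timed annotations by kind and serializes each group as WebVTT. The `TimedAnnotation` shape, including its `kind` field, is an assumption made for the example and not something Presentation defines. Cue text is assumed to contain no blank lines and no `-->` sequences.

```typescript
// Hypothetical flattened annotation: times in seconds plus an assumed kind.
interface TimedAnnotation {
  start: number;
  end: number;
  text: string;
  kind: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor((totalMs % 3600000) / 60000);
  const s = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Bucket annotations by kind so each bucket can become its own track.
function groupByKind(annotations: TimedAnnotation[]): Record<string, TimedAnnotation[]> {
  const groups: Record<string, TimedAnnotation[]> = {};
  for (const annotation of annotations) {
    if (groups[annotation.kind] === undefined) {
      groups[annotation.kind] = [];
    }
    groups[annotation.kind].push(annotation);
  }
  return groups;
}

// Serialize one bucket as a WebVTT file, with cues ordered by start time.
function toWebVTT(annotations: TimedAnnotation[]): string {
  const cues = annotations
    .slice()
    .sort((a, b) => a.start - b.start)
    .map((a) => `${formatTimestamp(a.start)} --> ${formatTimestamp(a.end)}\n${a.text}`);
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```

In a browser, the same grouping could instead feed `video.addTextTrack(kind)` and `track.addCue(new VTTCue(start, end, text))`, which avoids generating files but still requires knowing which kind each set of annotations maps to.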