Islandora / documentation

Contains islandora's documentation and main issue queue.

Add Captions, Subtitles, and Interactive Transcripts #1003

Open DonRichards opened 5 years ago

DonRichards commented 5 years ago

The obvious suggestion is to add support for closed captions, subtitles, and interactive transcripts to video playback.

The question is which closed captioning formats to support and which, if any, should be the default.

Should there also be an example in the Vagrant build?

From what I see, the most popular formats are SRT and WebVTT.

WebVTT
* Supported by Video.js
* Has the ability to use captioning numbers and metadata (embedded in the VTT file)
* Has the ability to specify font, color, text formatting, and placement

SRT
* No longer supported by Video.js
* Supports basic text formats (bold, italic, underline) and placement
* Linguists often prefer to translate directly in SRT since it has fewer text code elements
* Compatible with most subtitle processing programs
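
Since the two formats are so close, converting existing SRT files to WebVTT is mostly mechanical. The following is a minimal sketch in Python, assuming plain SRT input without positioning extensions; `srt_to_webvtt` is a hypothetical helper name, not part of Video.js or any Drupal module. It adds the `WEBVTT` header line that WebVTT files must start with, drops the numeric cue counters, and changes the millisecond separator from `,` to `.`.

```python
import re

# Matches an SRT timestamp such as 00:01:02,500 and captures its parts.
_SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SubRip (SRT) text to WebVTT text (illustrative sketch only)."""
    # Normalize line endings and drop a byte-order mark if present.
    text = srt_text.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n{2,}", text.strip()) if b.strip()]

    out = ["WEBVTT", ""]
    for block in blocks:
        lines = block.split("\n")
        # SRT cues begin with a numeric counter; WebVTT cue identifiers
        # are optional, so the counter is simply dropped here.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines or "-->" not in lines[0]:
            continue  # skip anything that does not look like a cue
        # WebVTT uses '.' rather than ',' before the milliseconds.
        out.append(_SRT_TIME.sub(r"\1:\2:\3.\4", lines[0]))
        out.extend(lines[1:])
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:30,000\nThis is the transcript text content.\n\n"
        "2\n00:00:30,000 --> 00:01:00,000\nspeaker 2 transcript\n"
    )
    print(srt_to_webvtt(sample))
```

Basic SRT formatting tags such as `<b>`, `<i>`, and `<u>` are also valid in WebVTT cue text, so the sketch passes cue text through unchanged.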

Some other formats to note:

* CAP – This is a common subtitle/caption file format for broadcast media. It was developed by Cheetah International.
* CPT.XML – XML format used for encoding captions into Flash video. It originated in the caption-embedding software Captionate.
* DFXP – This is the most common format used for captioning Flash video. It’s a timed-text format that was developed by W3C and stands for “Distribution Format Exchange Profile.”
* EBU.STL – This is a common subtitle/caption file format for PAL broadcast media. It was developed by the European Broadcasting Union.
* QT – Caption format used for QuickTime video or audio. It was developed by Apple.
* RT – RealText captions for RealMedia video or audio.
* SAMI (SMI) – Used for Windows Media video or audio. It was developed by Microsoft and stands for “Synchronized Accessible Media Interchange.”
* SBV – This is a YouTube caption file format that stands for “SubViewer.” It’s what you get when you download captions from YouTube. It’s a text format that is very similar to SRT.
* SCC – Popular standard used for Line 21 broadcast closed captions, web media, DVD, as well as subtitles for iTunes, iPods, iPads, and iPhones. It was originally developed by Sonic and stands for “Scenarist Closed Caption.”
* SRT – This is the most common subtitle/caption file format, especially for YouTube or Facebook captioning. It is a text format that originated in the DVD-ripping software SubRip and stands for “SubRip Subtitle” file.
* STL – Used for DVD Studio Pro. It was developed by Spruce Technologies and known as “Spruce Subtitle File.”
* WebVTT – Caption format for HTML5 media players.
* ITT – iTunes Timed Text
* WMP.TXT – Windows Media
* ADBE – Adobe

Natkeeran commented 5 years ago

@DonRichards

This is of interest for our local use cases, especially with respect to accessibility and oral histories. WebVTT seems to be the standard supported by W3C: https://www.w3.org/TR/webvtt1/.

If we agree on WebVTT, then we can consider extending this: https://www.drupal.org/project/videojs. We would need to add tracks as media, then the field formatter plugin needs to access those tracks.

Natkeeran commented 5 years ago

@DonRichards @dannylamb @MarcusBarnes

I have done some initial work to see how we can include caption tracks in the Video.js player:
https://github.com/Natkeeran/islandora/tree/videojs_overrides

It uses the videojs module and overrides its theme hook to add the track. The videojs Drupal module is designed to support that use case, so we don't have to touch the module itself.

## Testing

```
00:00.000 --> 00:30.000
This is the transcript text content.

00:30.000 --> 01:00.000
speaker 2 transcript

01:00.000 --> 01:45.000
speaker 3 transcript
```
* Get the path and modify it here: https://github.com/Natkeeran/islandora/blob/videojs_overrides/modules/islandora_videojs/templates/videojs.html.twig#L21
* You should see the captions when you go to the video.

(You will need to clear the cache to see changes take effect during the various steps.)

## Questions
* Is this ([videojs](https://www.drupal.org/project/videojs)) the Drupal module we want to use to bring in videojs? It does not seem to provide an option to use a local video.js library.
* What is the best way to add the logic to pull in the vtt track files associated with a node before passing that info to the twig template? We will also need the language info/codes.
* Should it be a submodule on its own, or should the logic go somewhere else?
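
On the second question, whatever gathers the track files ultimately has to hand the template a list of tracks, each with a source URL, a language code, and a label, which end up as `<track>` elements inside the `<video>` tag. The sketch below is written in Python purely for illustration (the real logic would live in Drupal and Twig); `CaptionTrack` and `render_tracks` are hypothetical names, and the file paths are made up.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class CaptionTrack:
    """Hypothetical record for one caption file attached to a video."""
    src: str                 # URL of the .vtt file
    srclang: str             # language tag, e.g. "en" or "fr"
    label: str               # name shown in the player's caption menu
    kind: str = "captions"   # or "subtitles"


def render_tracks(tracks: List[CaptionTrack],
                  default_lang: Optional[str] = None) -> str:
    """Render one HTML <track> element per caption file."""
    lines = []
    default_used = False
    for t in tracks:
        attrs = (
            f'kind="{escape(t.kind)}" src="{escape(t.src)}" '
            f'srclang="{escape(t.srclang)}" label="{escape(t.label)}"'
        )
        # Mark only the first track in the preferred language as default.
        if not default_used and t.srclang == default_lang:
            attrs += " default"
            default_used = True
        lines.append(f"<track {attrs}>")
    return "\n".join(lines)


if __name__ == "__main__":
    tracks = [
        CaptionTrack("/files/captions.en.vtt", "en", "English"),
        CaptionTrack("/files/captions.fr.vtt", "fr", "Français", kind="subtitles"),
    ]
    print(render_tracks(tracks, default_lang="en"))
```

Keeping the language code on the track record itself, rather than deriving it from site configuration, is one way the list of caption languages could stay independent of the languages the site is translated into.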

dannylamb commented 5 years ago

daaaaaaaaaaaang @Natkeeran

DonRichards commented 5 years ago

@Natkeeran To answer your questions,

1) The Videojs library "should" be able to handle the vtt files but I'm not completely sure. I guess if we're including an older version of videojs it might be an issue.

2) I think a simple standardized vtt file would work, treating the naming convention something like thumbnails. Something like "captions.vtt" would be good and easy to understand at a glance.

3) I don't think vtt integration should be a separate submodule; it might give the wrong impression (that accessibility isn't integrated).

DonRichards commented 5 years ago

I may have misunderstood your first question. I'm not completely sure on your other questions.

DonRichards commented 4 years ago

I think this might cover the transcripts viewer & editor as another solution https://www.drupal.org/project/transcript

elizoller commented 3 years ago

ok so related to the tech call from 1/13/21 and my recent PR: https://github.com/Islandora/islandora_defaults/pull/44 we decided to check out a drupal core issue + patch: https://www.drupal.org/files/issues/2019-10-07/3056714-16.patch and https://www.drupal.org/project/drupal/issues/3056714

this was fairly straightforward to apply

  1. i applied the patch via composer
  2. added a new field to the video media type - the new field type is media track
  3. copied the template from the patch (core/themes/classy/templates/field/file-video.html.twig) into the place in bartik that was rendering the template (core/themes/bartik/templates/classy/field/file-video.html.twig)
  4. added a video node and related media.
  5. waited for service file to be generated
  6. attached vtt in the media track field and specified language
  7. went back to view node
  8. voila captions

A few notes:

  1. not sure why i had to copy the template over - the patch should probably apply to all the core shipped themes?
  2. i believe it was @rosiel who said that the languages would then be dependent on the enabled languages in the drupal site, and that is 100% the case. in my opinion that isn't a good thing: it's highly likely we'd have captions for more languages than we make our site fully translatable into.

matiaszumbo commented 3 years ago

I have subtitles in EBU-STL format. Is there any plugin for videojs?

seth-shaw-unlv commented 3 years ago

@matiaszumbo,

There was an earlier work-in-progress PR that used videojs, but it was abandoned. You could possibly change your theme to override the video field Twig to include the requisite libraries and use the video-js tag instead of the video tag, but I haven't tried it myself.

Also, according to their docs, videojs only supports WebVTT. I know YouTube will take EBU-STL files, but I haven't found any indication that browsers can. Perhaps you should try converting them to WebVTT?

matiaszumbo commented 3 years ago

@seth-shaw-unlv Yes, converting the .stl files to .vtt may be the better option.

Digital-Grinnell commented 3 years ago

We have a number of oral histories that are audio-only, no video. When available we display a static image of the speaker in the display with caption text reflecting the transcription. Can we assume that such capability would be applicable to audio recordings as well?