Closed angrave closed 2 years ago
(in the future we will want to support speaker name too!)
I'd be happy to work on this. Here is my proposed spec:
POST: /api/Captions
Parameters: captionFile (either a .vtt or .srt file) and videoId (the ID of the video that the captions are for)
Auth: Admins, TAs, and instructors
The endpoint will parse the closed caption file (either .vtt or .srt file) and create a new Transcription object with the list of captions from the file. This new transcription object will then be added to the corresponding video's list of transcriptions.
Here is the proposed method signature:
[DisableRequestSizeLimit]
[Authorize(Roles = Globals.ROLE_ADMIN + "," + Globals.ROLE_TEACHING_ASSISTANT + "," + Globals.ROLE_INSTRUCTOR)]
[HttpPost]
[Consumes("multipart/form-data")]
public async Task<ActionResult<IEnumerable<Caption>>> PostCaptionFile(IFormFile captionFile, [FromForm] string videoId)
{
// TODO
}
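For reference, here is a rough sketch of how the body might be filled in. The `_context` field, the `ParseCaptionFile` helper, and the model property names below are assumptions for illustration only, not existing ClassTranscribe code; the actual parsing would use whichever parser we settle on (the unused WebVTT function or the NuGet package):

```csharp
[DisableRequestSizeLimit]
[Authorize(Roles = Globals.ROLE_ADMIN + "," + Globals.ROLE_TEACHING_ASSISTANT + "," + Globals.ROLE_INSTRUCTOR)]
[HttpPost]
[Consumes("multipart/form-data")]
public async Task<ActionResult<IEnumerable<Caption>>> PostCaptionFile(IFormFile captionFile, [FromForm] string videoId)
{
    // Validate inputs before touching the database.
    if (captionFile == null || string.IsNullOrEmpty(videoId))
    {
        return BadRequest();
    }

    var video = await _context.Videos.FindAsync(videoId);
    if (video == null)
    {
        return NotFound();
    }

    // ParseCaptionFile is a placeholder for the chosen .vtt/.srt parser.
    List<Caption> captions;
    using (var stream = captionFile.OpenReadStream())
    {
        captions = ParseCaptionFile(stream, Path.GetExtension(captionFile.FileName));
    }

    // Create a new Transcription and attach it to the video's list of transcriptions.
    var transcription = new Transcription { VideoId = videoId, Captions = captions };
    _context.Transcriptions.Add(transcription);
    await _context.SaveChangesAsync();

    return Ok(captions);
}
```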
It looks like we currently have an unused WebVTT parser function here. I propose that we use this NuGet package instead, which has over 100k downloads on NuGet. It would also let us immediately add support for other subtitle formats (.sub, .ssa, .ttml).
@angrave how does this look to you?
Yes. We should extend the transcription to include a Label,
and whether the transcription should be published or not. For example, there could be transcriptions from Azure, Kaltura/YouTube, and manual upload.
Would you like only a "Label" field of type string? Or would you like a "PublishStatus" enum field (the same one we already have for Offerings, Playlists, EPub, etc) along with a "SourceType" enum field (the same one we already have for Playlist, Media, EPub, etc) instead of a "Label" string field?
We can use the caption table, VTT files, etc. to represent a text description of a scene instead of captions (which can then be rendered in a screen reader or with HTML5 text-to-speech). So maybe: TranscriptionEnumType Caption|TextDescription.
I am not sure I follow what you mean in that last paragraph about caption table vs captions and TranscriptionEnumType. Could you explain a little bit more about what you are proposing?
We want to be able to have more textual descriptions to support users with low vision.
A transcription object is no longer just a transcription of audio; rather, it can also be a set of text items that have start and end times.
E.g. the text description of a scene:
16:00-17:00 "The Presenter, approximately 40 years old, stands in front of a desk"
17:00-17:55 "a table of enrollment data from 1650 to 2020, showing a slow upward trend. The largest enrollment was in 1965"
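Expressed as a WebVTT file (using the standard `HH:MM:SS.mmm` cue timing syntax), that scene description would look something like:

```
WEBVTT

00:16:00.000 --> 00:17:00.000
The Presenter, approximately 40 years old, stands in front of a desk

00:17:00.000 --> 00:17:55.000
A table of enrollment data from 1650 to 2020, showing a slow upward
trend. The largest enrollment was in 1965
```

This is the same cue structure we already parse for captions, which is why reusing the Transcription/Caption tables avoids duplicating logic.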
If we don't support this with the current caption and transcription table/API, we will have to duplicate a lot of logic in the frontend and backend, e.g. versioning of edits.
In the frontend we will want to make them editable and downloadable, and render them as text for screen readers and audio.
What should the default value for the `Editable` enum be? I am guessing that it would be `None`, but I just want to double check. Similarly, should the default value for the `TranscriptionType` enum be `Caption`?
Also, just so you know, we already have a `Description` field for the Transcription object.
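To summarize the proposed model changes in one place, the extended Transcription might look something like this. This is only a sketch: the enum and field names are taken from this thread's discussion and are assumptions, not existing code:

```csharp
public enum TranscriptionType
{
    Caption,          // transcription of the audio track
    TextDescription   // timed text description of the visual scene
}

public class Transcription
{
    public string Id { get; set; }
    public string Label { get; set; }

    // Reuse the existing enums from Offerings/Playlists/EPub etc.
    public PublishStatus PublishStatus { get; set; }
    public SourceType SourceType { get; set; }  // Azure, Kaltura/YouTube, manual upload, ...

    public TranscriptionType TranscriptionType { get; set; }
    public string Description { get; set; }     // existing field
    public List<Caption> Captions { get; set; }
}
```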
https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API Captions in ClassTranscribe only support a start time, an end time, and text, so discard the other cue information for now.
https://jbilocalization.com/blog/the-difference-between-srt-and-webvtt-in-captioning-subtitling/