classtranscribe / WebAPI

Repository for the .NET Core backend for ClassTranscribe

Upload closed caption files e.g. srt #154

Closed: angrave closed this 2 years ago

angrave commented 3 years ago

https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API

Captions in ClassTranscribe only support a start time, an end time, and text, so discard the other information for now.

https://jbilocalization.com/blog/the-difference-between-srt-and-webvtt-in-captioning-subtitling/
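Because only the start time, end time, and text are kept, parsing an SRT file comes down to reading each cue's timing line and the text lines below it, then dropping everything else (cue numbers, WebVTT cue settings). The following is a minimal sketch, not the project's parser: `SimpleCaption` and `SrtSketch` are hypothetical names. The timing pattern assumes hours are always present. WebVTT also allows `mm:ss.ttt`, which this sketch does not handle.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical minimal caption shape: only the fields ClassTranscribe keeps.
public record SimpleCaption(TimeSpan Begin, TimeSpan End, string Text);

// Hypothetical sketch of an SRT reader that discards everything except timing and text.
public static class SrtSketch
{
    // "HH:MM:SS,mmm --> HH:MM:SS,mmm". SRT uses a comma before the milliseconds;
    // a dot (as in WebVTT) is also accepted. Anything after the end time
    // (e.g. WebVTT cue settings such as "align:start") is ignored, i.e. discarded.
    private static readonly Regex Timing = new Regex(
        @"^(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})");

    // Builds a TimeSpan from four consecutive groups: hours, minutes, seconds, milliseconds.
    private static TimeSpan ReadTime(GroupCollection g, int first)
    {
        int Part(int i) => int.Parse(g[first + i].Value, CultureInfo.InvariantCulture);
        return new TimeSpan(0, Part(0), Part(1), Part(2), Part(3));
    }

    public static List<SimpleCaption> Parse(string content)
    {
        var captions = new List<SimpleCaption>();
        string normalized = content.Replace("\r\n", "\n");
        // Cues are separated by one or more blank lines.
        foreach (string block in Regex.Split(normalized, @"\n\s*\n"))
        {
            string[] lines = block.Split('\n');
            int timingIndex = Array.FindIndex(lines, l => Timing.IsMatch(l.Trim()));
            if (timingIndex < 0) continue; // e.g. a "WEBVTT" header or an empty trailing block
            Match m = Timing.Match(lines[timingIndex].Trim());
            // The cue number/identifier before the timing line is dropped;
            // every line after it is caption text.
            string text = string.Join("\n", lines.Skip(timingIndex + 1)).Trim();
            captions.Add(new SimpleCaption(ReadTime(m.Groups, 1), ReadTime(m.Groups, 5), text));
        }
        return captions;
    }
}
```

For example, `SrtSketch.Parse("1\n00:00:01,000 --> 00:00:02,500\nHello\n")` yields one caption from 1 s to 2.5 s with the text `Hello`. Keeping the record down to three fields mirrors the decision to ignore styling and positioning until the data model can store it.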

angrave commented 3 years ago

(in the future we will want to support speaker name too!)

ghost commented 2 years ago

I'd be happy to work on this. Here is my proposed spec:

Endpoint: POST /api/Captions
Parameters: captionFile (either a .vtt or an .srt file) and videoId (the ID of the video that the captions are for)
Auth: Admins, TAs, and instructors

The endpoint will parse the closed caption file (either a .vtt or an .srt file) and create a new Transcription object with the list of captions from the file. This new Transcription object will then be added to the corresponding video's list of transcriptions.

Here is the proposed method signature:

[DisableRequestSizeLimit]
[Authorize(Roles = Globals.ROLE_ADMIN + "," + Globals.ROLE_TEACHING_ASSISTANT + "," + Globals.ROLE_INSTRUCTOR)]
[HttpPost]
[Consumes("multipart/form-data")]
public async Task<ActionResult<IEnumerable<Caption>>> PostCaptionFile(IFormFile captionFile, [FromForm] string videoId)
{
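    // Proposed steps (from the description above):
    // 1. Parse captionFile (.vtt or .srt) into a list of captions.
    // 2. Create a new Transcription object containing those captions.
    // 3. Add it to the transcriptions of the video identified by videoId.
    // Not implemented yet; throwing keeps this stub compilable.
    throw new NotImplementedException();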
}
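The first step of the proposed endpoint is turning the uploaded file into text and choosing a parser for it. Below is a minimal sketch that assumes the format is chosen by file extension. `CaptionFileFormat` and `CaptionUploadSketch` are hypothetical names, not existing WebAPI code. It also assumes the upload is UTF-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical enum; only the .srt/.vtt restriction comes from the proposed spec.
public enum CaptionFileFormat { Srt, Vtt }

public static class CaptionUploadSketch
{
    // Picks the parser from the uploaded file's extension (case-insensitive).
    public static CaptionFileFormat DetectFormat(string fileName)
    {
        string ext = Path.GetExtension(fileName).ToLowerInvariant();
        return ext switch
        {
            ".srt" => CaptionFileFormat.Srt,
            ".vtt" => CaptionFileFormat.Vtt,
            _ => throw new ArgumentException($"Unsupported caption file type '{ext}'", nameof(fileName)),
        };
    }

    // Reads the whole upload as text. UTF-8 is assumed; a leading
    // byte-order mark, if present, is detected and skipped.
    public static async Task<string> ReadAllTextAsync(Stream stream)
    {
        using var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true);
        return await reader.ReadToEndAsync();
    }
}
```

Inside `PostCaptionFile` these could be fed from `captionFile.FileName` and `captionFile.OpenReadStream()`, both members of ASP.NET Core's `IFormFile`. An `ArgumentException` from `DetectFormat` could be mapped to a `BadRequest(...)` response rather than a server error.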

It looks like we currently have an unused WebVTT parser function here. I propose that we use this NuGet package instead, which has over 100k downloads. It would also let us immediately add support for other subtitle formats (.sub, .ssa, .ttml).

@angrave how does this look to you?

angrave commented 2 years ago

Yes. We should extend the Transcription to include a Label.

angrave commented 2 years ago

And whether the transcription should be published or not, e.g. there could be transcriptions from Azure, Kaltura/YouTube, and manual uploads.

ghost commented 2 years ago

Would you like only a "Label" field of type string? Or would you like a "PublishStatus" enum field (the same one we already have for Offerings, Playlists, EPub, etc) along with a "SourceType" enum field (the same one we already have for Playlist, Media, EPub, etc) instead of a "Label" string field?

angrave commented 2 years ago

I propose the following. Yes, the PublishStatus would be useful, e.g. we might not want to show the Kaltura captions if the Azure captions are better.

- SourceLabel (string): a presentable description of the source for viewers (e.g. "ClassTranscribe", "Uploaded by DRES", "Original", "LiveCaptioned")
- SourceInternalRef (string): so that we can find and replace a transcription when the upstream is modified (e.g. Kaltura/Italian/33)
- Editable (enum): None | Limited | Suggest | CrowdSource (Limited = owned by course staff; Suggest = editable, but edits will not be live to others)
- Label: a short visual label displayed to users in a drop-down
- Description: a longer description of the transcription object

We can also use the caption table, VTT files, etc. to represent a text description of a scene instead of captions (which can then be rendered by a screen reader or with HTML5 text-to-speech). So maybe add a TranscriptionEnumType: Caption | TextDescription.
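The proposed additions map naturally onto two enums and a few string properties. The sketch below is hypothetical: `TranscriptionFieldsSketch` is not the real Transcription entity. The existing PublishStatus enum is left out because its members are not shown in this thread. The property name `TranscriptionType` is also an assumption.

```csharp
using System;

// Who may edit a transcription (values as proposed above).
public enum Editable
{
    None,        // first member = 0, so default(Editable) is None
    Limited,     // owned by course staff
    Suggest,     // editable, but edits are not live to others
    CrowdSource
}

// Whether the timed text items are audio captions or scene descriptions.
public enum TranscriptionEnumType
{
    Caption,        // first member = 0, so default(TranscriptionEnumType) is Caption
    TextDescription
}

// Hypothetical sketch of the proposed fields only; the real Transcription
// entity has other members (including the existing PublishStatus) not shown here.
public class TranscriptionFieldsSketch
{
    public string SourceLabel { get; set; }       // e.g. "ClassTranscribe", "Uploaded by DRES"
    public string SourceInternalRef { get; set; } // e.g. "Kaltura/Italian/33"
    public Editable Editable { get; set; }
    public TranscriptionEnumType TranscriptionType { get; set; }
    public string Label { get; set; }             // short label for a drop-down
    public string Description { get; set; }       // longer description
}
```

Declaring `None` and `Caption` first means a newly constructed object, or an enum column read as 0, falls back to the most conservative values without any extra configuration.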

ghost commented 2 years ago

I am not sure I follow what you mean in that last paragraph about caption table vs captions and TranscriptionEnumType. Could you explain a little bit more about what you are proposing?

angrave commented 2 years ago

We want to be able to have more textual descriptions to support users with low vision.

A transcription object is no longer only a transcription of audio; it can also be a set of text items with start and end times.

E.g. the text description of a scene:

16:00-17:00 "The Presenter, approximately 40 years old, stands in front of a desk"
17:00-17:55 "a table of enrollment data from 1650 to 2020, showing a slow upward trend. The largest enrollment was in 1965"

If we don't support this with the current caption and transcription tables/API, we will have to duplicate a lot of logic in the frontend and backend, e.g. versioning of edits.

In the frontend we will want to make them editable and downloadable, and render them as text for screen readers and as audio.
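Because each text-description item is just text with a start and end time, making these items downloadable could reuse WebVTT, as with captions. Below is a minimal sketch of an exporter. `TimedText` and `WebVttWriterSketch` are hypothetical names, not the real Caption entity. The sketch reads "16:00-17:00" in the example above as minutes:seconds, which is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical item shape: a timed piece of text (caption or scene description).
public record TimedText(TimeSpan Begin, TimeSpan End, string Text);

public static class WebVttWriterSketch
{
    // WebVTT timestamp hh:mm:ss.ttt; hours are always emitted here.
    private static string Stamp(TimeSpan t) =>
        string.Format(CultureInfo.InvariantCulture, "{0:00}:{1:00}:{2:00}.{3:000}",
            (int)t.TotalHours, t.Minutes, t.Seconds, t.Milliseconds);

    // Text is written as-is; a real exporter would also need to escape '&' and '<'
    // and keep blank lines and "-->" out of cue text, as WebVTT requires.
    public static string ToWebVtt(IEnumerable<TimedText> items)
    {
        var sb = new StringBuilder("WEBVTT\n");
        foreach (var item in items)
        {
            // A blank line separates cues; the timing line is followed by the cue text.
            sb.Append('\n')
              .Append(Stamp(item.Begin)).Append(" --> ").Append(Stamp(item.End)).Append('\n')
              .Append(item.Text).Append('\n');
        }
        return sb.ToString();
    }
}
```

With the first scene item above, `new TimedText(new TimeSpan(0, 16, 0), new TimeSpan(0, 17, 0), "The Presenter, ...")` produces the cue timing `00:16:00.000 --> 00:17:00.000`. The same file could be offered for download or passed to the browser's `<track>` element, so captions and text descriptions share one export path.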

ghost commented 2 years ago

What should the default value for the Editable enum be? I am guessing that it would be None but I just want to double check. Similarly, should the default value for the TranscriptionType enum be Caption?

Also just so you know, we already have a Description field for the Transcription object.