dbvideostriketeam / wubloader

Subtitle extraction and usage #383

Open ekimekim opened 7 months ago

The stream has closed captions embedded in the video channel. We informally call these "subtitles", even though that's technically a different thing (subtitles live in their own data channel or a separate file).

Currently our only interaction with these captions is during the torrent transcodes, where we go to some effort to extract them into a subtitle channel so they aren't lost.
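
For illustration only, and not a description of what the torrent transcode actually does: one known way to pull embedded closed captions out of a video is ffmpeg's lavfi `movie` source, whose `subcc` output exposes the captions as a subtitle stream that can be written to an SRT file. The helper name and file paths below are hypothetical, and the path is placed into the filtergraph unescaped, so this assumes a path without special characters.

```python
def caption_extract_command(video_path, srt_path):
    """Build an ffmpeg argument list that writes embedded captions to SRT.

    Hypothetical helper. Assumes video_path needs no filtergraph escaping.
    """
    # "out0" is the video output of the movie source; "subcc" asks it to
    # also expose the closed captions carried in that video stream.
    source = "movie={}[out0+subcc]".format(video_path)
    # Map only the subtitle stream of input 0; ffmpeg picks the SRT
    # encoder from the output file extension.
    return ["ffmpeg", "-f", "lavfi", "-i", source, "-map", "0:s", srt_path]
```

The resulting list could be passed to `subprocess.run`; whether this matches the caption format in our stream would need testing.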

In addition to the captions, which the broadcaster generates "on the fly", we also have Buscribe, which post-processes the stream and so should in theory be more accurate. Buscribe already makes its data available in a Postgres database and a web UI.

We have multiple use cases for subtitles:

  1. Having a transcription of the stream is useful for searching for when something happened that wasn't written down.
  2. It can also be useful when editing videos if integrated into the editor.
  3. A text-formatted transcription could be suitable for public consumption in the torrent (similarly to chat logs).
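
As a rough sketch of the first and third use cases, assuming the transcription (from either source) is available as a list of timed lines: the `Line` type, its fields, and the function names are all hypothetical and don't reflect Buscribe's actual schema.

```python
from collections import namedtuple

# Hypothetical shape for one transcribed line; start is seconds into the stream.
Line = namedtuple("Line", ["start", "text"])

def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS, dropping any fraction."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, secs)

def search(lines, query):
    """Return the lines whose text contains query, ignoring case."""
    query = query.lower()
    return [line for line in lines if query in line.text.lower()]

def to_text(lines):
    """Format lines as a plain-text transcript, one timestamped line each."""
    return "\n".join(
        "[{}] {}".format(format_timestamp(line.start), line.text)
        for line in lines
    )
```

A plain substring match like this is only a starting point; anything serious would likely want the full-text search the database already provides.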

This issue covers a few questions:

  1. For each of these three use cases, does it make more sense to use the broadcaster-sourced text or Buscribe? Possibly both?
  2. If we choose to extract the captions, where should this be done and how should it be stored?
  3. For the editor, what is required to get subtitles to display in-video?
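
On the last question, one possible direction, assuming the editor plays video through a standard HTML `<video>` element: browsers can display subtitles from a WebVTT file attached with a `<track>` element. A minimal sketch of producing WebVTT from timed cues follows; the function names and the `(start, end, text)` tuple shape are hypothetical, and it assumes cue text contains no blank lines and no `-->`.

```python
def vtt_timestamp(seconds):
    """Render a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3600000)
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return "{:02d}:{:02d}:{:02d}.{:03d}".format(hours, minutes, secs, millis)

def to_webvtt(cues):
    """Build a WebVTT document from (start, end, text) tuples, times in seconds."""
    # The file must begin with the WEBVTT header; cues are separated by blank lines.
    blocks = ["WEBVTT"]
    for start, end, text in cues:
        timing = "{} --> {}".format(vtt_timestamp(start), vtt_timestamp(end))
        blocks.append("{}\n{}".format(timing, text))
    return "\n\n".join(blocks) + "\n"
```

The cue times would need to be relative to the start of the video range being played, not to the start of the stream.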