fourTheorem / podwhisperer

Pod transcription with OpenAI Whisper and AWS

WebVTT, Text and SRT support #6

Open nodomain opened 1 year ago

nodomain commented 1 year ago

Are there plans yet to support other output formats like WebVTT, plain text or SRT?

I'm still digging through the solution and thinking about adding a converter, but I'm not sure about the correct approach.

eoinsha commented 1 year ago

Yes, this is definitely something we have discussed, and it would help improve things like YouTube subtitles. AWS Transcribe already supports generating WebVTT and SRT subtitles. Of course, we would prefer to use Whisper segments instead of Transcribe for the subtitle text. The challenge here is that the timing granularity of Whisper output is not as fine as Transcribe's, so you end up with fairly long segments for each timestamp. That may not be desirable for many subtitle use cases.

It may be possible to improve the merging algorithm to match the Transcribe timings to the Whisper output, but that seems non-trivial.
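To make the idea concrete, here is a rough sketch of one way such a merge could work: keep Whisper's text but split each long segment into shorter cues using finer word-level timings like those Transcribe produces. The `{"content", "start", "end"}` word-item shape and the `max_words` cutoff are assumptions for illustration, not the actual schema or algorithm used in this project.

```python
# Hypothetical sketch: group word-level timed items (e.g. from Transcribe)
# into shorter subtitle cues, so one long Whisper segment becomes several
# cues with finer timestamps. Word-item field names are assumed.

def split_into_cues(words, max_words=7):
    """Group timed word items into cues of at most max_words words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        cues.append({
            "start": chunk[0]["start"],   # cue starts at first word's start
            "end": chunk[-1]["end"],      # cue ends at last word's end
            "text": " ".join(w["content"] for w in chunk),
        })
    return cues
```

The hard part, which this sketch skips, is aligning Transcribe's words with Whisper's (possibly differing) text in the first place.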

Perhaps the simplest thing initially is to generate VTT or SRT directly from the Whisper output. This could be done in a separate Lambda function using the merged transcript output (after the Process Transcripts state). It could use the JSON object located at processedTranscriptKey as its input.
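As a starting point, a converter along these lines could turn Whisper-style segments into SRT or WebVTT. The `start`/`end`/`text` field names follow standard Whisper JSON output; the actual merged transcript at processedTranscriptKey may use a different shape, so treat this as a sketch.

```python
# Hypothetical sketch: convert Whisper-style segments into SRT / WebVTT text.
# Assumes each segment looks like {"start": sec, "end": sec, "text": str}.

def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"

def to_srt(segments) -> str:
    """Render segments as an SRT document (numbered cues, comma separator)."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

def to_vtt(segments) -> str:
    """Render segments as a WebVTT document (header, dot separator)."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

In the Lambda, this would run after fetching the processedTranscriptKey object from S3 and writing the rendered string back under a `.srt` or `.vtt` key.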

If you are interested in contributing this feature, that would be very welcome, @nodomain. We are happy to review and offer support, of course.

nodomain commented 1 year ago

Looking into it already :)