Closed by atdrago 2 years ago
I was looking into this a bit last night. Do you have any idea how common this [00:00:00] format is? Coming in with no prior knowledge, I'm not finding much about a standard for TXT transcripts. I do see some mentions of an SRT format, and lots of services offer TXT exports too, but there's no clear indication of how those TXT exports are structured.
Do we know what the popular transcription services are today? Might be able to code for their export formats.
Problem
A text/html transcript can be any valid HTML document, but it will often have time codes embedded in plain text throughout the HTML.
Example that's entirely text with time codes https://share.transistor.fm/s/0ba4b425/transcript.txt:
Example that has nested HTML with time codes https://feeds.buzzsprout.com/1538779/10139859/transcript:
Example that is all text with no time codes (can't do anything special with these) https://share.transistor.fm/s/c4ee7fb9/transcript.txt:
Solution
We should try to look for these time codes and create a valid text/vtt document from them. If for whatever reason that isn't possible, the transcript should be rendered as it is today.
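To make the idea concrete, here is a rough sketch of the conversion in TypeScript. It is only an illustration under a few assumptions: time codes look like `[HH:MM:SS]` (the format mentioned in the comment above), each cue runs until the next time code, and the last cue gets an arbitrary 5 second duration. The names `transcriptToVtt`, `toVttTimestamp`, `escapeCueText`, and `TranscriptCue` are made up for this example, and any text before the first time code is dropped. HTML transcripts would need their tags stripped first, which this sketch does not attempt.

```typescript
// Hypothetical helpers for illustration; not taken from any existing codebase.

interface TranscriptCue {
  start: number; // cue start, in whole seconds
  text: string;  // cue text, already escaped for WebVTT
}

// Format whole seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTimestamp(totalSeconds: number): string {
  const pad = (n: number): string => (n < 10 ? "0" : "") + n;
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}.000`;
}

// Escape characters that are special in WebVTT cue text and collapse
// whitespace so that a cue never contains a blank line or "-->".
function escapeCueText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns a WebVTT document, or null when no usable time codes are found,
// in which case the caller should render the transcript as it does today.
function transcriptToVtt(transcript: string, lastCueSeconds: number = 5): string | null {
  // Assumption: time codes are written as [HH:MM:SS].
  const timeCode = /\[(\d{1,2}):(\d{2}):(\d{2})\]/g;
  const cues: TranscriptCue[] = [];

  let match: RegExpExecArray | null = timeCode.exec(transcript);
  while (match !== null) {
    const start =
      Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
    const textStart = match.index + match[0].length;
    // The cue text is everything up to the next time code (or end of input).
    const next: RegExpExecArray | null = timeCode.exec(transcript);
    const textEnd = next !== null ? next.index : transcript.length;
    const text = escapeCueText(transcript.slice(textStart, textEnd));
    if (text.length > 0) {
      cues.push({ start, text });
    }
    match = next;
  }

  if (cues.length === 0) {
    return null;
  }

  const blocks: string[] = ["WEBVTT"];
  cues.forEach((cue, i) => {
    const next: TranscriptCue | undefined = cues[i + 1];
    // WebVTT needs end > start, so fall back to a fixed duration when the
    // next time code is missing or not later than this one.
    const end =
      next !== undefined && next.start > cue.start
        ? next.start
        : cue.start + lastCueSeconds;
    blocks.push(`${toVttTimestamp(cue.start)} --> ${toVttTimestamp(end)}\n${cue.text}`);
  });
  return blocks.join("\n\n") + "\n";
}
```

If `transcriptToVtt` returns `null`, the transcript would be rendered as it is today. Otherwise, in a browser the resulting string could be wrapped in a `Blob` with type `text/vtt` and handed to a `<track>` element through `URL.createObjectURL`.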