DevBetterCom / DevBetterWeb

A simple web application for devBetter
https://devbetter.com/

Transcribing archived videos automatically #196

Open justSteve opened 3 years ago

justSteve commented 3 years ago

<< We really need a better way to index the videos. It's on the list, along with getting them hosted in the web app via Vimeo>>

It's an idea I've been mulling for a bit. There are a number of 'speech-to-text' (post-event processing) services and options, and Ardalis has a wealth of knowledge wrapped up in DevBetter's archive. If we could script and automate the transcription process, turning that video content into text would seem like a natural first step toward more robust annotation.

I know that Office's Word app has a speech-to-text processor that's a) free, as it's included in most O365 subs; and b) good enough to produce text where different speakers are differentiated and noted. Azure has something similar that (like Word's) I've only read about.

Speaking as a newcomer to the $$ group, I'd really value being able to get caught up on various threads of discussion, but video is simply too slow. Anything that turns video into text is going to be a huge step in that direction.

I'd suggest we ask anyone listening who has prior experience with speech-to-text (in whatever form) to share their lessons learned here.

Thanks!

ardalis commented 2 years ago

Anyone have any thoughts on how to add transcriptions to the videos in an automated fashion? Is that something we can leverage our video hosting provider (Vimeo) to do, or would it need to be a separate custom process of ours?

This seems to suggest we can get Vimeo to do this automatically: https://vimeo.com/blog/post/how-to-transcribe-a-video/

snowfrogdev commented 2 years ago

The link you posted says that if you are on a paid plan, video transcription is done automatically by default. See: https://vimeo.com/features/auto-caption

From what I can tell, the transcription/auto-caption is actually working at the moment on devBetter's videos. But I guess we'd also want to get the transcript file itself so we can add the entire transcribed text to the video page on devBetter. I'll look into it.

Are we on a paid plan? Which one?

snowfrogdev commented 2 years ago

If the transcription/caption is automatic on upload, which it seems to be, it looks like we should be able to download it using Vimeo's API. See: https://developer.vimeo.com/api/reference/videos#get_text_tracks
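
A minimal sketch of what that download could look like, in Python for brevity rather than the app's own stack. It assumes the text tracks endpoint linked above returns a `data` list whose entries carry `language` and `link` fields, and that a personal access token is available; the function names are hypothetical and none of this has been run against devBetter's account.

```python
import json
import urllib.request
from typing import Optional

API_ROOT = "https://api.vimeo.com"


def text_tracks_url(video_id: str) -> str:
    """Build the URL of the 'get all text tracks of a video' endpoint."""
    return f"{API_ROOT}/videos/{video_id}/texttracks"


def pick_track_link(payload: dict, language_prefix: str = "en") -> Optional[str]:
    """Return the download link of the first matching text track, or None.

    Assumption: each entry in payload["data"] has "language" and "link" keys.
    """
    for track in payload.get("data", []):
        language = str(track.get("language", ""))
        if language.startswith(language_prefix) and track.get("link"):
            return track["link"]
    return None


def fetch_transcript(video_id: str, token: str) -> Optional[str]:
    """Download the transcript file body for one video (makes network calls)."""
    request = urllib.request.Request(
        text_tracks_url(video_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    link = pick_track_link(payload)
    if link is None:
        return None  # no captions available for this video
    with urllib.request.urlopen(link) as response:
        return response.read().decode("utf-8")
```

Keeping the track selection separate from the network calls means the selection logic can be checked against a saved API response without a token.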

ardalis commented 2 years ago

Yes, we're on a paid plan. @ShadyNagy has the most experience with our Vimeo integration and their APIs if you have questions.

snowfrogdev commented 2 years ago

I noticed that the CCs are only available in videos starting in May 2022. I'm thinking of first implementing this feature so that newly uploaded videos have their transcripts added to the video page, at the bottom.

image

I have a feeling that getting transcripts for videos prior to May 2022 might involve a different, more complicated process, so I will leave that as a separate exercise.

ardalis commented 2 years ago

We may not have been on a paid plan prior to that; not sure. Or maybe it's a newish feature of theirs. Your plan sounds good.

ShadyNagy commented 2 years ago

Transcripts are working on our plan, but we have only uploaded a few SRT files through the uploader.

snowfrogdev commented 2 years ago

I've got the transcript showing up on the Video/Details page now. It is the raw VTT text, so it has an index and time stamp for each entry. @ardalis is that good enough for our purposes or would you prefer I parse the string to remove the index and time stamps, in order to be left only with the text?

image
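
For reference, stripping the index and timestamps could be sketched roughly as follows (Python for illustration, not the code in the app). It assumes cues are separated by blank lines and that the timing line is the one containing `-->`; the function name and sample text are made up.

```python
import re


def vtt_to_text(vtt: str) -> str:
    """Strip cue indexes and timestamps from WebVTT, keeping only the spoken text."""
    phrases = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # The timing line is the one containing the "-->" arrow.
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # the "WEBVTT" header and NOTE blocks carry no cue text
        # Everything after the timing line is cue text; the index sits before it.
        phrases.extend(line.strip() for line in lines[timing + 1:] if line.strip())
    return " ".join(phrases)


sample = (
    "WEBVTT\n\n"
    "1\n00:00:00.000 --> 00:00:01.000\nHello and welcome\n\n"
    "2\n00:00:01.000 --> 00:00:02.500\nto devBetter\n"
)
print(vtt_to_text(sample))
```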

ardalis commented 2 years ago

I think it'll be hard to use that transcription broken up into 1 second intervals. So, yeah, I would want to see it parsed down. The timestamp data is useful: we have the ability to make links into the video at a given timestamp. So, I'd like to see the transcript have links to the video. One way to do that would be to arbitrarily add links (e.g. every 15 or 30 seconds). Another would be to make literally every few words a link. I think that would be easier and more useful.

So in your screenshot above, there'd just be the text, but every one of those phrases would be in its own anchor tag going to the video at offset 0, 1, 2, 3, etc.

If it gets annoying to read because everything is styled like a link we could simply adjust the styling or have an option to view the transcript without links.

Thoughts?
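
Both linking approaches can be sketched in a few lines (Python for illustration). The input is assumed to be already-parsed cues as `(start_seconds, text)` pairs, and the URL template is a hypothetical placeholder, not the app's real route for seeking into a video.

```python
from html import escape


def group_cues(cues, interval_seconds=30):
    """Merge consecutive cues so a new link starts roughly every interval."""
    groups = []
    for start, text in cues:
        if groups and start - groups[-1][0] < interval_seconds:
            # Still inside the current window: append to the open group.
            groups[-1] = (groups[-1][0], groups[-1][1] + " " + text)
        else:
            groups.append((start, text))
    return groups


def transcript_links(cues, url_template):
    """Render each cue as an anchor that seeks to the cue's start offset."""
    anchors = []
    for start, text in cues:
        href = url_template.format(seconds=int(start))
        anchors.append(f'<a href="{escape(href, quote=True)}">{escape(text)}</a>')
    return " ".join(anchors)


cues = [(0, "Hello & welcome"), (1, "to devBetter"), (31, "Today we cover")]
# Hypothetical URL pattern; substitute whatever the player link really uses.
template = "/video/42?t={seconds}"
per_phrase = transcript_links(cues, template)
every_30s = transcript_links(group_cues(cues, 30), template)
```

Passing the cues straight through gives one link per phrase; running them through `group_cues` first gives the coarser "every 15 or 30 seconds" variant from the same renderer.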

snowfrogdev commented 2 years ago

Now that #926 has been merged, if I've implemented things correctly, transcripts should automatically be displayed on the Video/Details page for videos recorded in or after May 2022, and this will keep working for future videos as long as we remain on a paid plan.

I will wait until we deploy this to make sure it works properly before analyzing how we might get the transcripts to show up for videos recorded prior to May 2022.

snowfrogdev commented 2 years ago

@ardalis Alright, here's the deal. On our current Vimeo plan, we benefit from auto-transcription ON UPLOAD. Retroactive auto-transcription is only available on an Enterprise plan. image

So, the way I see it, here are our options.

1) Switch to an Enterprise plan, even if it's just for one month, so we can use Vimeo's own retroactive auto-generation of transcripts.

2) Download, delete, and re-upload every old video so it triggers the auto-transcription. This could be done all in one go, with a script, but you'd probably bust your monthly plan quotas. Or you could do it manually, a few at a time, over the next several months until the whole back catalog is done.

3) Use a third-party service (paid; I couldn't find a free one) to transcribe the back catalog. This would involve finding one that offers an API, then writing a script to download each video from Vimeo, upload it to the service, get the transcript file, and upload it to Vimeo. This could also be done manually.

Thoughts?
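
For option 2, spreading the re-uploads over several months could be planned with a small helper like the sketch below (Python for illustration). It assumes the quota is a monthly upload size in GB, which may not match how the plan actually meters usage; the function name, sizes, and quota figure are all made up.

```python
def plan_upload_batches(videos, monthly_quota_gb):
    """Greedily pack (video_id, size_gb) pairs, in order, into monthly batches."""
    batches, current, used = [], [], 0.0
    for video_id, size_gb in videos:
        if size_gb > monthly_quota_gb:
            raise ValueError(f"{video_id} alone exceeds the monthly quota")
        if used + size_gb > monthly_quota_gb:
            # This video would overflow the month: close the batch, start another.
            batches.append(current)
            current, used = [], 0.0
        current.append(video_id)
        used += size_gb
    if current:
        batches.append(current)
    return batches


# Made-up sizes and quota, purely to show the shape of the result.
backlog = [("a", 2), ("b", 3), ("c", 4), ("d", 1)]
schedule = plan_upload_batches(backlog, monthly_quota_gb=5)
```

Each inner list is one month's worth of videos to download, delete, and re-upload, so the length of the result is the number of months the back catalog would take.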

snowfrogdev commented 2 years ago

@ardalis have you given some thought to how you'd like to proceed with this? One additional option that I didn't list is to simply forget about getting the transcripts for older videos and just be happy with the ones we have, knowing that from now on, transcripts will be automatically generated and displayed on new videos.

ardalis commented 2 years ago

Either 1 or 0 (let it stay as is) is probably the way to go. I'll see what the enterprise plan involves, $ wise. If we switch to that plan do we need to do anything to get the transcripts? If so, what?