RESTFest / videos.restfest.org

Curation space for the REST Fest Videos Project content
https://restfest.github.io/videos.restfest.org/
Apache License 2.0

Transcribe Vimeo videos #6

Open alexdresko opened 5 years ago

alexdresko commented 5 years ago

Plan is to loop through all of the video data at https://github.com/RESTFest/videos.restfest.org/tree/master/_data/videos and pass each video to a transcription API. We'll turn the result into JSON and drop it somewhere here in the repo.
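A rough sketch of that loop in Python, assuming each file under `_data/videos` is a JSON document. The `transcribe` callable is a placeholder for whichever API ends up being chosen, and the function name, output layout, and `transcripts` directory are all hypothetical:

```python
import json
from pathlib import Path
from typing import Callable


def transcribe_all(data_dir: Path, out_dir: Path,
                   transcribe: Callable[[dict], str]) -> list:
    """Run `transcribe` on every video record and save each result as JSON.

    `transcribe` is a stand-in: it receives the parsed video record and
    returns the transcript text. No real speech API is called here.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for data_file in sorted(data_dir.glob("*.json")):
        video = json.loads(data_file.read_text(encoding="utf-8"))
        result = {"source": data_file.name, "transcript": transcribe(video)}
        out_file = out_dir / data_file.name
        out_file.write_text(json.dumps(result, indent=2), encoding="utf-8")
        written.append(out_file)
    return written
```

Something like `transcribe_all(Path("_data/videos"), Path("transcripts"), my_api_call)` would then leave one JSON transcript per video in the repo.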

alexdresko commented 5 years ago

@bradgrace

alexdresko commented 5 years ago

Here's my initial research:

Important questions

How many hours of video do we have?

General requirements

From what I've seen while investigating various speech-to-text APIs, submitting audio instead of video is always either cheaper or outright required. Based on that, it looks like our solution will need to download the videos locally for processing. If the chosen API requires audio only, we'll have to extract the audio from the videos, then push the audio to the API.

Currently, downloads are not enabled for videos.restfest.org. Here's what I found about that:

https://help.vimeo.com/hc/en-us/articles/229678128-Downloading-videos

Plus, PRO, Business, and Premium members have the option to enable their videos for download. If you have a Basic membership and upgrade your account, the option for enabling downloads will be automatically turned on.

The availability of videos for download depends on the subscription tier of the video’s creator. Basic members cannot enable their videos to be downloaded; however, if the video belongs to a Plus, Pro, or Business member, they have the option to toggle on the download option.

Plus, PRO, and Business members have the ability to store their original, untranscoded source files right here on Vimeo. This means that, if you are a Plus, PRO, or Business member that chooses to store your source files, you will always be able download your original file, as long as you maintain your paid subscription with us. You can choose to make your original file downloadable by others, too.

While Basic members can indeed download the source files made available by Plus, PRO, and Business members, they do not have the ability to store or share their own source files on Vimeo.

Google Cloud Speech to Text

is an option, but it wouldn't be free.

https://cloud.google.com/speech-to-text/


Cloudinary

Is this the service Benjamin was talking about? If so, hook me up with some contact deets. If not, what was the service you were talking about and hook me up with some contact deets. :)

Azure Cognitive Services

https://azure.microsoft.com/en-us/pricing/details/cognitive-services/speech-services/

This looks to be the winner for now. The free tier allows 5 hours of transcription per month. It might take a couple months, but we can work with that.


Kaldi?

May be a thing. I mean, it's a thing. Just not sure what kind of thing it is.

http://kaldi-asr.org/doc/index.html

IBM's thing

IBM has a thing. Hard cap at 100 minutes of transcription.

https://www.ibm.com/watson/services/speech-to-text/?S_PKG=AW&cm_mmc=Search_Google-_-Watson+Core_Watson+Core+-+Discovery-_-WW_NA-_-+software++speech++to++text_Broad_&cm_mmca1=000000OF&cm_mmca2=10000409&cm_mmca7=9010601&cm_mmca8=kwd-320402843704&cm_mmca9=40e9640b-9708-4a80-a52b-5e5e01593f90&cm_mmca10=260794964851&cm_mmca11=b&mkwid=40e9640b-9708-4a80-a52b-5e5e01593f90|1081|13959&cvosrc=ppc.google.%2Bsoftware%20%2Bspeech%20%2Bto%20%2Btext&cvo_campaign=000000OF&cvo_crid=260794964851&Matchtype=b&gclid=EAIaIQobChMIhOr4k63b3QIVlMDICh2ExAdMEAAYAiAAEgIkN_D_BwE

The AWS thing

Ultimately, it's got a hard cap at 12 hours of transcription.

https://aws.amazon.com/transcribe/pricing/


The non-profit advantage

We could speed this up if restfest was a non-profit. There's always a non-profit tier with these services that gives you additional monthly credits. Should we create an issue to track the non-profit effort?

alexdresko commented 5 years ago

@bradgrace calculated 75.8 hours of video by summing all the duration properties in the Vimeo JSON data. He's made a PowerShell script that collects and parses all of the data. From that, we've got a good starting point for scripting the transcription.

Given Azure's limit of 5 hours per month, it would take 16 months to transcribe all of the videos (75.8 / 5 ≈ 15.2, rounded up)... UNLESS we split the job across multiple computers. :)
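For anyone who wants to redo the math with a different quota (say, a non-profit tier), here's a small Python sketch. It assumes the Vimeo `duration` values are in seconds; the function names are made up for illustration:

```python
import math


def total_hours(durations_seconds):
    """Sum per-video durations (assumed to be in seconds) into hours."""
    return sum(durations_seconds) / 3600


def months_needed(hours, free_hours_per_month=5.0):
    """Whole months required to work through `hours` at a monthly quota."""
    return math.ceil(hours / free_hours_per_month)


print(months_needed(75.8))  # 16
```

Passing a larger `free_hours_per_month` shows how much a bigger monthly allowance would shorten the schedule.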

alexdresko commented 5 years ago

Getting started with C# - the next sample is more important, but this sample at least shows how to get the nuget package.

Continuous speech recognition from a file

Our goal is to attempt to translate those C# samples to PowerShell. PowerShell runs on most OSs now. If PowerShell fails us, we'll switch to .NET Core.

alexdresko commented 5 years ago

We'll use https://www.ffmpeg.org/ to extract the wav from the mp4.

https://superuser.com/questions/609740/extracting-wav-from-mp4-while-preserving-the-highest-possible-quality

I confirmed I can use ffmpeg to extract the wav from one of the restfest mp4 files that I downloaded manually.
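Here's roughly how that extraction step could be scripted in Python by shelling out to ffmpeg. This is only a sketch: the 16 kHz mono 16-bit PCM settings are my assumption about what a speech API would want, not something confirmed in this thread, and the function names are hypothetical. It also assumes `ffmpeg` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp4_path, wav_path, sample_rate=16000):
    """Build the ffmpeg argument list for pulling a WAV out of an MP4."""
    return [
        "ffmpeg",
        "-i", str(mp4_path),      # input video
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM audio (assumed target format)
        "-ar", str(sample_rate),  # sample rate in Hz (assumed 16 kHz)
        "-ac", "1",               # mono (assumption)
        str(wav_path),
    ]


def extract_wav(mp4_path):
    """Write a .wav next to the given .mp4 and return its path."""
    wav_path = Path(mp4_path).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(mp4_path, wav_path), check=True)
    return wav_path
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to print and check the exact ffmpeg invocation before running it over every video.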