BingLingGroup / autosub

Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
GNU General Public License v2.0

Add new provider: wit.ai #169

Open yshalsager opened 3 years ago

yshalsager commented 3 years ago

I would like to suggest adding wit.ai API as a new Speech-to-Text engine. It's a very solid and open-source natural language processing API. https://github.com/wit-ai

I might be able to add it and send a PR if I find some free time, once the idea is accepted of course.

Here's an API implementation example I wrote for another project https://github.com/yshalsager/Userge-Plugins/blob/98feca02f75ec2fa18cb49255577af85761d0c37/plugins/transcribe.py#L18

yshalsager commented 3 years ago

@BingLingGroup I have started working on it and finished an initial implementation that works. https://github.com/yshalsager/autosub/commits/witai

However, before I make a pull request, I'd like to ask about one point. The WIT API accepts audio input as wav, mpeg3, ogg, and raw pcm. The sample rate should be 8000. I got it to work by passing these options as CLI arguments: -i test.m4a -S ar-eg -sapi witai -skey xxxxx -asf .pcm -asr 8000. But I believe there should be a way to make this audio configuration autosub's default for the WIT speech engine. Wouldn't that be better?
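For readers unfamiliar with how raw PCM at 8000 Hz would be described to the service, here is a minimal sketch that only builds the HTTP request and does not send it. It is not the code from the witai branch. The endpoint URL, the Bearer authorization header, and the audio/raw Content-Type parameters are assumptions based on my reading of Wit.ai's HTTP API documentation and should be checked against it. The function name and the 16-bit signed little-endian encoding are hypothetical choices for illustration.

```python
import urllib.request

# Assumed endpoint for Wit.ai speech recognition; verify against the official docs.
WIT_SPEECH_URL = "https://api.wit.ai/speech"


def build_wit_speech_request(
    token: str, pcm_bytes: bytes, sample_rate: int = 8000
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw PCM audio.

    Hypothetical helper: the encoding, bit depth, and endianness below are
    assumptions about the PCM data, not values taken from autosub.
    """
    content_type = (
        "audio/raw;encoding=signed-integer;bits=16;"
        f"rate={sample_rate};endian=little"
    )
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": content_type,
    }
    return urllib.request.Request(
        WIT_SPEECH_URL, data=pcm_bytes, headers=headers, method="POST"
    )
```

Sending the request would be a matter of passing the returned object to urllib.request.urlopen with a real token. The sample rate stays a parameter so the 8000 Hz value from the comment above is a default, not a hard-coded constant.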

BingLingGroup commented 3 years ago

I'm not sure about the accuracy of this API, so I guess it's better not to change the default API, especially since it requires signing up and getting a token to use.

yshalsager commented 3 years ago

@BingLingGroup I didn't mean to change the default API. I meant: does the autosub code provide a way to set default settings for a speech engine?

BingLingGroup commented 3 years ago

Sorry, I misunderstood. I set the default audio settings here and here. Perhaps it's better to set the constants in https://github.com/BingLingGroup/autosub/blob/dev/autosub/constants.py.
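One way the per-engine defaults discussed in this thread could be laid out is sketched below. This is not autosub's actual code: ENGINE_AUDIO_DEFAULTS and resolve_audio_settings are hypothetical names, and only the "witai" values (.pcm, 8000) come from the comments above. The idea is that explicit CLI values win, then the engine's defaults, then a general fallback supplied by the caller.

```python
from typing import Dict, Optional, Union

AudioSettings = Dict[str, Union[str, int]]

# Hypothetical per-engine table, the kind of thing that could live in a
# constants module. Only the "witai" values are taken from the thread.
ENGINE_AUDIO_DEFAULTS: Dict[str, AudioSettings] = {
    "witai": {"suffix": ".pcm", "sample_rate": 8000},
}


def resolve_audio_settings(
    engine: str,
    fallback: AudioSettings,
    suffix: Optional[str] = None,
    sample_rate: Optional[int] = None,
) -> AudioSettings:
    """Merge settings in order of priority: CLI value, engine default, fallback."""
    settings = dict(fallback)  # copy so the caller's dict is not mutated
    settings.update(ENGINE_AUDIO_DEFAULTS.get(engine, {}))
    # Values the user passed explicitly (e.g. via -asf / -asr) override defaults.
    if suffix is not None:
        settings["suffix"] = suffix
    if sample_rate is not None:
        settings["sample_rate"] = sample_rate
    return settings
```

With a table like this, selecting the engine would be enough to get the .pcm and 8000 settings without passing -asf and -asr each time, and the user could still override either value on the command line.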