alexylem / jarvis

Jarvis.sh is a simple configurable multi-lang assistant.
http://openjarvis.com
MIT License
805 stars 197 forks

Using IBM's STT #777

Open Cqoicebordel opened 6 years ago

Cqoicebordel commented 6 years ago

I've tried a lot of STT engines, because my results are really poor. In one test, I used IBM's speech-to-text service, and I wanted to share the results:
First, it's at https://www.ibm.com/watson/services/speech-to-text/. It isn't written anywhere, but you get 100 minutes of free transcription every 30 days. If the service isn't used for 30 days, it's deactivated.
I won't describe how to sign up or how to get the credentials pair (it's not your account login; you have to generate credentials specifically for the service).

Anyway, to test, I modified a few things that I'll share here. I won't do a pull request, because I took shortcuts in the code that shouldn't be integrated into Jarvis, but they are few, so someone can correct what I did and integrate the code.
First, I modified /utils/configure.sh to add IBM to it, at line 64:

command_stt)           options=('bing' 'wit' 'snowboy' 'pocketsphinx' 'ibm')

Then I created the folder /stt_engines/ibm/ and wrote main.sh in it:

#!/bin/bash
_ibm_transcribe () {
    # Send the recorded audio to Watson; replace the placeholder with your own service credentials
    local json
    json=$(curl -s -X POST -u xxxxxx-xxxxx-xxxx-xxxxx:PASSWORD --header "Content-Type: audio/l16; rate=16000" --data-binary "@$audiofile" "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=fr-FR_BroadbandModel")
    $verbose && jv_debug "DEBUG: $json"
    # -r writes the transcript without the surrounding JSON quotes
    echo "$json" | jq -r '.results[].alternatives[].transcript' > "$forder"
}

ibm_STT () { # STT () {} Listens & transcribes the audio file, then writes the corresponding text to $forder
    LISTEN "$audiofile" || return $?
    _ibm_transcribe &
    jv_spinner $!
}

As you can see, it's quite easy. Of course, you have to change the username:password pair, and it should probably come from a config file. I also hard-coded the fr-FR_BroadbandModel model. Adapting it to the configured language would be nice. The doc is here: https://console.bluemix.net/docs/services/speech-to-text/input.html#models Finally, I used jq to parse the JSON, because I learned to use it for https://www.openjarvis.com/plugins/pluie-%C3%A0-une-heure. Using perl may be better, as it means fewer dependencies.
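To illustrate the point about not hard-coding the credentials, here is a rough sketch of how they could be read from variables instead. The variable names ibm_watson_user and ibm_watson_password and the helper _ibm_credentials are hypothetical. They are not part of Jarvis, and where Jarvis would actually load them from is left open.

```shell
#!/bin/bash
# Hypothetical config variables (assumed names, not existing Jarvis settings).
ibm_watson_user="${ibm_watson_user:-}"
ibm_watson_password="${ibm_watson_password:-}"

# Print the "user:password" pair expected by curl -u,
# or fail with a message on stderr if either value is missing.
_ibm_credentials () {
    if [ -z "$ibm_watson_user" ] || [ -z "$ibm_watson_password" ]; then
        echo "IBM Watson credentials are not configured" >&2
        return 1
    fi
    printf '%s:%s' "$ibm_watson_user" "$ibm_watson_password"
}
```

In _ibm_transcribe, the curl call could then use -u "$(_ibm_credentials)" instead of the literal pair.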

As for the results, it worked fine: no better than Bing, and a tad worse than Wit. So I won't use it. But it might suit another language or another use case better. That's why I'm sharing it, in case it helps someone else.

Oliv4945 commented 6 years ago

Hi,

Thank you for the example. What is missing to do a PR: time or skills?

Cqoicebordel commented 6 years ago

Mostly time and laziness. I don't really want to spend more time on it, knowing I won't use it, and knowing that anyone who knows the project's code could integrate it in a tenth of the time it would take me. The todo list is neither long nor difficult, at least for someone familiar with the Jarvis code:

Oliv4945 commented 6 years ago

Ok, I was hoping it was skills, so I could help finalize the integration :) Nobody really knows the code while @alexylem is taking a little break, but if somebody wants to do it, I think it can be added thanks to:

How to add config variables for login and password, and add the 'UI' to configure them

Change jq to perl
jq is already installed and used by Jarvis "core", so it is not a big deal.

Handle the language from config
It can be obtained from ${language//_/-}, so the command should be:

json=$(curl -s -X POST -u xxxxxx-xxxxx-xxxx-xxxxx:PASSWORD --header "Content-Type: audio/l16; rate=16000" --data-binary "@$audiofile" "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=${language//_/-}_BroadbandModel")

A variable could be added for BroadbandModel and NarrowbandModel
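As a rough sketch of that idea, the model name could be built from the language code plus a band setting. The helper name _ibm_model and its second argument are hypothetical and are not existing Jarvis code. It also assumes the language is given in the fr_FR form.

```shell
#!/bin/bash
# Build a Watson model name from a language code such as fr_FR.
# The second argument is a hypothetical band setting:
# Broadband (default) or Narrowband.
_ibm_model () {
    local language="$1"
    local band="${2:-Broadband}"
    # ${language//_/-} replaces every underscore with a hyphen (fr_FR -> fr-FR)
    echo "${language//_/-}_${band}Model"
}

_ibm_model "fr_FR"             # prints fr-FR_BroadbandModel
_ibm_model "en_US" Narrowband  # prints en-US_NarrowbandModel
```

The result could then replace the hard-coded model in the URL, as in "...recognize?model=$(_ibm_model "$language")". Whether IBM offers a model for a given language and band would still need to be checked against their model list.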

Ok, so I did almost all the work. I might integrate it into Jarvis one day :)

Cqoicebordel commented 6 years ago

Except for the fact that you wrote ibm_watson_user twice instead of ibm_watson_user and ibm_watson_password, it seems good to me :)

For jq vs. perl, it's because I looked at Google's STT. But you know better than me :)