Closed: Smanar closed this issue 8 years ago.
I don't know why, but seeing Bing in front of a product doesn't inspire much confidence in me... It's not like the search engine of the same name doesn't really work or anything :laughing:
Lol, anyway, I haven't tested it yet, but it will be hard to beat Google's. We're starting to drown in engines; it's really the trendy thing. For now I'm sticking with Google's, and I'm waiting for Amazon's (Alexa), which should come out in French within a year, according to them.
For my part, Wit is perfectly fine for now, and setting up Kaldi on a local server is progressing.
Hi @alexylem,
I have just finished a Python script to use the Bing Speech API, since I cannot get a Google Speech API key (I don't want to use the new one, the Cloud Speech API, because it requires creating a billing account).
The Microsoft documentation is not very good, but I succeeded in getting my speech back as text, with a "nice" result in JSON and XML format. I tried English, French, Chinese and Dutch (bad results with that one, but I suppose it's still in beta for this language).
Here is a sample of my result from reading your GitHub Jarvis homepage:
pi@raspberrypi:~/jarvis $ ./bingspeech.py
The body data:
Oxford Access Token: {"access_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhcGltLXVzZXItaWQiOiJmZmFmNGQxZWRjMWY0YzgzYjFmZDQwZTkyYWE0YjE2YyIsImFwaW0tc3Vic2NyaXB0aW9uLWlkIjoiOWM4NGEzNGE4ODUwNGJjYmExOTJkYzhiOWVjODdjMTUiLCJhcGltLXVzZXItZW1haWwiOiJ2aW5jaW1vdXNlQG91dGxvb2suY29tIiwiYXBpbS1rZXkiOiI2NGRjYTVmOWY5ZmM0NWE1OGVmYmM5MGU5ZDMwNzJiOCIsImNsaWVudC1pZCI6IjY0ZGNhNWY5ZjlmYzQ1YTU4ZWZiYzkwZTlkMzA3MmI4Iiwic2NvcGUiOiJodHRwczovL3NwZWVjaC5wbGF0Zm9ybS5iaW5nLmNvbSIsImlzcyI6InVybjptcy5veGZvcmQiLCJhdWQiOiJ1cm46bXMuc3BlZWNoIiwiZXhwIjoxNDY5Mzc2MDE5fQ.jsDbr9sfwXDpJynC9zYSp4gok4sf0Bn2cvyPnI4cw7g","token_type":"jwt","expires_in":"600","scope":"https://speech.platform.bing.com"}
200 OK
b'{"version":"3.0","header":{"status":"success","scenario":"ulm","name":"jarvis dot SH it\'s a lightweight if you\'re able utiline jarvis like about windows phone home automation running on slow computers example raspberry pi it install automatically speech recognition in sentences engine of your choices","lexical":"jarvis dot SH it\'s a lightweight if you\'re able utiline jarvis like about windows phone home automation running on slow computers example raspberry pi it install automatically speech recognition in sentences engine of your choices","properties":{"requestid":"818d8000-9c1f-48ab-80c2-99bd676ebeb5","HIGHCONF":"1"}},"results":[{"scenario":"ulm","name":"jarvis dot SH it\'s a lightweight if you\'re able utiline jarvis like about windows phone home automation running on slow computers example raspberry pi it install automatically speech recognition in sentences engine of your choices","lexical":"jarvis dot SH it\'s a lightweight if you\'re able utiline jarvis like about windows phone home automation running on slow computers example raspberry pi it install automatically speech recognition in sentences engine of your choices","confidence":"0.8036776","properties":{"HIGHCONF":"1"}}]}'
@LengZai Wow, fantastic! The transcription is not perfect, but I don't know how good your mic / environment / accent are. Could you share it? I may integrate it into Jarvis if you agree.
Done! I made it work! Well, the transcription is not perfect, maybe due to my accent :-P
It's a bit dirty because I didn't want to touch your main code... so I copied the google folder and modified it to create a substitute. The "modded" main.sh of the google stt_engines looks like this. I was too lazy to do it in bash, but I'm sure we can port the Python to bash if we don't want to keep Python (it's Python 3, but I think I found a way to make it work with Python 2).
#!/bin/bash
# Modded stt_engines/google/main.sh: delegates transcription to bingspeech.py
_google_transcribe () {
    json=$(stt_engines/google/bingspeech.py "$audiofile")
    $verbose && printf 'DEBUG: %s\n' "$json"
    echo "$json" > "$forder"
}
google_STT () { # Listens & transcribes audio file, then writes corresponding text in $forder
    LISTEN "$audiofile"
    _google_transcribe &
    spinner $!
}
How can I share the code with you?
Wow?! I just got your update for the config folder... The update deleted my modified main.sh for Bing. I believed it would be OK to update because in one of your videos you mentioned that the setup would ask me whether to merge or not... Fortunately I posted it in this conversation just before getting your update lol, and the update didn't delete the bingspeech.py file.
Jarvis updates make sure the original system files are the correct ones, so your modded version of google_stt got "fixed" 😄 Your own custom files are preserved (as they would be if you had created a bing folder, until it is in the Jarvis repo).
There is no merge anymore (at least not for the config); Jarvis has improved since the videos.
Please attach bingspeech.py to this ticket. I will try to recode it in bash to limit dependencies.
Ah ok... I didn't know we could do that on GitHub... Cool.
But... I tried to attach the zip (compressed with Windows, WinRAR and 7-Zip)... and always got the same error.
Edit: OK, I put it at this link: http://dl.free.fr/fXYqy7BqO password: jarvis
Since the Bing Speech API allows 5000 requests per month... if you want to save time, I can provide my subscription key (by PM or email?) unless you already have one. In any case, it's very easy and fast to get one (unlike Google Cloud Platform).
I don't know why, but sometimes (rarely) I get this message on the console:
Traceback (most recent call last):
  File "stt_engines/google/bingspeech.py", line 65, in <module>
    print(jvalue[0]['name'])
KeyError: _something_
I think it happens when I make more than 20 queries within 1 min, or when Bing Speech responds with a bad result... So you will see in my code that I catch the case where the JSON has no "results". We can certainly improve that part. There are also options in Bing to accept "bad words"; I will try to improve it soon.
Thanks, hope it can help you
I'm re-scoping this ticket to STT only for now.
Very well documented code @LengZai, good job! I'll create my own key (because I need to experiment & document it for Jarvis users), test it, and start to re-code it in bash (if possible).
OK, I think I understand why I sometimes got the error message on the console. I remember that sometimes I got a NOSPEECH value in the JSON response. So we can definitely handle the following error messages returned by the Bing Speech API to make Jarvis smarter based on the Bing response:
(JSON response tree showing the version and header fields)
Okay, that can happen with a noisy room or poor mic quality.
In that case, based on your extract, I'll test for status = success; otherwise...?
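The status check discussed above could look like the following minimal bash sketch. The function name `bing_parse` is hypothetical. It assumes, as in the two responses quoted in this thread, that `status` is the first key of `header`, and it simply searches for the string `NOSPEECH` anywhere in the response, because the exact field that carries it is not shown here. It also fails when a "success" response contains no `name`, similar in spirit to the missing "results" guard in bingspeech.py.

```bash
#!/bin/bash
# Hypothetical helper: prints the recognized text from a Bing Speech v3 JSON
# response, or returns 1 with a message on stderr.
# Assumption: "status" is the first key of "header", as in the responses above.
bing_parse () {
    local json=$1 status name
    status=$(printf '%s\n' "$json" | sed -n 's/.*"header":{"status":"\([a-z]*\)".*/\1/p')
    if [ "$status" != "success" ]; then
        # Assumption: NOSPEECH may appear anywhere in an error response
        case $json in
            *NOSPEECH*) echo "no speech detected" >&2 ;;
            *) echo "recognition failed (status: ${status:-unknown})" >&2 ;;
        esac
        return 1
    fi
    # "name" inside "header", i.e. before the header's first closing brace
    name=$(printf '%s\n' "$json" | sed -n 's/.*"header":{[^}]*"name":"\([^"]*\)".*/\1/p')
    if [ -z "$name" ]; then
        echo "no transcription in response" >&2
        return 1
    fi
    printf '%s\n' "$name"
}
```

A caller would branch on the exit status, e.g. `text=$(bing_parse "$json") || return`, so that a failed or silent recording never reaches the command matcher.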
I was thinking of doing it in bash too :-P... Let me know your progress; you will have my support. I'm quite sure it's possible, as you just need to save the access_token before calling the Bing Speech API. So compared to Google, we need to send 2 requests instead of one. But I think we can also speed up the process (need to try), because the access_token has an expiry value of 600 (seconds? milliseconds?), unless MS detects that the token has already been used and we need to generate it again... As mentioned, that needs testing.
Yes, indeed @alexylem
Step 1: curl to get the token ✅ Working
https://www.microsoft.com/cognitive-services/en-us/subscriptions
key_1="***************************"
key_2="***************************"
curl -X POST "https://oxford-speech.cloudapp.net/token/issueToken" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "grant_type=client_credentials" \
-d "client_id=$key_1" \
-d "client_secret=$key_2" \
-d "scope=https://speech.platform.bing.com"
gives:
{"access_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJhcGltLXVzZXItaWQiOiIwNjQzZWEyMzRiYzU0YTgzODM1MjMwNWZiYjY2YmRhNSIsImFwaW0tc3Vic2NyaXB0aW9uLWlkIjoiNWViNzIwYjVlZmY1NDljNDkyOTdmZmFmZWMzYzg2YjgiLCJhcGltLXVzZXItZW1haWwiOiJhbGV4YW5kcmUubWVseUBnbWFpbC5jb20iLCJhcGltLWtleSI6IjY5YzhmOWFjOGU2YTQyMTBiNDY3MDc0MWM1ZjVmNTQwIiwiY2xpZW50LWlkIjoiYmM0ZmFlNGZiMTIzNDVmOGFjMTg2MWJiOWVhOWJjZGMiLCJzY29wZSI6Imh0dHBzOi8vc3BlZWNoLnBsYXRmb3JtLmJpbmcuY29tIiwiaXNzIjoidXJuOm1zLm94Zm9yZCIsImF1ZCI6InVybjptcy5zcGVlY2giLCJleHAiOjE0Njk0Njc1MzR9.PWjWOh2Fkr_6MHYFMTpw7tY5a-iqb-EgaiALCvyAeWQ","token_type":"jwt","expires_in":"600","scope":"https://speech.platform.bing.com"}
EDIT: I'm sharing this because it was not easy to find on the web; I hope it will help others.
The expiration time is in seconds.
Yes, I plan to use it to avoid useless calls to the token API (they are deducted from the quota!).
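Since `expires_in` is in seconds, a token can be cached and reused until shortly before it expires. Below is a minimal bash sketch of that idea. The names `get_bing_token`, `fetch_bing_token` and `bing_token_cache`, the cache path and the 60-second safety margin are all hypothetical choices; `fetch_bing_token` only wraps the Step 1 curl call, with `$key_1`/`$key_2` as in that post. JSON fields are extracted with `sed`, assuming the flat, single-line response shown in the Step 1 output.

```bash
#!/bin/bash
# Hypothetical token cache for the Bing Speech token (expires_in is in seconds).
# Cache file format (one line): "<expiry epoch seconds> <token>"
bing_token_cache=${bing_token_cache:-/tmp/jarvis_bing_token}

# Wraps the Step 1 request; $key_1 and $key_2 as in that post
fetch_bing_token () {
    curl --silent --fail -X POST "https://oxford-speech.cloudapp.net/token/issueToken" \
        -H "Content-Type: application/x-www-form-urlencoded" \
        -d "grant_type=client_credentials" \
        -d "client_id=$key_1" \
        -d "client_secret=$key_2" \
        -d "scope=https://speech.platform.bing.com"
}

get_bing_token () {
    local now expiry token json expires_in
    now=$(date +%s)
    if [ -f "$bing_token_cache" ]; then
        read -r expiry token < "$bing_token_cache"
        # Reuse the cached token while more than 60 s of validity remain
        if [ -n "$token" ] && [ "$now" -lt $(( expiry - 60 )) ]; then
            printf '%s\n' "$token"
            return 0
        fi
    fi
    json=$(fetch_bing_token) || return 1
    # Assumes the flat, single-line JSON shown in the Step 1 output
    token=$(printf '%s\n' "$json" | sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p')
    expires_in=$(printf '%s\n' "$json" | sed -n 's/.*"expires_in":"\([0-9]*\)".*/\1/p')
    [ -n "$token" ] || return 1
    # Store the absolute expiry time (defaults to 600 s if the field is missing)
    printf '%s %s\n' "$(( now + ${expires_in:-600} ))" "$token" > "$bing_token_cache"
    printf '%s\n' "$token"
}
```

A caller would then do something like `stt_bing_token=$(get_bing_token)` before each recognize request. Storing an absolute epoch time rather than a countdown keeps the check valid across separate Jarvis runs.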
😞 I'm struggling to curl Bing's recognize API (now that I have the token); I always get:
{"version":"3.0","header":{"status":"error","properties":{"requestid":"0cb1c3fa-c743-4991-9e93-3abc7d38d212"}}}
Maybe an issue with the file format I upload... (16k wav recorded with rec)
😄 OK, I managed to get it working; it was indeed the encoding of the wav file...
Here is the working code in case someone else is looking for this (I have global variables, replace with appropriate values):
request="https://speech.platform.bing.com/recognize/query"
request+="?version=3.0"
request+="&requestid=$(uuidgen)" # generated
request+="&appid=D4D52672-91D7-4C74-8AD8-42B1D98141A5"
request+="&format=json"
request+="&locale=$language" # en-US
request+="&device.os=$platform" # osx
request+="&scenarios=ulm"
request+="&instanceid=E043E4FE-51EF-4B74-8133-B728C4FEA8AA" # jarvis instance id
# $stt_bing_token: token generated, see post above; $audiofile: e.g. test.wav
# (comments cannot follow a trailing backslash, so they are kept up here)
curl "$request" \
    -H "Host: speech.platform.bing.com" \
    -H "Content-Type: audio/wav; samplerate=16000" \
    -H "Authorization: Bearer $stt_bing_token" \
    --data-binary "@$audiofile" \
    --silent --fail
which gives:
{"version":"3.0","header":{"status":"success","scenario":"ulm","name":"hello","lexical":"hello","properties":{"requestid":"0a64b573-da28-41ba-be00-2381dcdacc6c","HIGHCONF":"1"}},"results":[{"scenario":"ulm","name":"hello","lexical":"hello","confidence":"0.9498509","properties":{"HIGHCONF":"1"}}]}
If I don't run into any new problems, Bing should appear in your Jarvis sometime tomorrow 😉
That's it! Bing is now available in the Jarvis update; please select it via: Settings > Voice recognition > Recognition of commands
Updated comparison of speech recognition engines: https://github.com/alexylem/jarvis/wiki/stt
The recommended choice will be changed within a few days, based on feedback from the Jarvis user community 😄
New page explaining how to get Bing keys: https://github.com/alexylem/jarvis/wiki/bing
Excellent, Alex. Sorry, it's difficult for me to develop during the week. Please don't forget that there are 2 optional parameters for the Bing Speech API: the best-n matches and "mean words" (profanity)... I will check your GitHub posts tonight.
Thanks
I checked the doc and chose to use the name from the header. I may disable the profanity checker in a future release.
echo "$json" | perl -lne 'print $1 if m{"name":"([^"]*)"}'
My philosophy is to push new features as soon as they seem working, then I improve them based on community feedback 😄
Really useful ^^, thx. Another thing that could be useful: Google supports only FLAC (44100) and PCM (16000), but we can use other sample rates with Bing; we're not forced to use the 16000 sample rate.
For simplicity, Jarvis uses the same encoding for all speech-to-text engines, so the same recorded wav can be sent to google, pocketsphinx, wit or bing. Fortunately there is a common format they all support 😄
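Since the recognize call kept failing until the wav encoding was right, it can help to check a recording before sending it. This is a minimal bash sketch (hypothetical function names `wav_uint` and `check_wav`) that reads the canonical 44-byte WAV header with `od`. It assumes the `fmt ` chunk directly follows the RIFF header, and it treats 16-bit mono PCM at 16 kHz as the target; only the 16 kHz rate is confirmed by the `samplerate=16000` header in the working curl call, the mono/16-bit part is an assumption.

```bash
#!/bin/bash
# Hypothetical helper: little-endian unsigned integer at a byte offset.
# Usage: wav_uint FILE OFFSET SIZE (bytes are read one by one, so this does
# not depend on the host's byte order)
wav_uint () {
    local value=0 bits=0 byte
    for byte in $(od -An -t u1 -j "$2" -N "$3" "$1"); do
        value=$(( value + (byte << bits) ))
        bits=$(( bits + 8 ))
    done
    echo "$value"
}

# Hypothetical check; assumes the canonical layout:
# "RIFF" at 0, "WAVE" at 8, "fmt " chunk at 12 (format at 20, channels at 22,
# sample rate at 24, bits per sample at 34)
check_wav () {
    local file=$1 format channels rate depth
    if [ "$(head -c 4 "$file")" != "RIFF" ] || [ "$(head -c 12 "$file" | tail -c 4)" != "WAVE" ]; then
        echo "$file: not a RIFF/WAVE file" >&2
        return 1
    fi
    format=$(wav_uint "$file" 20 2)   # 1 = uncompressed PCM
    channels=$(wav_uint "$file" 22 2)
    rate=$(wav_uint "$file" 24 4)
    depth=$(wav_uint "$file" 34 2)
    echo "format=$format channels=$channels rate=$rate bits=$depth"
    # Assumed target: 16-bit mono PCM at 16 kHz
    [ "$format" -eq 1 ] && [ "$channels" -eq 1 ] && [ "$rate" -eq 16000 ] && [ "$depth" -eq 16 ]
}
```

If the check fails, sox (which provides `rec`) can convert a file, e.g. `sox input.wav -r 16000 -c 1 -b 16 output.wav`.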
Ha yes, right.
I made a really quick try (not a lot of free time this week), and for the moment I get better results with the Bing engine. For example, try the string "Deconnectes toi" on both. What is your result?
Hi @Smanar,
I'm going to try it now..
@alexylem ... I have just updated Jarvis with Bing... Thanks :-) FYI: in the Voice recognition menu you ask for Bing key1 and key2, but you should ask for a single Bing key. As I mentioned in my source code comments, use the same key for both, so that if you renew one of them you won't impact Jarvis. Microsoft provides 2 keys in case you have different projects, or if you want to lend one key to someone for testing before renewing it. So the config menu will be simpler with only one key.
Thanks
@LengZai During my tests I wrongly concluded that we had to use both. Let me test with one, and if it indeed works I'll update it as suggested.
Damn!! It works 😄 Updating the code right now... You will have to re-enter your key because I'm renaming the internal variable from bing_key1 to bing_key (I like code consistency).
Haha, no problem... That's what I discovered during my test in Python... All the examples on the Internet use the previous Bing Speech API from Project Oxford... Since MS changed it to make it "simpler"... yes indeed, only one key is needed :) Thanks for the (future) update.
The update is done already 😄
Well, I haven't tested anything yet, but before I forget, here is the Bing/Microsoft engine: https://www.microsoft.com/cognitive-services/en-us/speech-api
5,000 transactions per month for free.