Closed Rudloff closed 10 years ago
According to the following snippet (taken from the Google Hotword extension for Chrome and adapted for brevity), L16 PCM should be supported. Maybe it doesn't accept a .wav container?
var b = new N("https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&app=web-hotword");
Q(b, "client", "chrome-hotword");
Q(b, "key", "AIzaSyCnl6MRydhw_5fLXIdASxkLJzcJh5iX0M4");
var c = {};
c["Content-Type"] = "audio/l16; rate=" + a.Pa;
var d = n(a.tb, a),
    e = new Int16Array(4096 * a.t.length);
if (0 <= a.s) {
    var f = a.s + 1,
        h = 0;
    do {
        f >= a.t.length && (f = 0);
        4096 != a.t[f].length && a.a.log(Za, "ERROR: buffer size " + a.t[f].length, void 0);
        for (var m = 0; 4096 > m; ++m) e[h++] = a.t[f][m]
    } while (f++ != a.s)
}
$b(b, d, "POST", e, c)
The Int16Array actually represents an array of two's-complement 16-bit signed integers.
44100 Hz is the rate they use, which I've confirmed while debugging the extension.
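To make the Int16Array point concrete, here is a minimal sketch of turning floating-point samples into signed 16-bit values and building the matching Content-Type string. It assumes the captured audio arrives as floats in the range [-1, 1], which the snippet above does not show; `floatTo16BitPCM` is a hypothetical helper name, not something from the extension.

```javascript
// Hypothetical helper: convert float samples in [-1, 1] to signed 16-bit PCM.
// Assumption: the input is float audio; the extension snippet does not confirm this.
function floatTo16BitPCM(floatSamples) {
  const pcm = new Int16Array(floatSamples.length);
  for (let i = 0; i < floatSamples.length; i++) {
    // Clamp first so out-of-range input does not wrap around.
    const s = Math.max(-1, Math.min(1, floatSamples[i]));
    // Negative values scale towards -32768, positive values towards 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

const pcmSamples = floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5]));
// Same header shape as in the snippet, with the 44100 Hz rate mentioned above.
const pcmContentType = "audio/l16; rate=" + 44100;
console.log(pcmSamples.byteLength, pcmContentType);
```

Each sample takes two bytes, so the request body is twice as long in bytes as the number of samples.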
To be fair, I haven't gotten it to work with WAV L16 PCM. I should take some time to capture the packets being sent with Wireshark so I can debug the payload.
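Before reaching for a packet capture, one way to sanity-check the payload is to read the WAV header and compare what the file declares with what the Content-Type header claims. The sketch below parses the standard RIFF/WAVE `fmt ` and `data` chunks; `inspectWav` and `makeDemoWav` are hypothetical helpers, and nothing in this thread confirms whether the API accepts or rejects the container header itself.

```javascript
// Hypothetical helper: read the "fmt " and "data" chunks of a RIFF/WAVE buffer.
// Accepts a Uint8Array (a Node Buffer from fs.readFileSync also works).
function inspectWav(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (off) =>
    String.fromCharCode(bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]);
  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") throw new Error("not a RIFF/WAVE file");
  const info = {};
  let off = 12; // first chunk starts after "RIFF", size, "WAVE"
  while (off + 8 <= bytes.length) {
    const id = tag(off);
    const size = view.getUint32(off + 4, true); // chunk sizes are little-endian
    const body = off + 8;
    if (id === "fmt ") {
      info.audioFormat = view.getUint16(body, true); // 1 means integer PCM
      info.channels = view.getUint16(body + 2, true);
      info.sampleRate = view.getUint32(body + 4, true);
      info.bitsPerSample = view.getUint16(body + 14, true);
    } else if (id === "data") {
      // Raw samples only, without any container header.
      info.pcm = bytes.subarray(body, Math.min(body + size, bytes.length));
    }
    off = body + size + (size % 2); // chunks are padded to an even length
  }
  return info;
}

// Tiny synthetic file for demonstration: mono, 44100 Hz, 16-bit, two samples.
function makeDemoWav() {
  const buf = new Uint8Array(48);
  const v = new DataView(buf.buffer);
  const put = (off, s) => {
    for (let i = 0; i < s.length; i++) buf[off + i] = s.charCodeAt(i);
  };
  put(0, "RIFF"); v.setUint32(4, 40, true); put(8, "WAVE");
  put(12, "fmt "); v.setUint32(16, 16, true);
  v.setUint16(20, 1, true);     // audioFormat: PCM
  v.setUint16(22, 1, true);     // channels
  v.setUint32(24, 44100, true); // sampleRate
  v.setUint32(28, 88200, true); // byteRate
  v.setUint16(32, 2, true);     // blockAlign
  v.setUint16(34, 16, true);    // bitsPerSample
  put(36, "data"); v.setUint32(40, 4, true);
  v.setInt16(44, 1000, true); v.setInt16(46, -1000, true);
  return buf;
}

const wavInfo = inspectWav(makeDemoWav());
console.log(wavInfo.sampleRate, wavInfo.channels, wavInfo.bitsPerSample, wavInfo.pcm.length);
```

If the declared sample rate, channel count, or bit depth differs from what the request header says, that mismatch is worth ruling out first; `wavInfo.pcm` gives the headerless samples for comparison.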
The encoding is called l16, not 116. Can you double check that?
Hmm, thanks for the suggestion! Yes, now it works, but the result is not correct. I used the same audio in both FLAC and WAV format for the test: the FLAC file returned the correct text, while the WAV file returned a totally wrong answer.
I only changed the filename and the Content-Type. Is that expected?
Can you post what the API returned?
Changing the Content-Type header and choosing the correct matching file should suffice.
If I use the FLAC file, this is what I get:
{"result":[]} {"result":[{"alternative":[{"transcript":"hello send this message to Google"},{"transcript":"send this message to Google"}],"final":true}],"result_index":0}
If I use the WAV file, this is what I get:
{"result":[]} {"result":[{"alternative":[{"transcript":"Shin injuries"},{"transcript":"shin injury"},{"transcript":"Sean Lennon"},{"transcript":"Sherman interview"},{"transcript":"Shawn Ashmore Inn"}],"final":true}],"result_index":0}
Another question: do you know why I get two results, and why the first result is always empty?
This is the command I was using:
curl -X POST \
  --data-binary @audio/test1.wav \
  --user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36' \
  --header 'Content-Type: audio/l16; rate=44100;' \
  'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=AIzaSyCnl6MRydhw_5fLXIdASxkLJzcJh5iX0M4'
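Regarding the two-part response shown above, a client can simply skip the empty part. The following is a minimal sketch that assumes the body contains one JSON object per line (the output pasted in this thread does not show the exact separator); `bestTranscript` is a hypothetical helper name. It does not explain why the first object is empty, it only tolerates it.

```javascript
// Hypothetical helper: return the first transcript found in a multi-part body.
// Assumption: each non-empty line of the body is a standalone JSON object.
function bestTranscript(body) {
  for (const line of body.split("\n")) {
    if (line.trim() === "") continue;
    const parsed = JSON.parse(line);
    // An object like {"result":[]} has nothing to iterate, so it is skipped.
    for (const result of parsed.result || []) {
      const alternatives = result.alternative || [];
      if (alternatives.length > 0) return alternatives[0].transcript;
    }
  }
  return null; // no transcript in any part
}

const sampleBody =
  '{"result":[]}\n' +
  '{"result":[{"alternative":[{"transcript":"hello send this message to Google"},' +
  '{"transcript":"send this message to Google"}],"final":true}],"result_index":0}\n';

console.log(bestTranscript(sampleBody)); // prints "hello send this message to Google"
```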
Alright, I've figured out how to get the PCM 16-bit encoding working. Will update the README accordingly and add an example to the audio folder.
Hello,
You say that the API supports L16 PCM, but I always get empty results when I send a WAV file:
Am I doing something wrong, or should we update the docs?