Open mph070770 opened 8 years ago
The [ding] sound is actually a callback function you can define yourself. Here's an idea:
- keep an audio buffer and a global variable is_triggered = False
- when triggered, set is_triggered = True in your callback
- send any audio after this point in your buffer to AVS for speech recognition.

Does it make sense?
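That callback idea can be sketched roughly as follows (illustrative only; `PostTriggerCollector` and its method names are hypothetical, not part of the snowboy API):

```javascript
// Keep a trigger flag; once the hotword callback fires, collect every
// subsequent audio chunk so it can be sent to the ASR in one request.
class PostTriggerCollector {
  constructor() {
    this.isTriggered = false;
    this.chunks = [];
  }

  // Call this from the hotword-detected callback.
  onHotword() {
    this.isTriggered = true;
  }

  // Call this for every raw audio chunk from the microphone.
  onAudio(chunk) {
    if (this.isTriggered) {
      this.chunks.push(chunk);
    }
  }

  // Concatenate everything captured since the trigger and reset.
  flush() {
    const audio = Buffer.concat(this.chunks);
    this.chunks = [];
    this.isTriggered = false;
    return audio;
  }
}
```

You would call `flush()` when end-of-speech is detected and hand the returned buffer to the speech recognizer.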
What Xuchen said was correct. You may have to play with the audio buffer a little bit, to make sure you send all the audio after hotword detection to the ASR.
Guoguo
On Sat, May 14, 2016 at 1:01 AM, xuchen notifications@github.com wrote:

> The [ding] sound is actually a callback function you can define yourself. Here's an idea:
>
> - keep an audio buffer and a global variable is_triggered = False
> - when triggered, set is_triggered = True in your callback
> - send any audio after this point in your buffer to AVS for speech recognition.
>
> Does it make sense?
Looks like it has been resolved, so closing this.
Thanks for the feedback. Are you suggesting a new audio buffer or utilising the ring buffer?
Re-opening this since there's ongoing discussion... Let me write in more detail.
Does this solve your problem?
Closing this as it has been integrated into AlexaPI. See:
I don't think this is closed. This issue is about continuous detection using a buffer; Alexa-Pi only uses the hotword.
OK, re-opening it. What I suggested above should still stand.
I did this by customizing the snowboy_index.js. In the processDetectionResult function I set a "command" flag once the hotword is detected and emit all chunks until silence is detected. Another script builds a buffer from all the chunks and sends them to Microsoft LUIS for recognition.
So you can say "Alexa turn off the lights" all in one phrase without pausing.
```javascript
_write(chunk, encoding, callback) {
  const index = this.nativeInstance.RunDetection(chunk);
  this.processDetectionResult(index, chunk);
  // While a command is being captured, forward the raw audio chunks.
  if (this.bufferingCommand) {
    this.emit('chunk', chunk, encoding);
  }
  return callback();
}
```
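For the "until silence is detected" part, one common approach is a simple energy check on each chunk of 16-bit little-endian PCM (a rough sketch; the threshold value and chunk format are assumptions, and real endpointing is usually more robust):

```javascript
// Treat a chunk as silent when its mean absolute sample amplitude falls
// below a threshold. A command would be considered finished after several
// consecutive silent chunks.
function isSilent(chunk, threshold = 500) {
  let sum = 0;
  const samples = chunk.length >> 1; // 2 bytes per 16-bit sample
  for (let i = 0; i < samples; i++) {
    sum += Math.abs(chunk.readInt16LE(i * 2));
  }
  return samples === 0 || sum / samples < threshold;
}
```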
@dmc6297 you might want to check out Sonus. There's an implementation on the audio-buffer branch which uses a ringbuffer + stream transformation (basically what @chenguoguo described in this thread).
The only drawback with my ring buffer implementation is that it doesn't perform super well on low powered devices (Like the Pi Zero, where detection lag increases by about 1/3 of a second).
Hi,
I'm looking for something like this too using nodejs but less sophisticated :-)
@evancohen, I've seen your project it seems it could probably satisfy my needs (except for MS Cognitive Services).
There are several steps that I can manage using two "audio buffers" (one for snowboy, one for Bing), but I think I'm not on the right path.
This is the workflow I'd like to implement. I have several hotwords:
a) if it's "Time", "Light", ... then I run my "local action"
b) if "Go Online" is detected, then I tell the user I'm listening
c.1) if the word/sentence doesn't exist within the Snowboy model and I'm in "listening mode", I would like to send the word/sentence online (using MS Cognitive Services).
c.2) if the word/sentence exists within the model and I'm in "listening mode", I don't want to send the data online.
d) if it's "Bye", I go back "offline" (no word/sentence is sent online until the user says "Go Online")
e) when a silence of x seconds is detected, I need to go back "offline" (again, no word/sentence is sent online until the user says "Go Online")
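One way to sketch this workflow is as a small state machine (the hotword names come from the list above; the structure and the string results are purely illustrative):

```javascript
// Tracks whether we are in "listening mode" and decides what to do with
// each recognized hotword or unknown utterance.
class VoiceWorkflow {
  constructor() {
    this.listening = false;   // "offline" by default
    this.lastResult = null;   // what the last event resolved to
  }

  onHotword(word) {
    if (word === 'Go Online') {        // b) enter listening mode
      this.listening = true;
      this.lastResult = 'listening';
    } else if (word === 'Bye') {       // d) back offline
      this.listening = false;
      this.lastResult = 'offline';
    } else {                           // a) known hotword -> local action
      this.lastResult = 'local:' + word;
    }
  }

  // c.1 / c.2) unknown speech goes online only while listening
  onUnknownSpeech(audio) {
    this.lastResult = this.listening ? 'sent-online' : 'dropped';
  }

  onSilence() {                        // e) silence timeout -> offline
    this.listening = false;
    this.lastResult = 'offline';
  }
}
```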
@dmc6297 I tried your customized snowboy_index.js but it doesn't work for me. When I save the chunks into a buffer and concatenate them into an array of bytes, the resulting wav file is inaudible.
```javascript
detector.on('chunk', function (chunk, encoding) {
  if (chunk) {
    buffers.push(chunk);
    if ((new Date() - timeStart) / 1000 > timerInSecond) {
      detector.bufferingCommand = false;
      getText(buffers);
    }
  }
});
```
The getText function concatenates the buffers into a single array of bytes and sends it to an API:

```javascript
var bytes = Buffer.concat(buffers);
```
Could you please give me a hand? Thanks
@Stan92 The data is pcm audio, you will need to prepend a wav header to the buffer, or convert to another format. This is how I made it work.
Start the command buffer
```javascript
detector.on('commandStart', function (hotwordChunk) {
  var samplesLength = 10000;
  var header = new Buffer(44);
  header.write('RIFF', 0);
  // file length (placeholder; the real value needs the final data length)
  header.writeUInt32LE(32 + samplesLength * 2, 4);
  header.write('WAVE', 8);
  // format chunk identifier
  header.write('fmt ', 12);
  // format chunk length
  header.writeUInt32LE(16, 16);
  // sample format (raw PCM)
  header.writeUInt16LE(1, 20);
  // channel count
  header.writeUInt16LE(detector.numChannels(), 22);
  // sample rate
  header.writeUInt32LE(detector.sampleRate(), 24);
  // byte rate (sample rate * block align)
  header.writeUInt32LE(32000, 28);
  // block align (channel count * bytes per sample)
  header.writeUInt16LE(2, 32);
  // bits per sample
  header.writeUInt16LE(16, 34);
  // data chunk identifier
  header.write('data', 36);
  // data chunk length (placeholder)
  header.writeUInt32LE(15728640, 40);
  // a standard PCM wav header is 44 bytes
  audioCommandBuffer = header.slice(0, 44);
  // Comment this out to omit the hotword chunk of audio
  audioCommandBuffer = Buffer.concat([audioCommandBuffer, hotwordChunk]);
});
```
Append to the buffer:

```javascript
detector.on('chunk', function (chunk, encoding) {
  audioCommandBuffer = Buffer.concat([audioCommandBuffer, chunk]);
});
```
And to output the buffer to a file (note that fs.writeFile takes a completion callback):

```javascript
detector.on('commandStop', function () {
  fs.writeFile('/home/pi/Speech/audio.wav', audioCommandBuffer, function (err) {
    if (err) console.error(err);
  });
});
```
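The header above hard-codes the file and data lengths; for reference, here is a variant that computes them from the actual PCM byte length (a sketch written for clarity, not taken from the snowboy codebase; a standard PCM WAV header is 44 bytes):

```javascript
// Build a 44-byte PCM WAV header for a given amount of raw audio data.
// Defaults match the thread's setup: 16 kHz, mono, 16-bit samples.
function wavHeader(dataLength, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const header = Buffer.alloc(44);
  const blockAlign = channels * (bitsPerSample / 8);
  const byteRate = sampleRate * blockAlign;
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + dataLength, 4); // total file length minus 8
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);             // fmt chunk length
  header.writeUInt16LE(1, 20);              // PCM format
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36);
  header.writeUInt32LE(dataLength, 40);     // exact data chunk length
  return header;
}
```

Usage would be `Buffer.concat([wavHeader(pcm.length), pcm])` once the command audio is complete, so the sizes are always exact.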
@dmc6297 ... I don't know how to thank you... :-).. I'll make a try asap Thanks once again
Hey, I think this thread covers exactly what I am trying to do, but in Python. On top of being able to say the full sentence without stopping, I'd also like the capability to keep a 3-second buffer from before hotword detection kicks in, so I can say things like "Goodnight Snowboy" or "What do you think Snowboy" through the Google Speech API. Any suggestions on how to achieve that?
As you said, you can maintain a buffer before the hotword; when the hotword is detected, send the buffer to the Google Speech API and see if there's anything meaningful there.
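A pre-hotword buffer like that can be kept with a small ring buffer that only retains the last N bytes of audio (an illustrative sketch, not the snowboy API; 3 seconds of 16 kHz, 16-bit mono audio is 96000 bytes):

```javascript
// Retains at most `capacity` bytes of the most recent audio, dropping the
// oldest chunks as new ones arrive.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity; // e.g. 3 s * 16000 Hz * 2 bytes = 96000
    this.chunks = [];
    this.size = 0;
  }

  push(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.length;
    // Drop whole chunks from the front while we can do so without
    // falling below the capacity.
    while (this.size - this.chunks[0].length >= this.capacity) {
      this.size -= this.chunks[0].length;
      this.chunks.shift();
    }
  }

  // The most recent `capacity` bytes, to prepend to the ASR request.
  read() {
    return Buffer.concat(this.chunks).slice(-this.capacity);
  }
}
```

When the hotword fires, `read()` gives you the preceding audio to send along with whatever follows the trigger.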
Can someone write an example for nodejs?
https://github.com/evancohen/sonus/tree/audio-buffer ^ This branch has an example that uses a ring buffer
On Sun, Oct 22, 2017 at 1:43 PM sintetico82 notifications@github.com wrote:

> Someone can write an example for nodejs?
@zikphil Were you able to get this working? I am trying to do the same thing. Any help is appreciated. thanks.
Hi - great software!
I have your demo working with Ubuntu. What I'd like to do is detect the keyword in continuous speech in a similar way to the Amazon echo. Is that possible? For example, this:
"Alexa, turn on the lights"
instead of
"Alexa" [ding] "turn on the lights"
Ideally, I'd also want to know where in the audio the keyword was spoken so that it can be removed from audio before I send it to an online engine (such as api.ai or AVS).
Any suggestions would be great.
Thanks