TalAter / annyang

:speech_balloon: Speech recognition for your site
https://www.talater.com/annyang/
MIT License

Hotword Detection #100

Open alanjames1987 opened 9 years ago

alanjames1987 commented 9 years ago

I would like to have hotword detection available in Annyang, basically allowing me to tell Annyang not to "activate" until a certain hotword is spoken. Essentially, speech recognition would be running the entire time but would only start caching the results returned from webkitSpeechRecognition after the first instance of the spoken hotword.

Similar to how Okay Google works, but without the plugin.

TalAter commented 9 years ago

Closing as a duplicate of #18

alanjames1987 commented 9 years ago

I don't mean a command prefix; I mean a string that, when detected, triggers a callback function so that the user can be visually informed that speech input has been activated.

TalAter commented 8 years ago

Hello again @alanjames1987, sorry for the late response on this issue.

Here's an idea of how to do this, using the new regular expression support available in v2.0.0

// run this after hotword was detected to register the "real" commands
var hotWordDetected = function() {
  annyang.removeCommands();
  annyang.addCommands({
    'hello': function() { alert('Hello world!'); }
  });
}

// initial command to listen for the hotword
var hotwordCommand = {
  'hotword': {'regexp': /shenanigans/, 'callback': hotWordDetected}
}

annyang.addCommands(hotwordCommand);
annyang.start();

If what you had in mind was that the user would have to always say the hotword before each command, you can do something like:

var hello = function() {
  alert('Hello world!');
}

var goodbye = function() {
  alert('Goodbye world!');
}

annyang.addCommands({
  'hello':    {'regexp': /.* shenanigans hello/, 'callback': hello},
  'goodbye':  {'regexp': /.* shenanigans goodbye/, 'callback': goodbye}
});
annyang.start();

alanjames1987 commented 8 years ago

I will try that out; it might be a good solution. It still seems slightly like a hack.

I was hoping this could be built into annyang. The code to interact with it might look something like this.

function hotwordDetectionHandler() {

    // code to trigger a sound
    // or update interface to show it's listening

}

function hotwordTimeoutHandler() {

    // code to trigger a sound
    // or update interface to show it's not listening

}

var hotwords = {
    '(hey) computer': hotwordDetectionHandler,
    '(hey) hal': hotwordDetectionHandler,
    '(hey) jarvis': hotwordDetectionHandler,
};

var commands = {
    'show me *term': showFlickr,
    'calculate :month stats': calculateStats,
    'say hello (to my little) friend': greeting
};

annyang.hotwords(true);

annyang.addHotwords(hotwords);

annyang.hotwordTimeout(1000); // <-- if a sentence isn't started within the time a deactivation function is called
annyang.hotwordTimeoutHandler(hotwordTimeoutHandler); // <-- function to run after timeout

annyang.addCommands(commands);
annyang.start();

TalAter commented 8 years ago

That's an interesting idea... and a very well thought out API! I wish all issues were posted like this :+1: Thanks

How do you see the importance of allowing separate hotwordDetectionHandlers? Why allow just one hotwordTimeoutHandler but multiple hotwordDetectionHandlers?

Is there a specific common use case that requires multiple ones? Would it be good enough to just return the captured hotword as a parameter to the hotwordDetectionHandler?

This would allow us to simplify the API to something like:

var hotwords = [
    '(hey) computer',
    /(hey|hello) hal/
];

alanjames1987 commented 8 years ago

I don't think multiple hotwordDetectionHandlers are very important. I added them because that was in line with how commands are currently added to annyang, and I was trying to keep a similar API.

I think the idea of sending the spoken hotword to the hotword handler is great.
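
A minimal sketch of that simplified API, assuming a single handler that receives whatever hotword was matched. The names hotwordToRegExp and detectHotword are hypothetical and not part of annyang; the string conversion only handles optional words in parentheses, not :named or *splat parts.

```javascript
// Hypothetical helper, not part of annyang: turns one hotword entry into a RegExp.
// Strings may use annyang-style optional words in parentheses, e.g. '(hey) computer'.
function hotwordToRegExp(pattern) {
  if (pattern instanceof RegExp) {
    return pattern;
  }
  var source = pattern
    // Escape regex metacharacters, leaving parentheses for the next step.
    .replace(/[.*+?^${}|[\]\\]/g, '\\$&')
    // '(hey) ' becomes an optional group that also swallows the trailing space.
    .replace(/\(([^)]*)\)\s*/g, '(?:$1\\s*)?');
  return new RegExp('^' + source + '$', 'i');
}

// Returns the text that matched the first matching hotword, or null if none matched.
function detectHotword(hotwords, transcript) {
  var text = transcript.trim();
  for (var i = 0; i < hotwords.length; i++) {
    var match = hotwordToRegExp(hotwords[i]).exec(text);
    if (match) {
      return match[0];
    }
  }
  return null;
}

// One handler for every hotword; the spoken hotword arrives as a parameter.
function hotwordDetectionHandler(hotword) {
  console.log('Activated by: ' + hotword);
}

var hotwords = ['(hey) computer', /(hey|hello) hal/];
var heard = detectHotword(hotwords, 'hey computer');
if (heard !== null) {
  hotwordDetectionHandler(heard);
}
```

String entries are anchored to the whole phrase, while RegExp entries are used exactly as given, so the caller decides how strict they are.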

TalAter commented 8 years ago

Sounds good.

Would you like to give this a shot and send me a pull request?

alanjames1987 commented 8 years ago

I will look into this as soon as I can, hopefully this weekend. I know I will have to use interim results, so I will be enabling that.

TalAter commented 8 years ago

Enabling interim results seems like a very drastic change to how annyang works, and doesn't really seem required for hotword detection.

Is there a reason this feature can't be enabled without enabling interim results?

alanjames1987 commented 8 years ago

I can only see real time hotword detection being added if we have real time results using interim results.

There might be a better way. If you think there is I would love to hear it.

4tee commented 8 years ago

Hi there, I am wondering if this feature has been added.

alanjames1987 commented 8 years ago

It hasn't been added. I have had no time to work on this yet.

unbolt commented 8 years ago

Any plans on a timeframe for getting this feature implemented?

revett commented 8 years ago

+1

MariuszT commented 7 years ago

+1

xuchen commented 7 years ago

Looks like the Snowboy hotword detection toolkit is exactly used for this purpose:

https://github.com/kitt-ai/snowboy

It works offline so no streaming data to Google until you explicitly activate it.

Currently there are discussions about a NodeJS module (https://github.com/Kitt-AI/snowboy/issues/4). Anyone wants to give it a try?

evancohen commented 7 years ago

Now that we've finished the snowboy node module, I can continue with my master plan!

Because annyang is such an awesome library, there have been loads of people (myself included), that have used it for "non-web" (Electron or otherwise) projects. Just to make my point, there are over 700 forks of @TalAter's annyang-electron-demo.

That is why I've started building sonus: a node speech framework that uses snowboy for hotword detection and Google Cloud Speech for accurate recognition. I haven't quite started working on the annyang shim yet, still a few things to iron out, but I'm planning to use it as one of the command recognition systems.

It's probably worth pointing out that it's not ready for prime time just yet, but I am looking for collaborators, so if you're interested hit me up!

🚀

lynxaegon commented 7 years ago

I'm currently building a "Jarvis"-like system based on chromium-browser and an RPi with a 7" screen. At first, when I saw that annyang doesn't use hotwords, it seemed perfect. But after adding a few commands, well.. you can imagine the chaos in the house :)

I'm looking forward to a hotword plugin / update for annyang.

evancohen commented 7 years ago

After some deliberation I decided to take the "core" of annyang and include it in the project - it wasn't built to run outside of the web browser and there's a lot of logic that Sonus already offers that would take a lot of work to plumb into annyang.

I've included the annyang command registration system out of the box as a part of Sonus. Here's an example:

'use strict'

const Sonus = require('sonus')
const speech = require('@google-cloud/speech')({
  projectId: 'streaming-speech-sample',
  keyFilename: './keyfile.json'
})

const hotwords = [{ file: './resources/sonus.pmdl', hotword: 'sonus' }]
const language = "en-US"
const sonus = Sonus.init({ hotwords, language }, speech)

const commands = {
  'hello': () => {
    console.log('You will obey');
  },
  '(give me) :flavor ice cream': flavor => {
    console.log('Fetching some ' + flavor + ' ice cream for you, yo')
  },
  'turn (the)(lights) :state (the)(lights)': state => {
    console.log('Turning the lights', (state == 'on') ? state : 'off')
  },
  'stop': () => {
    console.log('Stopping...')
  }
}

Sonus.annyang.addCommands(commands)

Sonus.start(sonus)
console.log('Say "' + hotwords[0].hotword + '"...')

sonus.on('hotword', (index, keyword) => console.log("!" + keyword))
sonus.on('partial-result', result => console.log("Partial", result))

sonus.on('final-result', result => {
  console.log("Final", result)
  if (result.includes("stop")) {
    Sonus.stop()
  }
})

As of tonight I've published v0.1.0 which includes annyang and can be installed by following the instructions in the repo: https://github.com/evancohen/sonus

Feedback is welcome and appreciated.

BetaStacks commented 7 years ago

Here is a how I ended up creating a Global Command Prefix and Suffix http://codepen.io/BrandonCorlett/pen/mRdMqY

/* SET GLOBAL COMMAND PREFIX */
var globalCommandPrefix = "Computer (please)" + " ";

/* SET GLOBAL COMMAND Suffix */
var globalCommandSuffix = " " + "(please)";

/* SET UNIQUE COMMAND TEXT */
var command1 = "say my name is :name";
var command2 = "I am :name";

(function () {
    var commands, log, sayName;
    log = $('.log');
    sayName = function (name) {
        log.append('<li>Your name is ' + name + '!</li>');
        return console.log(name);
    };

    /* CONCATENATE COMMANDS IN VARIABLES */
    var command1Con = globalCommandPrefix + command1 + globalCommandSuffix;
    var command2Con = globalCommandPrefix + command2 + globalCommandSuffix;

    /* USE VARIABLE IN BRACKETS AS OBJECT KEY */
    commands = {
        [command1Con]: sayName,
        [command2Con]: sayName
    };
    annyang.addCommands(commands);
    annyang.start();
    annyang.debug();
}.call(this));

$('.globalCommandPrefix').text(globalCommandPrefix);
$('.globalCommandSuffix').text(globalCommandSuffix);
$('.command1').text(command1);
$('.command2').text(command2);

I'm sure it could be a bit cleaner. It works well for my use case, as I am developing a plugin for another piece of software whose API allows me to use a GUI to toggle parts of the code on and off for each command/function each time I drag a new stack into the IDE.

I set the global commands once per page or use PHP to set it once per site.
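
The same idea could probably be generalised so that the prefix and suffix are applied to a whole commands object at once instead of one variable per command. This is only a sketch; withGlobalAffixes is a hypothetical helper name, not an annyang function.

```javascript
// Hypothetical helper: returns a new commands object in which every phrase
// is wrapped with the global prefix and (optional) suffix.
function withGlobalAffixes(commands, prefix, suffix) {
  var wrapped = {};
  Object.keys(commands).forEach(function (phrase) {
    var key = prefix + ' ' + phrase + (suffix ? ' ' + suffix : '');
    wrapped[key] = commands[phrase];
  });
  return wrapped;
}

var sayName = function (name) {
  console.log('Your name is ' + name + '!');
};

var prefixed = withGlobalAffixes({
  'say my name is :name': sayName,
  'I am :name': sayName
}, 'Computer (please)', '(please)');

console.log(Object.keys(prefixed));
// In the browser the result would then be registered as usual:
//   annyang.addCommands(prefixed);
```

The original handlers are reused untouched, so changing the prefix in one place updates every command.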

Nixellion commented 7 years ago

I'll +1 this issue. It would definitely be awesome to have some front-end JavaScript-based hotword detection. If I'm correct, snowboy and sonus both require Node.js server-side stuff?

I'm writing my own home assistant bot as well, using Python for command processing, and I only use browser as a UI that recognizes speech and sends text commands to the Python Flask server.

I chose this approach, because this way I can just put a few cheap android or windows tablets around the house, instead of dealing with and mixing a lot of microphones routed to one pc. It also allows me to use my AI when I'm not at home. So it makes it more like Cortana\OkGoogle\Alexa.

So I'm really curious about how to detect hotwords with browser-side JS. Not feeling like writing an app for this yet :)

evancohen commented 7 years ago

@Nixellion

Sonus uses Node.js, but it's a bit atypical because it's primarily a "client" library intended for low-powered hardware devices. I'm also looking to create a Python interface: evancohen/sonus#13.

To address your main question: You can run browser based detection with pocketsphinx. An alternative that I really like is JsSpeechRecognizer. You need a reasonably high powered device in order to actually get real-time recognition for both of these. Accuracy is also a big problem, if you have any background noise you are unlikely to get any kind of reasonable detection (and lots of false positives).

I went down the "offline hotword recognition in the browser" path for my smart mirror. After a lot of pain and dead-ends I found snowboy, wrote their Node library, and created sonus.

As an aside (and for inspiration): My current home automation solution right now uses a bunch of $9 CHIPs + $5 PlayStation Eyes + Sonus. Each device is location aware ("turn on the lights" will do something different depending on what room you are in, but "turn on the living room lights" will always turn on the living room lights). Also cool: Next Thing Co also recently released the $16 CHIP Pro which has an on-board microphone (I've yet to receive mine, but it looks promising).

Nixellion commented 7 years ago

I don't really need offline recognition; I only need offline hotword detection in the browser, to activate Google's online speech recognition after that. This way I'll be able to NOT spam Google with non-stop speech recognition requests (well, for as long as people are talking), and, speaking of paranoia, it will only get commands for recognition, no private talks.

And running offline speech recognition is even harder, because I need it to work with the Russian language, and Sphinx only supports English out of the box.

As for the power, I can record audio in the browser, send it to the home server (powerful DIY NAS), it can recognize whether there is hotword or not, but that would probably take too long.

ghost commented 7 years ago

@Nixellion For hotword detection in the browser I went with annyang, but I made a "Conversation" class. You can have a list of commands which, when triggered, gives you another list of commands, and so on. You could use that for hotword detection: you just make the first command the "hotword" and add the rest of the commands in the conversation class. I'll just leave it here for anyone that wants to do something with it. (I'll post the code in a few hours, about 3-4; it's at home :D)

@evancohen It was a surprise for me to see that the CHIP is so cheap and that the Pro version has a microphone. You can put a lot of CHIPs around the house for recognition, with just a main RPi (like mine) as the brains. The only problem: I used the Chrome speech recognition (browser based), which doesn't quite work with sound from sources like mp3/wav. How do you do speech recognition?

Nixellion commented 7 years ago

@andreimavenhut Oh, I guess I did not express myself correctly. I DO need offline hotword detection, so annyang is not an option. Annyang is using google's recognition, so in a noisy room it will send audio to google basically non-stop. It's bad for a huge number of reasons, starting with network bandwidth and ending with privacy.

Right now I basically use Annyang with just ONE command: the bot name followed by ", *tag". It just grabs everything after the bot name and sends it to my personal Python server, which then does all the natural language processing, user-specific context, user-specific conversations, finding the right command and/or using a chatbot. I limit the use of JS to a very simple web UI. This way I can then make very simple native apps for other platforms IF needed, and I won't have to rewrite a lot of code for that. It's also more secure: I can give client access to any number of friends, and they can have fun with my bot while a security clearance restricts them from accessing sensitive commands :D I actually already have the groundwork for speaker recognition. My client can send audio to the server for processing, but I'm still stuck on the actual speaker recognition.

So, I don't think it's a good solution to detect hotword with annyang, then process another command. With annyang it's easier to just use commands with global prefix. Because I don't really see any other reasons other than bandwidth and privacy that you would need a separate hotword, it only makes running all commands in 2 steps instead of one. Instead of just saying without a pause "SuperBot, kill the lights!" you will have to go through a dialog:

With prefix approach it's just:

Now, I could use Python's speech recognition for hotword detection, sending audio to the server to process it using some custom matching algorithm, but I don't want to put so much data through my local network all the time. I mean, always sending audio, each time there is SOME sound detected...

Oh, and about your second question: while you're waiting for evancohen's answer, my opinion is that with an RPi or CHIPs you should probably go with Python, using its SpeechRecognition module, which supports online recognition using Google's services, as well as Bing and a number of other online services (for which you have to get an API key, though). It does not support the Russian Yandex recognition service yet, but in fact it's not that hard to write your own online recognition module. It's all about recording audio, sending it as a POST request to their server, and receiving the JSON response.

But SpeechRecognition (or SpeechRecognizer? I'm not sure what it's called in pip) also supports offline recognition using Sphinx. If you're an English speaker, you're in luck: it does a nice job of recognizing English out of the box. It's worse than Google's or any other online service (they're constantly improving; from what I understand they use neural networks to improve recognition over time), but it's still pretty good.

ghost commented 7 years ago

I thought about using Sphinx or another offline recognizer, but after a few benchmarks I went with the SpeechRecognition in Chrome. I know you can use their APIs, but hey.. they do cost :) and inside Chrome, the speechRecognition has an API key that (from what I know) is unlimited, which means zero cost for me. I wonder if @evancohen found a better way of detecting speech online or offline without any costs.

Nixellion commented 7 years ago

@andreimavenhut , Well, Chrome's speech recognition is actually using Google's servers as well, from what I know, so it's still bandwidth usage and all.

And in Python's speech recognition there is actually an unlimited Google apikey as well. So you get it for free in python too. And sphinx is of course free as well but a pain in the ass :D

evancohen commented 7 years ago

@andreimavenhut For speech recognition I use Sonus. In terms of audio encoding, it uses WAV audio encoded as 16-bit signed-integer linear pulse-code modulation (LPCM); there is no mp3 support today. It's entirely stream based, so you could theoretically stream audio from your web browser to a server instance of Sonus (although I've never actually tried this).
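
For anyone who wants to experiment with that: browser audio APIs typically hand you 32-bit float samples in the range -1 to 1, so they would need converting to 16-bit signed integers before being streamed. Below is a rough sketch of only that conversion step. floatTo16BitPCM is a hypothetical helper name, and resampling, WAV headers and the transport to the server are deliberately left out; whether Sonus accepts a stream produced this way is untested.

```javascript
// Hypothetical helper: converts float samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(samples) {
  var pcm = new Int16Array(samples.length);
  for (var i = 0; i < samples.length; i++) {
    // Clip first so out-of-range input cannot wrap around.
    var s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

var pcmChunk = floatTo16BitPCM(new Float32Array([0, 0.25, -0.25, 1, -1]));
console.log(pcmChunk.length + ' samples, ' + pcmChunk.byteLength + ' bytes');
```

The positive and negative ranges are scaled separately because a signed 16-bit integer can hold -32768 but only +32767.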

One big problem trying to do keyword spotting (aka hotword detection) off-device is latency/lag in detection. That's also a problem in the browser, JS simply isn't really optimized for audio processing... That's not to say it can't be done - it will probably just be a bit slower.

I saw @Nixellion's comment on Kitt-AI/snowboy#98 and would love to see browser compatibility (I would create a browser based version of Sonus in a heartbeat). Based on what you described it's exactly what you are looking for.

Since this conversation is no longer directly related to Annyang (and so we don't spam others) I've created a new issue on the Sonus repo to continue this discussion: evancohen/sonus#28

gaitat commented 6 years ago

Is there an update on this issue? i.e. using annyang along with a hotword? Is the solution to always join the hotword in front of the command?

Nixellion commented 6 years ago

@gaitat You could try running continuous recognition and checking whether there's a hotword on each update. Once there is, restart and go for the phrase recognition. I have not tested this approach, but I'm thinking about doing it some time.

If you're not worried about constant stream of your audio going to google's servers that is.

Alternatively, instead of appending, I would also split the string at the hotword. Because you may be talking something, recognition starts. And in the middle of your talk you say your hotword and command. It will be in the middle, not in front of the string. Did not try this approach either though :D
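
A rough, equally untested-in-production sketch of the splitting idea: take whatever transcript came back, find the hotword, and keep only what follows it. extractCommand is a hypothetical helper name; it uses the last occurrence of the hotword on the assumption that the most recent mention is the one that starts the command.

```javascript
// Hypothetical helper: returns the text spoken after the hotword,
// or null when the hotword does not appear in the transcript at all.
function extractCommand(transcript, hotword) {
  var index = transcript.toLowerCase().lastIndexOf(hotword.toLowerCase());
  if (index === -1) {
    return null;
  }
  return transcript
    .slice(index + hotword.length)
    // Drop the comma or pause punctuation that usually follows the hotword.
    .replace(/^[\s,.!?:;]+/, '')
    .trim();
}

var command = extractCommand('so anyway I told him superbot, kill the lights', 'SuperBot');
console.log(command);

// In the browser this could be fed from annyang's 'result' callback, which
// receives an array of possible phrases:
//   annyang.addCallback('result', function (phrases) {
//     var cmd = extractCommand(phrases[0], 'SuperBot');
//     if (cmd) { /* send cmd to the server */ }
//   });
```

An empty string means the hotword was heard with nothing after it yet, which is different from null (no hotword at all).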

LukeMcLachlan commented 6 years ago

@andreimavenhut HI Andrei I was reading with interest about your conversation class. Currently I'm adding items via speech to physical boxes with Annyang, saying for example "add gloves to box number 4" (gloves are then saved to box 4 in MySQL), but I was thinking about the possibility of removing things and Annyang asking me e.g. "are you sure you want to remove the candle from box number 4", then waiting for me to say either "yes" or "no". The way I was thinking of going was cookies, for example when I say "remove candle from box number 4" a cookie is stored in the browser that lasts e.g 10 seconds, so when Annyang asks "are you sure you want to remove the candle from box number 4" and I answer "yes", the "yes" triggers a function that searches for the cookie and if found removes the candle from box number 4. It sounds as though you may have a better way of having a conversation with Annyang, if so would you care to share? Thanks / Luke

lynxaegon commented 6 years ago

@LukeMcLachlan Hi Luke, it's not that great of a module/class; I just hacked it together quickly. I wanted to refactor the whole app but haven't had the time to do it. Still, it works :)

Here it is: https://gist.github.com/lynxaegon/a76ae8d2ac30f80dff93027290c9e577

Short Explanation: Whenever it matches a command, it just resets the annyang commands, and adds the current conversation commands. If you don't answer in 10s, it reverts back to the main commands.
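
For readers who just want the shape of it, here is a stripped-down sketch of that flow. It is not the code from the gist; createConversation, enter and reset are hypothetical names, and the recognizer argument is assumed to expose removeCommands() and addCommands() the way annyang does.

```javascript
// Hypothetical sketch of the "conversation" flow described above.
function createConversation(recognizer, mainCommands, timeoutMs) {
  var timer = null;

  function useCommands(commands) {
    recognizer.removeCommands();
    recognizer.addCommands(commands);
  }

  // Go back to the main commands and cancel any pending timeout.
  function reset() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    useCommands(mainCommands);
  }

  // Swap in follow-up commands; revert if nothing is answered in time.
  function enter(followUpCommands) {
    if (timer !== null) {
      clearTimeout(timer);
    }
    useCommands(followUpCommands);
    timer = setTimeout(reset, timeoutMs);
  }

  reset();
  return { enter: enter, reset: reset };
}

// Usage in the browser, with the real annyang object (assumed to be loaded):
//   var conversation = createConversation(annyang, mainCommands, 10000);
//   // inside the 'remove *item' handler:
//   conversation.enter({
//     'yes': function () { /* remove the item */ conversation.reset(); },
//     'no': function () { conversation.reset(); }
//   });
```

Keeping the pending question in the handler's closure avoids the need for a cookie to remember what is being confirmed.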

LukeMcLachlan commented 6 years ago

Thank you @lynxaegon I'll have a look at it this evening and see what I can do with it, very kind of you to share it with me!