evancohen / smart-mirror

The fairest of them all. A DIY voice controlled smart mirror with IoT integration.
http://smart-mirror.io

Speech recognition not working [Keyword Spotting] #170

Closed mojosoeun closed 8 years ago

mojosoeun commented 8 years ago

Hello. From 9am to 4pm KST, voice recognition didn't work. But after that time it began working again. I'm trying to figure out why but I can't solve it.

evancohen commented 8 years ago

This is a known issue. My current theory is that because of the high volume of mirrors we are collectively using up Electron's Google speech key.

I am currently investigating solutions (and am open to suggestions)

shekit commented 8 years ago

Any updated theory on this? I'm facing a very random issue where my code was working earlier in the day and suddenly stopped working: no voice detection, nada.

Here's another issue I opened on annyang that details all my attempts https://github.com/TalAter/annyang/issues/188

evancohen commented 8 years ago

You are correct in your assumption that the issue is with key utilization (there have been many discussions about this on the gitter chat, which I suggest you check out).

I'm looking at alternatives (BlueMix, Microsoft, etc) as well as investigating offline Keyword Spotting to reduce quota usage.

evancohen commented 8 years ago

Another update: I've got keyword spotting functioning in the evancohen/keyword-spotting branch. There are a number of issues with this implementation, namely poor performance and some compatibility issues with certain microphone setups.

shrimp69 commented 8 years ago

I just set it up today and it seems I already made too many requests. I have my own Google API keys but in a matter of 2-3 minutes I made over 500 requests and now it seems to be down for me.

How do I add this branch to my existing git folder? ( on the raspberry )

skydazz commented 8 years ago

I am still getting "Google Speech Recognizer is down :(" when I plug any sort of microphone into it. I have my own speech keys. May I suggest that it's a driver issue? Is it set to only be compatible with a list of mics? (PS: it shows "Say 'what can I say' to see a list of commands" when I unplug the microphone.)

skydazz commented 8 years ago

Also, when I start it I get the following message in the terminal: "[1444:0426/103700:ERROR:logging.h(813)] Failed to call method: org.freedesktop.NetworkManager.GetDevices: object_path= /org/freedesktop/NetworkManager: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.NetworkManager was not provided by any .service files"

skydazz commented 8 years ago

I have started from scratch. I downgraded from Jessie to Wheezy and I am following the documentation exactly as it is written (except config.js). I am using a USB camera as a mic, as the documentation also suggests. I'll post results soon.

skydazz commented 8 years ago

OK, my issue now, still with sound, is getting the USB camera mic to be the mic that's actually used; I can use any USB sound device for anything. I have tried a Turtle Beach PX22 controller, a USB sound card, and a USB camera.

evancohen commented 8 years ago

@skydazz have you tried following the directions in the troubleshooting section of the documentation? You may also want to look at #20 (which was an old thread on the issue that may help you find an alternative solution)

Sachin1968 commented 8 years ago

@skydazz Were you able to resolve the issue you had with " Failed to call method: org.freedesktop.NetworkManager.GetDevices: object_path= /org/freedesktop/NetworkManager: org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.NetworkManager was not provided by any .service files"

I have the same issue and can't figure out how to resolve it. Thanks.

evancohen commented 8 years ago

@Sachin1968 that error is unrelated to this thread and is harmless; you can safely ignore it. You can find more info in this Chromium forum post.

evancohen commented 8 years ago

So, another update for you all :) Keyword spotting officially works in the evancohen/keyword-spotting branch. Unfortunately the Pi is not quite powerful enough to process everything in real time. Because of that I've added a clap detector to that same branch: all you have to do is clap (a configurable number of times) and the mirror will start listening to you.

When using this branch on the Pi there are a few things you need to know. You'll have to install sox (it's a dependency for clap detection):

sudo apt-get install sox

You will also have to run npm install after switching to this branch because of the new dependencies. Make sure you update your config.js file to reflect the new properties in config.example.js!
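If you're switching an existing install over (as asked above), the whole sequence looks roughly like this. This is only a sketch: I'm assuming the default install location and that the branch is simply named keyword-spotting.

cd ~/smart-mirror

# fetch and switch to the keyword spotting branch
git fetch origin
git checkout keyword-spotting

# install the clap-detection dependency and any new node modules
sudo apt-get install sox
npm install

After that, bring your config.js in line with config.example.js before restarting the mirror.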

Since this is all very new stuff I haven't had the chance to test it extensively. I already anticipate there being issues with the clap detection microphone configuration; luckily, this is something you can set up yourself. In your config you can use the clap overrides object to change the following settings for clap detection:

overrides : {
    AUDIO_SOURCE: 'hw:1,0', // this is your microphone input. If you don't know it you can refer to this thread (http://www.voxforge.org/home/docs/faq/faq/linux-how-to-determine-your-audio-cards-or-usb-mics-maximum-sampling-rate)
    DETECTION_PERCENTAGE_START : '5%', // minimum noise percentage threshold necessary to start recording sound
    DETECTION_PERCENTAGE_END: '5%',  // minimum noise percentage threshold necessary to stop recording sound
    CLAP_AMPLITUDE_THRESHOLD: 0.7, // minimum amplitude threshold to be considered as clap
    CLAP_ENERGY_THRESHOLD: 0.3,  // maximum energy threshold to be considered as clap
    MAX_HISTORY_LENGTH: 10 // all claps are stored in history, this is its max length
}
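If you're not sure what to use for AUDIO_SOURCE, the voxforge link in the comment above goes into more detail, but in short: arecord -l lists your capture devices, and the card/device numbers it prints map directly onto the hw:<card>,<device> string. The output below is only illustrative.

arecord -l
# card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]
# -> AUDIO_SOURCE: 'hw:1,0'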

As always, if you have any questions you can post them here or ask on gitter.

When commands aren't working

Since commands are intermittent in the dev branch, I've added a shim to annyang to "simulate" a request. This can be done in the dev console with the following:

annyang.simulate("what can I say");

evancohen commented 8 years ago

@Keopss we'll get you sorted out in gitter :)

joerod commented 8 years ago

I had the same problem with overusing my 50 speech API calls in about 10 minutes of use so I'm happy to test the new "clap" feature.

Keopss commented 8 years ago

Hi @evancohen! I don't know what's happening with my Raspberry Pi :(

I have installed smart-mirror-master and it works fine.

Then I installed sox and smart-mirror with the keyword branch, edited config.js and added the overrides options, but I get no clap or speech detection.

evancohen commented 8 years ago

The geniuses over at http://kitt.ai have created an offline keyword spotter that should work. In order to find out I need your help to train the keyword "smart mirror". Just follow these steps:

1) Go to https://snowboy.kitt.ai/hotword/47
2) Click "Record and download the model"
3) Follow the instructions to train the model (be sure to click "save" at the end!)

I'll continue to keep this thread updated with my progress and should hopefully have a working prototype this weekend!

Keopss commented 8 years ago

Hi! How did it go? All good?

evancohen commented 8 years ago

I have a working implementation of keyword spotting. I'm currently trying to fix a native PulseAudio issue on the Pi 3 that causes recognition to fail.

trenkert commented 8 years ago

Hello, I've successfully installed smart mirror and I am very impressed!

However, I also get "Google Speech Recognizer is down"...

How long will it take for you to implement the new solution into the main branch?

Just an idea: could you use Jasper with PocketSphinx (http://jasperproject.github.io/) to train a number of keywords that then either activate Google STT or Amazon Alexa? Or is Kitt.ai definitely better for keyword recognition?

evancohen commented 8 years ago

I tried Jasper... it's quite resource intensive and is painful to use as a dependency (building projects on top of it is great, building integration into an existing project not so much).

I also tried PocketSphinx (the same native recognition engine that Jasper uses) and it wasn't quick enough to recognize keywords without significant lag.

Snowboy (from the folks at kitt.ai) is super lightweight and very fast. Sure it requires a wrapper for their Python library, but that's not too difficult.

I actually have a working prototype with Snowboy on the kws branch (using the OSX binaries). The only problem on the Pi now is with PulseAudio, which is having issues with the Pi 3. Once I sort that out (and I think I have a fix) we'll be good to go.

It's been a long journey, with lots of painful dead ends, but I'm feeling really close!

tl;dr Snowboy is great, I should have something working really soon.

trenkert commented 8 years ago

Cool! What's the problem with PulseAudio?

evancohen commented 8 years ago

@trenkert it's an issue with Bluetooth that causes PulseAudio to crap out. Even after disabling it within the config the issue persists (which makes me think there may be another root cause). I'm worried that the real cause is a conflict between dependencies of the mirror and keyword spotter (but I haven't confirmed this yet, and I don't think it's the cause).

chenguoguo commented 8 years ago

Sorry for coming into this late. Snowboy is a C++ library and doesn't have many dependencies. It will work as long as you can feed it linear PCM data sampled at 16 kHz, with 16 bits per sample and a single channel. PyAudio is only used for demo purposes. If it turns out that PyAudio is the problem, we can turn to other alternatives for audio capturing.

In the Snowboy repo we are trying to add examples of using Snowboy in different programming languages. So far the examples use PortAudio or PyAudio, but if you look at the code (e.g., the C++ demo code https://github.com/Kitt-AI/snowboy/blob/master/examples/C%2B%2B/demo.cc), you can see that switching the audio capturing tool should be easy.

@evancohen, let me know if it turns out that PyAudio is the problem. We can look into other alternatives for audio capturing.
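If you want to double-check that a microphone can actually deliver audio in that format, sox (already installed above as a clap-detection dependency) can record and inspect a short test clip. A minimal sketch:

# record three seconds of 16 kHz, 16-bit, mono audio from the default input
# (prefix with AUDIODEV=hw:1,0 to record from a specific device)
rec -r 16000 -b 16 -c 1 -e signed-integer test.wav trim 0 3

# show what was actually captured
soxi test.wav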

evancohen commented 8 years ago

Hey @chenguoguo thanks for dropping in! Awesome to see you all so committed to your (super awesome) project. I managed to write a pretty hacky IPC between Node and your pre-packaged Snowboy binaries/Python wrapper. It's definitely not the ideal way to use Snowboy with Node, but I just wanted to see if I could get something that would work.

I don't think it would be too challenging to wrap the existing C++ code so it could be easily consumed via Node. I'll take a look at it this weekend if I get the chance 😄

For everyone else: I managed to coax PulseAudio into cooperating on my Pi, and everything seems to work super well! You can test it out by doing the following:

Using the kws branch on the Pi 2/3:
  1. First you'll want to check out the branch and update your config file to include the new config.kws object.
  2. Then you should train your own model, which will be most accurate when done on the Pi itself: https://snowboy.kitt.ai/hotword/47 (bonus points if you also help the Snowboy team train their model)
  3. Download your trained model and use it to replace smart_mirror.pmdl in the root of the smart-mirror directory.
  4. Install the necessary dependencies (see the command sketch at the end of this comment): sudo apt-get install python-pyaudio python3-pyaudio sox
  5. Run the mirror! Say "smart mirror" and the mirror should start listening (there is no UI for this yet).

As always let me know if you have any issues over on gitter.
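For reference, the command side of steps 1, 3 and 4 might look roughly like this. The install path and the location of your downloaded model are assumptions; adjust them to your setup.

cd ~/smart-mirror

# step 1: switch to the keyword spotting branch
git fetch origin
git checkout kws
npm install   # in case the branch pulls in new node modules

# step 3: replace the default model with the one you trained
cp ~/Downloads/smart_mirror.pmdl ~/smart-mirror/smart_mirror.pmdl

# step 4: install the audio dependencies
sudo apt-get install python-pyaudio python3-pyaudio sox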

chenguoguo commented 8 years ago

That's great @evancohen! @amir-s is also helping us work on the NodeJS module; see the issue here. He'll likely have something soon.

trenkert commented 8 years ago

@evancohen I've experienced a similar issue. I would guess it has to do with pulseaudio-bluetooth. It works for me when I start pulseaudio manually once again after login.
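For anyone else hitting this, the manual restart I mean is just killing the current PulseAudio instance and starting a fresh one for your user:

# stop whatever PulseAudio instance is running
pulseaudio -k

# start a new one
pulseaudio --start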

ojrivera381 commented 8 years ago

@evancohen Thanks. I rebuilt it and all seemed to be fine on my lab monitor in my office; however, when I moved it to its permanent location, speech stopped working. Also, how do I exit it and get to the main desktop with menus? Right now Alt+F4 closes the window, but I can't see any menus to go through Pi settings, launch a terminal, etc. Thanks again.

evancohen commented 8 years ago

@ojrivera381 is the Pi still connected to the same WiFi network? Have you exceeded your 50 query/day quota? The menu is missing because you have unclutter installed. You can probably also press the Windows key on your keyboard (which opens the Raspbian equivalent of the Start menu). You can also get to the terminal via the recycling bin on the desktop (hacky, I know).

If those two things look good, I would follow the instructions for troubleshooting in the docs.
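A quick way to sanity-check the network part from a terminal (assuming the usual Raspbian wireless tools are present):

# which wireless network is the Pi actually on?
iwgetid

# can it reach the outside world? (speech recognition needs internet)
ping -c 3 google.com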