SEPIA-Framework / sepia-docs

Documentation and Wiki for SEPIA. Please post your questions and bug-reports here in the issues section! Thank you :-)
https://sepia-framework.github.io/

Beginner Question: How to start Speech Recognition? #45

Open jlemmer opened 4 years ago

jlemmer commented 4 years ago

Wow, what a great project! And the first time something could be installed on a Raspberry Pi without lots of errors and problems. ;-) Respect!

But how do I start speech recognition? I have completed the installation and setup as described for a Raspberry Pi on a Raspberry Pi 3B (fresh install with the latest Raspbian Lite, Wi-Fi and Bluetooth deactivated), installed the Android app, and configured my browser (new Edge with Chromium engine on Windows 10) to accept insecure connections to the servers. I can run the "Hey SEPIA" test successfully on the phone and in the browser (on the phone it works just once, then the app has to be restarted), and I can chat on both clients via text input, but there is absolutely no reaction to speech input. I can tap the microphone or try "Hey SEPIA", but nothing happens on either client. The ASR engine is set to "Native" and there is no ASR server address in the field for a custom speech recognition server, so everything seems to be alright. Here: https://smarthome-training.com/de/snips-alternative-lokale-sprachsteuerung-mit-sepia-und-openhab/ Konstantin obviously has the same problem/question. You write that secure servers are necessary for "full function", but I do not see anything saying that one of the methods described for secure servers is absolutely necessary to make speech recognition work. Can you help me?

sepia-assistant commented 4 years ago

Hey,

glad to hear that the installation worked so smoothly :grin: Now let's find out why ASR is not working!

First a comment about Windows 10 Edge (Chromium): It's not working in "native" mode unfortunately. I've filed a bug report a while ago in the Microsoft Developer Forum and they said they are working on it :-/ You have 3 options in the browser section:

About Android: What version of Android are you using? Is this maybe a version without Google Play Services installed, e.g. bare Lineage OS or a new Huawei? These are the only devices known so far to cause problems in "native" mode :/ Since "Hey SEPIA" seems to work, I assume microphone access is working properly? Is speech recognition working in apps like Google Maps?

jlemmer commented 4 years ago

Thanks for the quick reply. So the requirements for a browser for the desktop version are a bit "special". But I don't mind. This would be something just for testing (perhaps I will install the new Firefox later to test).

What I am really looking for is a headless client with an array of microphones like the ones ReSpeaker sells and a working installation of the Android client on my mobile. It is a Samsung S8 with Android 9 patchlevel of 1st of May 2020, so nothing special without Google Play Services, ....

I followed your advice to check whether speech recognition works in other apps on my mobile. So I activated the microphone for Google, and it worked there, and with the activation in Google it works in SEPIA, too. It is nice that speech recognition now works in SEPIA, but I do not like what this means concerning Google. I thought SEPIA with the local activation via "Hey SEPIA" would allow me to use speech recognition without a constantly eavesdropping Google (even if Google gets the concrete commands after the local activation of SEPIA). Is there a way to keep the microphone deactivated for Google, or otherwise prevent Google from listening to me, while SEPIA is still able to use it?

fquirin commented 4 years ago

Hi,

sorry for the late reply, I've been a bit busy the last few days.

I think I should explain a bit more about the different STT engines you can use in SEPIA. There are 2 basic settings: 'native' and 'custom' (aka SEPIA).

The 'native' engine uses device-based services that are implemented via a common speech interface. Depending on your device this can be Apple, Google, Microsoft, Samsung, etc. Choosing this engine will usually give you the best possible result at the cost of sending data to a cloud service.
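As a rough illustration of what "available via a common speech interface" can mean in a browser, here is a minimal sketch that assumes the interface in question is the Web Speech API (the thread does not name it explicitly). `findNativeRecognizer` and `SpeechGlobals` are hypothetical names made up for this example, not part of SEPIA. The check only tells you whether the browser exposes a recognizer constructor; as the Edge case above suggests, an exposed constructor does not guarantee that recognition actually works.

```typescript
// Hypothetical helper (not part of SEPIA): reports which Web Speech API
// constructor, if any, a browser-like global object exposes.
type SpeechGlobals = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

function findNativeRecognizer(globals: SpeechGlobals): string | null {
  // Standard name first, then the prefixed name used by Chromium-based browsers.
  if (typeof globals.SpeechRecognition === "function") {
    return "SpeechRecognition";
  }
  if (typeof globals.webkitSpeechRecognition === "function") {
    return "webkitSpeechRecognition";
  }
  return null;
}

// Demo with a stand-in object; in a real page you would pass `window` instead.
class FakeRecognizer {}
const found = findNativeRecognizer({ webkitSpeechRecognition: FakeRecognizer });
console.log(found ?? "no native speech recognition interface found");
```

If the helper returns `null`, a 'native' engine of this kind cannot be used in that browser at all; if it returns a name, the engine may still fail at runtime and a real test utterance is the only reliable check.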

The 'custom' engine uses the SEPIA STT server, which is completely based on open-source technology. The Docker image (see above) can currently run on any x86 64-bit system, and I'm working on re-enabling it for Raspberry Pi 4. Choosing this engine will make you completely independent of any third-party supplier at the cost of accuracy. It is possible, though, to increase the accuracy by "training" the system and reducing the vocabulary (which is something I still haven't documented well).

> It is a Samsung S8 with Android 9 patchlevel of 1st of May 2020, so nothing special without Google Play Services, [...] with the activation in Google it works in Sepia, too

I've seen that issue before. It seems that deactivating Google will make the 'native' engine unavailable for the whole system. You can probably switch to "Samsung" in the settings, but it's still a cloud service and recognition quality is worse. The 'custom' engine will be unaffected by this.

With the current state of open-source technology this is the best compromise I could find. Hope that clarifies some things :-)