Issue 26: KWS - Githubissues

Gl0dny commented 1 month ago

[ ] Train or download a KWS model for your hexapod's onboard computer.
[ ] Respond to keywords using pre-programmed responses or integrate with an AI like ChatGPT for dynamic conversation.
[ ] Speech recognition and natural-language understanding (NLU)
[ ] Keyword detection

Gl0dny commented 1 month ago

KWS (custom keywords): Porcupine https://picovoice.ai/docs/
Continuous speech recognition: Vosk https://github.com/alphacep/vosk-api
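
As a rough sketch of the Vosk route: the recognizer produces a running transcript, so keyword detection becomes a text search over that transcript. The model directory, the 16-bit mono WAV input, and the helper names `find_keywords` and `transcribe_wav` are assumptions made for illustration; nothing here comes from the hexapod code itself.

```python
import json
import re
import wave


def find_keywords(text, keywords):
    """Return the keywords that occur as whole words or phrases in text."""
    normalized = " ".join(re.findall(r"[a-z0-9']+", text.lower()))
    padded = f" {normalized} "
    return [kw for kw in keywords if f" {kw.lower()} " in padded]


def transcribe_wav(wav_path, model_dir):
    """Transcribe a 16-bit mono WAV file with Vosk (needs `pip install vosk`)."""
    from vosk import KaldiRecognizer, Model  # lazy import: optional dependency

    model = Model(model_dir)  # model_dir: path to a downloaded Vosk model (assumed)
    pieces = []
    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(model, wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has been finalized
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
        pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

Matching on whole words avoids false hits such as "stand up" inside "understand upper". A dedicated wake word engine like Porcupine skips the transcript step entirely, which is why it is usually the lighter option for always-on listening.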

Gl0dny commented 1 month ago

ReSpeaker 4-Mic Array for Raspberry Pi | Seeed Studio Wiki

picovoice/demo/respeaker at master · Picovoice/picovoice · GitHub

GitHub - Picovoice/picovoice: On-device voice assistant platform powered by deep learning

Picovoice enables enterprises to innovate and differentiate rapidly with private voice AI. Build a unified AI strategy around your brand and products with our speech recognition and Natural-language understanding (NLU) technologies.

Seeed has partnered with Picovoice to bring an on-device speech recognition solution to developers using the ReSpeaker 4-Mic Array.

Picovoice is an end-to-end platform for building voice products on your terms. It enables voice experiences similar to Alexa and Google, but it runs entirely on-device. Picovoice has the following advantages:

Private: Everything is processed offline. Intrinsically HIPAA and GDPR compliant.
Reliable: Runs without needing constant connectivity.
Zero Latency: Edge-first architecture eliminates unpredictable network delay.
Accurate: Resilient to noise and reverberation. It outperforms cloud-based alternatives by wide margins.
Cross-Platform: Design once, deploy anywhere. Build using familiar languages and frameworks.

Picovoice

Functionality: Picovoice is a comprehensive voice AI platform that offers speech recognition, keyword spotting, and natural language understanding capabilities. It allows developers to build custom voice interfaces for various applications.
Components: It includes tools like Picovoice Console for managing voice models, Picovoice SDKs for various platforms, and built-in integration with popular cloud services.
Customization: Picovoice provides options to create custom wake words, enabling developers to tailor the voice recognition experience to specific applications.
Multi-Language Support: It supports multiple languages for both wake word detection and speech recognition.
Use Cases: Suitable for applications that require both keyword spotting and full speech recognition, such as smart home devices, voice assistants, and interactive voice applications.

Gl0dny commented 1 month ago

ReSpeaker 4-Mic Array for Raspberry Pi | Seeed Studio Wiki

porcupine/demo/respeaker at master · Picovoice/porcupine · GitHub

GitHub - Picovoice/porcupine: On-device wake word detection powered by deep learning

Porcupine is a highly accurate and lightweight wake word engine. It enables building always-listening voice-enabled applications. It is:

using deep neural networks trained in real-world environments.
compact and computationally efficient. It is perfect for IoT.
cross-platform. Raspberry Pi, BeagleBone, Android, iOS, Linux (x86_64), macOS (x86_64), Windows (x86_64), and web browsers are supported. Additionally, enterprise customers have access to the ARM Cortex-M SDK.
scalable. It can detect multiple always-listening voice commands with no added runtime footprint.
self-service. Developers can train custom wake word models using [Picovoice Console](https://picovoice.ai/console/).
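
A minimal sketch of what always-listening detection looks like with the `pvporcupine` Python package: audio is cut into fixed-size frames of 16-bit samples and each frame is passed to the engine, which returns the index of the detected keyword or -1. The helper names `split_frames` and `detect_wake_words` are made up for this example, the input is assumed to be raw 16 kHz mono little-endian PCM, and recent SDK versions need an AccessKey from Picovoice Console.

```python
import struct


def split_frames(pcm_bytes, frame_length):
    """Yield tuples of int16 samples, frame_length each; a partial tail is dropped."""
    bytes_per_frame = frame_length * 2  # 2 bytes per little-endian int16 sample
    for start in range(0, len(pcm_bytes) - bytes_per_frame + 1, bytes_per_frame):
        yield struct.unpack_from(f"<{frame_length}h", pcm_bytes, start)


def detect_wake_words(pcm_bytes, access_key, keywords=("porcupine",)):
    """Return (frame_index, keyword) for each detection in the given audio."""
    import pvporcupine  # lazy import: needs `pip install pvporcupine`

    engine = pvporcupine.create(access_key=access_key, keywords=list(keywords))
    hits = []
    try:
        for i, frame in enumerate(split_frames(pcm_bytes, engine.frame_length)):
            keyword_index = engine.process(frame)  # -1 when nothing was detected
            if keyword_index >= 0:
                hits.append((i, keywords[keyword_index]))
    finally:
        engine.delete()  # release the native resources held by the engine
    return hits
```

On the hexapod the frames would come from the microphone stream instead of a byte buffer, but the per-frame loop stays the same.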

Porcupine

Functionality: Porcupine is a lightweight and efficient wake word detection engine. It specializes in recognizing wake words (keywords) but does not provide full speech recognition capabilities.
Performance: Designed for low power consumption and fast response times, making it ideal for edge devices and IoT applications where resources are limited.
Customization: Offers the ability to create custom wake words, allowing developers to define unique phrases that trigger specific actions.
Platform Support: Compatible with various platforms, including embedded systems, mobile devices, and desktop applications.
Use Cases: Best for applications focused solely on wake word detection, such as smart speakers, voice-controlled devices, and other IoT solutions.

Gl0dny commented 1 month ago

Summary

Choose Picovoice if you need a full voice AI solution that includes speech recognition and natural language processing alongside keyword detection.
Choose Porcupine if your primary focus is on efficient wake word detection for resource-constrained environments.

Both are strong in their respective areas, so the choice depends on the use case. Note that the two are not competitors: Porcupine is the wake word engine inside the Picovoice SDK, which pairs it with the Rhino Speech-to-Intent engine.

Gl0dny commented 1 month ago

Picovoice

Picovoice with ReSpeaker 4-Mic Array: Getting Started

Step 1. Follow the step-by-step tutorial for the ReSpeaker 4-Mic Array with Raspberry Pi (the Seeed Studio Wiki page linked above) before continuing.

Note: Please make sure that Audacity and the APA102 LEDs are working properly on the ReSpeaker 4-Mic Array with Raspberry Pi.

Step 2. Open a terminal and install PyAudio, for example with pip3 install pyaudio.

Then install and run the demo with LEDs:

pip3 install pvrespeakerdemo

picovoice_respeaker_demo

Voice Commands

The demo responds to the following voice commands. First say the wake word:

Picovoice

The demo prints:

wake word

Then say a command:

Turn on the lights

You should see the lights turned on and the following message in the terminal:

{
    is_understood : 'true',
    intent : 'turnLights',
    slots : {
        'state' : 'on',
    }
}
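
To act on a result like this rather than just print it, the intent and slots can be routed to a handler. The sketch below is only an illustration: the `Inference` dataclass is a hypothetical stand-in that mirrors the three printed fields, and `handle_inference` and the `lights` dict are invented names, not part of the demo source.

```python
from dataclasses import dataclass, field


@dataclass
class Inference:
    """Hypothetical stand-in mirroring the fields printed by the demo."""
    is_understood: bool
    intent: str = ""
    slots: dict = field(default_factory=dict)


def handle_inference(inference, lights):
    """Update the `lights` state dict in place and return a status message."""
    if not inference.is_understood:
        return "command not understood"
    if inference.intent == "turnLights":
        lights["on"] = inference.slots.get("state") == "on"
        return f"lights {'on' if lights['on'] else 'off'}"
    if inference.intent == "changeColor":
        lights["color"] = inference.slots["color"]
        return f"color set to {lights['color']}"
    return f"unhandled intent: {inference.intent}"
```

For the hexapod, the same pattern would map intents to pre-programmed movements instead of LED states.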

The list of commands is shown in the terminal:

context:
  expressions:
    turnLights:
      - "[switch, turn] $state:state (all) (the) [light, lights]"
      - "[switch, turn] (all) (the) [light, lights] $state:state"
    changeColor:
      - "[change, set, switch] (all) (the) (light, lights) (color) (to) $color:color"
  slots:
    state:
      - "off"
      - "on"
    color:
      - "blue"
      - "green"
      - "orange"
      - "pink"
      - "purple"
      - "red"
      - "white"
      - "yellow"
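
One way to read the expressions above is to expand them into the concrete phrases they accept. The sketch below assumes a simplified interpretation inferred from this listing only: `[a, b]` picks exactly one word, `(x)` is optional, and `$type:name` is replaced by each value of the slot type. It is not Rhino's actual parser, and `expand` and `alternatives` are hypothetical helper names.

```python
import itertools
import re

# A token is a [choice] group, an (optional) group, or a bare word / $slot reference.
TOKEN = re.compile(r"\[[^\]]*\]|\([^)]*\)|\S+")


def alternatives(token, slots):
    """Return the list of words a single token can stand for ('' means omitted)."""
    if token.startswith("["):
        return [w.strip() for w in token[1:-1].split(",")]
    if token.startswith("("):
        return [""] + [w.strip() for w in token[1:-1].split(",")]
    if token.startswith("$"):
        slot_type = token[1:].split(":")[0]
        return list(slots[slot_type])
    return [token]


def expand(expression, slots):
    """List every phrase the expression accepts under the simplified rules above."""
    choices = [alternatives(t, slots) for t in TOKEN.findall(expression)]
    return [" ".join(w for w in combo if w) for combo in itertools.product(*choices)]
```

For example, `expand("[switch, turn] $state:state (all) (the) [light, lights]", {"state": ["off", "on"]})` yields 32 phrases, including "turn on the lights" and "turn off all lights".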

You can also change the colour with:

Picovoice, set the lights to orange

Turn off the lights with:

Picovoice, turn off all lights

Demo Source Code

The demo is built with the Picovoice SDK. The demo source code is available on GitHub at https://github.com/Picovoice/picovoice/tree/master/demo/respeaker.

Different Wake Words

The Picovoice SDK includes free sample wake words licensed under Apache 2.0, including those of major voice assistants (e.g. "Hey Google", "Alexa") and fun ones like "Computer" and "Jarvis".

Custom Voice Commands

The lighting commands are defined by a Picovoice Speech-to-Intent context. You can design and train contexts by typing in the allowed grammar using Picovoice Console. You can test your changes in-browser as you edit with the microphone button. Go to Picovoice Console (https://picovoice.ai/console/) and sign up for an account. Use the Rhino Speech-to-Intent editor to make contexts, then train them for Raspberry Pi.
