Gl0dny / hexapod

This project involves the design and development of a six-legged (hexapod) walking robot, with a focus on implementing a real-time gait generation algorithm in Rust. It also integrates a microphone array using Python to support audio processing concepts like Direction of Arrival (DOA), beamforming, and keyword spotting (KWS).

Issue 26: KWS #35

Open Gl0dny opened 7 hours ago

Gl0dny commented 7 hours ago

KWS: Custom wake word: Porcupine (https://picovoice.ai/docs/). Continuous speech recognition: Vosk (https://github.com/alphacep/vosk-api).
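Rough idea of how the two could fit together: the wake word detector listens all the time, and the continuous recognizer only gets audio for a short window after the wake word fires. This is just a sketch. `detect_wake_word` and `transcribe` are hypothetical callables standing in for whatever wraps Porcupine and Vosk, and `window_frames` is an arbitrary value, not something taken from either library.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class GatedRecognizer:
    """Run the recognizer only for a short window after the wake word fires."""

    detect_wake_word: Callable[[bytes], bool]     # hypothetical wrapper around a wake word engine
    transcribe: Callable[[bytes], Optional[str]]  # hypothetical wrapper around a speech recognizer
    window_frames: int = 100                      # frames to stay awake after the wake word
    _remaining: int = field(default=0, init=False)

    def feed(self, frame: bytes) -> Optional[str]:
        """Consume one audio frame; return a transcript when one is ready."""
        if self._remaining == 0:
            # Idle: only the cheap wake word detector sees the audio.
            if self.detect_wake_word(frame):
                self._remaining = self.window_frames
            return None
        # Awake: forward audio to the recognizer until the window closes.
        self._remaining -= 1
        text = self.transcribe(frame)
        if text:
            self._remaining = 0  # one command per wake word
        return text


def run(frames: Iterable[bytes], gate: GatedRecognizer) -> List[str]:
    """Feed every frame through the gate and collect the transcripts."""
    return [text for frame in frames if (text := gate.feed(frame))]
```

Keeping the two engines behind plain callables means the gating logic can be tested with fake frames before any microphone is attached.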

Gl0dny commented 7 hours ago

ReSpeaker 4-Mic Array for Raspberry Pi | Seeed Studio Wiki

picovoice/demo/respeaker at master · Picovoice/picovoice · GitHub

GitHub - Picovoice/picovoice: On-device voice assistant platform powered by deep learning

Picovoice enables enterprises to innovate and differentiate rapidly with private voice AI. Build a unified AI strategy around your brand and products with our speech recognition and Natural-language understanding (NLU) technologies.

Seeed has partnered with Picovoice to bring an on-device speech recognition solution to developers using the ReSpeaker 4-Mic Array.

Picovoice is an end-to-end platform for building voice products on your terms. It enables voice experiences similar to Alexa and Google, but runs entirely on-device. Picovoice's advantages:

Private: Everything is processed offline. Intrinsically HIPAA and GDPR compliant.
Reliable: Runs without needing constant connectivity.
Zero Latency: Edge-first architecture eliminates unpredictable network delay.
Accurate: Resilient to noise and reverberation. It outperforms cloud-based alternatives by wide margins.
Cross-Platform: Design once, deploy anywhere. Build using familiar languages and frameworks.

Picovoice

Functionality: Picovoice is a comprehensive voice AI platform that offers speech recognition, keyword spotting, and natural language understanding capabilities. It allows developers to build custom voice interfaces for various applications.
Components: It includes tools like Picovoice Console for managing voice models, Picovoice SDKs for various platforms, and built-in integration with popular cloud services.
Customization: Picovoice provides options to create custom wake words, enabling developers to tailor the voice recognition experience to specific applications.
Multi-Language Support: It supports multiple languages for both wake word detection and speech recognition.
Use Cases: Suitable for applications that require both keyword spotting and full speech recognition, such as smart home devices, voice assistants, and interactive voice applications.

The lighting commands are defined by a Picovoice Speech-to-Intent context. You can design and train contexts by typing in the allowed grammar using Picovoice Console. You can test your changes in-browser as you edit with the microphone button. Go to Picovoice Console (https://picovoice.ai/console/) and sign up for an account. Use the Rhino Speech-to-Intent editor to make contexts, then train them for Raspberry Pi.
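For the hexapod, the same idea would apply with a movement context instead of the lighting one. A minimal sketch of turning a Speech-to-Intent result into a robot command, under some assumptions: the `Inference` class below is a local stand-in whose field names (`is_understood`, `intent`, `slots`) are assumed to mirror what the Rhino Python SDK returns, and the `walk` / `stop` intents, the `direction` slot, and the command strings are all hypothetical and would have to match whatever context is trained in Picovoice Console.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Inference:
    """Stand-in for a Speech-to-Intent result (field names are an assumption)."""

    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def to_robot_command(inference: Inference) -> Optional[str]:
    """Translate an inference into a hypothetical hexapod command string."""
    if not inference.is_understood:
        return None  # utterance did not match the context grammar
    if inference.intent == "walk":
        # Fall back to walking forward when no direction slot was spoken.
        direction = inference.slots.get("direction", "forward")
        return f"WALK {direction.upper()}"
    if inference.intent == "stop":
        return "STOP"
    return None  # intent exists in the context but is not handled here
```

Returning `None` for anything not understood keeps the robot from acting on speech that falls outside the trained grammar.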

Gl0dny commented 6 hours ago

ReSpeaker 4-Mic Array for Raspberry Pi | Seeed Studio Wiki

porcupine/demo/respeaker at master · Picovoice/porcupine · GitHub

GitHub - Picovoice/porcupine: On-device wake word detection powered by deep learning

Porcupine is a highly-accurate and lightweight wake word engine. It enables building always-listening voice-enabled applications. It is:

using deep neural networks trained in real-world environments.
compact and computationally-efficient. It is perfect for IoT.
cross-platform. Raspberry Pi, BeagleBone, Android, iOS, Linux (x86_64), macOS (x86_64), Windows (x86_64), and web browsers are supported. Additionally, enterprise customers have access to the ARM Cortex-M SDK.
scalable. It can detect multiple always-listening voice commands with no added runtime footprint.
self-service. Developers can train custom wake word models using [Picovoice Console](https://picovoice.ai/console/).
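Since several keywords can be listened for at once, the detector result can be treated as an index into the keyword list and mapped straight to an action. A small sketch, assuming the engine reports the position of the detected keyword in the list it was created with and -1 when nothing was heard. The phrases and action names below are made up for the hexapod and are not shipped with Porcupine.

```python
from typing import Dict, List, Optional

# Hypothetical wake phrases and the robot actions they trigger.
KEYWORDS: List[str] = ["hexapod stop", "hexapod walk", "hexapod sit"]
ACTIONS: Dict[str, str] = {
    "hexapod stop": "STOP",
    "hexapod walk": "WALK",
    "hexapod sit": "SIT",
}


def action_for(keyword_index: int) -> Optional[str]:
    """Map a detector result (index into KEYWORDS, or -1 for none) to an action."""
    if 0 <= keyword_index < len(KEYWORDS):
        return ACTIONS[KEYWORDS[keyword_index]]
    return None
```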

Porcupine

Functionality: Porcupine is a lightweight and efficient wake word detection engine. It specializes in recognizing wake words (keywords) but does not provide full speech recognition capabilities.
Performance: Designed for low power consumption and fast response times, making it ideal for edge devices and IoT applications where resources are limited.
Customization: Offers the ability to create custom wake words, allowing developers to define unique phrases that trigger specific actions.
Platform Support: Compatible with various platforms, including embedded systems, mobile devices, and desktop applications.
Use Cases: Best for applications focused solely on wake word detection, such as smart speakers, voice-controlled devices, and other IoT solutions.
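A sketch of what the detection loop could look like in Python. Assumptions: the audio is mono, signed 16-bit little-endian PCM at the sample rate the engine expects, and 512 samples per frame is only a default here (the real value should be read from the engine). `detect` needs the third-party `pvporcupine` package and a Picovoice AccessKey. It passes built-in keyword names; a custom wake word trained in Picovoice Console would go through `keyword_paths` instead.

```python
import struct
from typing import Iterator, List, Tuple

FRAME_LENGTH = 512  # samples per frame; assumed default, read it from the engine in practice


def pcm_frames(raw: bytes, frame_length: int = FRAME_LENGTH) -> Iterator[Tuple[int, ...]]:
    """Yield full frames of signed 16-bit little-endian mono samples."""
    frame_bytes = frame_length * 2  # 2 bytes per int16 sample
    # Any trailing partial frame is dropped.
    for start in range(0, len(raw) - frame_bytes + 1, frame_bytes):
        yield struct.unpack_from(f"<{frame_length}h", raw, start)


def detect(raw: bytes, access_key: str, keywords: List[str]) -> List[int]:
    """Return the index of each keyword detection, in order of occurrence."""
    import pvporcupine  # third-party; imported lazily so pcm_frames works without it

    porcupine = pvporcupine.create(access_key=access_key, keywords=keywords)
    try:
        hits = []
        for frame in pcm_frames(raw, porcupine.frame_length):
            index = porcupine.process(frame)  # -1 when no keyword was heard
            if index >= 0:
                hits.append(index)
        return hits
    finally:
        porcupine.delete()  # release native resources
```

On the robot the bytes would come from the ReSpeaker capture stream rather than a buffer in memory, but the framing stays the same.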

Gl0dny commented 6 hours ago

Summary

Choose Picovoice if you need a full voice AI solution that includes speech recognition and natural language processing alongside keyword detection.
Choose Porcupine if your primary focus is on efficient wake word detection for resource-constrained environments.

Both are strong in their respective areas, so the choice depends on the specific use case and requirements.