Picovoice / picovoice

On-device voice assistant platform powered by deep learning
Apache License 2.0

Hardware for just Picovoice to run #782

Closed keatonrodgers07 closed 6 months ago

keatonrodgers07 commented 6 months ago

We are using Rhino and Porcupine in a project to design our own embedded system, so we are trying to figure out which hardware on the Picovoice STM32F407G-DISC1 dev board we need. We are using the STM32F407VGT6U microcontroller from the STM32F407G-DISC1 dev board for this. What other hardware on that board is necessary for Picovoice operation alone? Thanks!

mrrostam commented 6 months ago

Porcupine and Rhino use the built-in flash memory and RAM on the STM32F407VGT6, so the engines themselves don't need any additional components. You just need to pass PCM audio in the right format to them somehow.
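To illustrate what "passing the right PCM audio" can look like, here is a minimal sketch in Python (not firmware code) that cuts a raw buffer into fixed-size frames and hands each one to an engine callback. It assumes 16 kHz, 16-bit, single-channel audio and a frame length of 512 samples; check the values the engine itself reports rather than relying on these constants. The names `pcm_frames`, `feed`, and `process` are hypothetical and are not part of the Picovoice API.

```python
import struct
from typing import Callable, Iterator, List

SAMPLE_RATE = 16000  # assumption: 16 kHz mono input
FRAME_LENGTH = 512   # assumption: samples per frame; query the engine for the real value


def pcm_frames(raw: bytes, frame_length: int = FRAME_LENGTH) -> Iterator[List[int]]:
    """Yield complete frames of signed 16-bit little-endian samples.

    A trailing partial frame is dropped, because the engine is assumed
    to accept only full frames.
    """
    bytes_per_frame = frame_length * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(raw) - bytes_per_frame + 1, bytes_per_frame):
        chunk = raw[start:start + bytes_per_frame]
        yield list(struct.unpack("<%dh" % frame_length, chunk))


def feed(raw: bytes, process: Callable[[List[int]], None],
         frame_length: int = FRAME_LENGTH) -> int:
    """Pass every full frame to `process` (a stand-in for the engine call).

    Returns the number of frames delivered.
    """
    count = 0
    for frame in pcm_frames(raw, frame_length):
        process(frame)
        count += 1
    return count
```

On the microcontroller the same idea would typically be a buffer filled by the audio peripheral and handed to the engine one frame at a time; where the samples come from (microphone, codec, or another processor) is up to the board design.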

mrrostam commented 6 months ago

This is also true for Picovoice

damian-666 commented 6 months ago

A suggestion: when I worked on audio EQ, all the DSP and kernel-based filters were done in fixed point for real-time audio processing. It's much cheaper, uses less power, produces less heat, and is standard practice, so if the engines aren't fixed point, and it appears they aren't, it might be worth considering. Intel chips have DSP on them. The ARM chips coming out soon for Windows will have DSP on them, or via FPGAs, and should compete with Apple silicon.
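For readers unfamiliar with the fixed-point approach mentioned here, the following is a small sketch of a Q15 FIR filter written in Python for readability. It only illustrates the general technique (integer multiply-accumulate, then a rounded shift back to Q15); it is not a description of how Porcupine or Rhino are implemented, and the helper names `to_q15` and `fir_q15` are made up for this example.

```python
from typing import List

Q15_ONE = 1 << 15  # 1.0 in Q15 is 32768; the largest representable value is 32767


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to a saturated Q15 integer."""
    return max(-Q15_ONE, min(Q15_ONE - 1, int(round(x * Q15_ONE))))


def fir_q15(samples: List[int], coeffs_q15: List[int]) -> List[int]:
    """Apply a FIR filter to 16-bit samples using only integer arithmetic."""
    out = []
    for n in range(len(samples)):
        acc = 0  # wide accumulator; on real hardware this would be a 32- or 64-bit register
        for k, c in enumerate(coeffs_q15):
            if n - k >= 0:
                acc += c * samples[n - k]
        y = (acc + (1 << 14)) >> 15  # add half an LSB for rounding, then scale back to Q15
        out.append(max(-Q15_ONE, min(Q15_ONE - 1, y)))  # saturate to the 16-bit range
    return out
```

For example, a two-tap averaging filter would be `fir_q15(samples, [to_q15(0.5), to_q15(0.5)])`. The inner loop needs only integer multiplies and adds, which is where the cost and power savings come from on chips without a floating-point unit.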

Another suggestion is putting together a spec for a small dynamic mic and preamp, maybe pairing the hotword with either a jaw-motion inertial wake-up switch or just a button toggle switch, and maybe another couple of mappable buttons or pure-tone buttons, so you don't have to sit all the time or press the stupid mic or send button, which is implemented differently in every app and assistant on desktops.

For me, it makes Windows Voice Access dictation go from 60% to 95%, without any context-aware RAG pre- or post-processing.

https://source.android.com/docs/automotive/voice/voice_interaction_guide/app_development

It's pretty terrible on a condenser mic or on the laptop's mic array, but it's low latency and offline, and with my preamp and mic, a rockstar-type Shure SM58 standard directional vocal mic, it's pretty amazing.

Ideally I'd want a wearable, clip-on dynamic mic, because holding the mic causes strain on the first day. The dynamic mic naturally passes only the midrange, without the high-frequency noise that the condenser carries or the lisp. Any other FIR or EQ filtering would make it complex, I'd guess, if you are passing the audio to another deconvolving processor. It's great for dictation, but for a small set of hotwords, not using context makes it weak.

That's at least for dictation; for editing or command/query completion, it's awful. Someone (hint) should make a universal start menu for all APIs and OSes that indexes the UI via a testing harness or mouse/OCR, even captures tooltips, and then builds a dictionary mapping voice to intents, basically unifying all the Linux flavors, Mac and PCs, all apps, games, and the Start menu/OS file manager, at least to some degree. I've partially specced it and pitched it around. I don't know who will pick it up, but accessibility seems like a ton of duplicate work, x platforms times x apps, when it can be on .NET Core Avalonia (Voice Commander), two-way, like code/feature search, and it puts up Alpha or A, B or Bravo, Charlie as ways to hit the completion once you say enough of what's on the UI. I can whisper to it with fans blowing and music playing and it works well. Not sure if I pitched this here before; I don't think I did. Maybe to an MSFT group. I didn't get a "great idea, roger that," so I'll pitch it to you.
