SEPIA-Framework / sepia-docs

Documentation and Wiki for SEPIA. Please post your questions and bug-reports here in the issues section! Thank you :-)
https://sepia-framework.github.io/

whisper as stt engine #240

Open royrogermcfreely opened 7 months ago

royrogermcfreely commented 7 months ago

Is your feature request related to a problem? Please describe. no

Describe the solution you'd like: use the Whisper STT engine within SEPIA

Additional context: Home Assistant had its "Year of the Voice", and there you can use Whisper on an RPi 4. I tried it on a VM and got pretty good results.

It seems there are 2 versions; I didn't look much into the differences:

whisper: https://github.com/openai/whisper

faster-whisper: https://github.com/SYSTRAN/faster-whisper <- this is the one Home Assistant uses

I also found a Docker image from rhasspy: https://hub.docker.com/r/rhasspy/wyoming-whisper
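
For anyone who wants to try the engine outside of Home Assistant first, here is a minimal sketch of file transcription with the faster-whisper Python package linked above. It is not how SEPIA integrates Whisper (nothing in this thread describes that); the function names `join_segments` and `transcribe_file`, the "tiny" model size, and the CPU/int8 settings are assumptions picked for a small machine such as a Raspberry Pi.

```python
def join_segments(segments):
    """Concatenate the text of transcribed segments into one string."""
    return " ".join(seg.text.strip() for seg in segments).strip()


def transcribe_file(path, model_size="tiny"):
    """Transcribe an audio file and return (detected_language, text).

    Assumes the faster-whisper package is installed (pip install faster-whisper).
    The import is done lazily so join_segments works without it.
    """
    from faster_whisper import WhisperModel

    # "tiny" with int8 on CPU is an assumption for low-power hardware;
    # larger models need more memory and compute.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    # transcribe() returns a lazy iterable of segments plus an info object.
    segments, info = model.transcribe(path, beam_size=5)
    return info.language, join_segments(segments)


# Example call (hypothetical file name):
# language, text = transcribe_file("command.wav")
```

The model is downloaded on first use, so the first call takes noticeably longer than later ones.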

fquirin commented 2 months ago

Sorry for the late reply. I'm currently still taking a little break from the project, but I'm determined to resume work later this year.

As for Whisper, I actually have a working beta version. Unfortunately I did not finish the release before I took a break, but it was already working pretty well. As soon as I resume work, this will be the first task.

fquirin commented 2 months ago

Two additional things I should mention.

1) Whisper is pretty demanding for STT. The smallest model will run fast enough on a Raspberry Pi 5 to give an OK user experience, but it isn't very accurate. The larger models will require better hardware and tend to hallucinate quite a bit. Nevertheless, support will come for everyone to play around with their favorite service ^^.

2) I've also made a PoC for Nvidia NeMo. My hope is that their models will evolve pretty quickly with better support for custom vocabulary. We'll see.