V-Sekai / godot-whisper

A GDExtension addon for the Godot Engine that enables realtime audio transcription, supports OpenCL for most platforms, Metal for Apple devices, and runs on a separate thread.
MIT License

VOSK option? #71

Closed by darkhog 6 months ago

darkhog commented 6 months ago

AFAIK the VOSK models offer both faster speech recognition and better accuracy (recognizing punctuation, for example), though a separate addon may be needed.

gudrob commented 6 months ago

https://github.com/unusualprojects/GodotSpeechRecognition needs only a few adaptations to work in Godot 4. You can grab the excellent vosk-model-en-us-0.22-lgraph from https://alphacephei.com/vosk/models. The big problem for me was that it does not recognize names without retraining. A custom grammar solution wouldn't work.

I don't think adding it to a whisper project would make much sense?

Ughuuu commented 6 months ago

I also don't think it makes sense to add it, as it's a different thing. If I just use whisper, I wouldn't want vosk to add to the size of this addon. Also, since the other one exists, it can be used (I haven't used it).