Closed NigelHiggs30 closed 6 months ago
I think your request can be broken into two separate points.
@drmfinlay could possibly speak to the documentation, but the code for each engine supported thus far can be found at https://github.com/dictation-toolbox/dragonfly/tree/master/dragonfly/engines
Most engines rely on middleware outside of dragonfly that handles compiling grammars from dragonfly down to engine-specific implementations and specs. Examples of this middleware are Natlink and Kaldi Active Grammar.
This has previously been discussed in the following issue: https://github.com/dictation-toolbox/dragonfly/issues/376
That is not to say it can't be done; however, there doesn't seem to be a clear path that's performant within the Whisper API, and there is possibly a limitation within the model itself.
@NigelHiggs30 This looks interesting https://github.com/facebookresearch/seamless_communication
Hello Nigel,
Thank you for opening this issue. I apologise for my late reply. This issue fell off my radar.
As @LexiconCode has mentioned above, support for Whisper has been discussed previously. Whisper is impressive, but not useful for everything. It simply is not an appropriate tool for this particular job. I went into the details in #376 and elsewhere (I think).
As for the documentation, it is in need of updating. I am not considering the addition of new engines within Dragonfly any more. The engines we have at the moment are quite sufficient, in my opinion. A new engine could, however, be implemented and used externally. One should only need to register an engine instance using the register_engine_init() function for things to work properly.
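To illustrate the registration idea, here is a minimal self-contained sketch of the pattern described above. All names here (`ExternalEngine`, the module-level `_registered_engine`, and the local `register_engine_init()` / `get_engine()` stubs) are simplified stand-ins written for this example; in real code you would import `register_engine_init` from dragonfly and subclass its actual engine base class.

```python
# Hypothetical sketch: how an externally implemented engine could be
# registered so the framework's get_engine() returns it afterwards.
# These stubs mimic the pattern only; they are NOT dragonfly's real API.

_registered_engine = None


def register_engine_init(engine):
    """Record an externally constructed engine instance as the active one
    (mirrors the role of dragonfly's register_engine_init())."""
    global _registered_engine
    _registered_engine = engine


def get_engine():
    """Return the registered engine, as a framework's get_engine() would."""
    return _registered_engine


class ExternalEngine:
    """Minimal stand-in for a custom speech recognition engine."""
    name = "external"

    def recognize(self, audio_data):
        # A real engine would run recognition here.
        return "<transcript>"


# Usage: construct the engine yourself, then register it once at startup.
engine = ExternalEngine()
register_engine_init(engine)
assert get_engine() is engine
```

The point of the pattern is that the framework never constructs the external engine itself; the integrating code builds the instance and hands it over once, and everything downstream retrieves it through the framework's accessor.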
I've been following this project for several years and previously interacted with it using the built-in Windows speech recognition engine. The core project is impressive, but the main limitations lay with the speech recognition engines available at the time, and I believe those limitations were the primary barrier to wider adoption.

I believe now is the time to upgrade this project. Refactoring might be necessary for broader applicability, but the potential of the final product is significant. With the advances in open-source AI and speech-to-text technology, especially developments like the Whisper models, this project could reach new heights of performance and usability.

Is there any updated documentation or support for integrating new engines, particularly Whisper models? I am considering opening a pull request to integrate these advancements into the project.