What features would you like to see added?
Hey!
I would really love to begin prompting by speech (i.e., using voice recognition).
If it would be of interest, I'd also like to contribute some documentation on the various STT features, as I couldn't find the parameters covered on the STT page.
Specifically: what do the "conversation mode" and "auto transcribe audio" toggles actually do?
I have a couple of ideas for this, which I'm batching into one feature enhancement; I also intend to look into the feasibility of working on them myself:
1. Hotkey support to start and stop voice detection, to facilitate (almost) hands-free usage.
2. Some implementation of a "push to talk" mode: hold down an icon (e.g. the mic button) until you're ready to send. (A rough sketch of both ideas follows this list.)
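To make the idea concrete, here is a minimal, purely illustrative sketch (not based on the project's actual code): both items boil down to starting a recorder on a key-down/press event and stopping it on release. The Ctrl+Space binding and the sendForTranscription() helper are hypothetical placeholders.

```ts
// Illustrative sketch only: push-to-talk / hotkey recording in the browser.
// Hold Ctrl+Space to record, release Space to stop and hand the audio to STT.

let recorder: MediaRecorder | null = null;
let chunks: Blob[] = [];

async function startRecording(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  chunks = [];
  recorder = new MediaRecorder(stream);
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = () => {
    const audio = new Blob(chunks, { type: "audio/webm" });
    void sendForTranscription(audio); // hypothetical hook into the STT backend
    stream.getTracks().forEach((t) => t.stop());
  };
  recorder.start();
}

function stopRecording(): void {
  if (recorder && recorder.state === "recording") recorder.stop();
}

document.addEventListener("keydown", (e) => {
  if (e.ctrlKey && e.code === "Space" && !e.repeat) void startRecording();
});
document.addEventListener("keyup", (e) => {
  if (e.code === "Space") stopRecording();
});

// Hypothetical stub so the sketch is self-contained.
async function sendForTranscription(audio: Blob): Promise<void> {
  console.log(`would upload ${audio.size} bytes to the configured STT endpoint`);
}
```

The same start/stop pair could just as easily be wired to pointerdown/pointerup on the mic button for the hold-to-talk variant.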
The second feature is really just a workaround for what I find to be the main frustration of STT, and the part that is especially challenging when using it for prompting: the automatic cutoffs / pause detection. I don't know whether this is baked into the engine or whether it's a parameter that can be adjusted, but it would be really helpful to increase the pause buffer to a few seconds so that users have time to think about what they want to instruct.
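If the cutoff isn't hard-wired inside the engine, one way to picture the change is a simple energy-based silence check on the client side where the timeout is just a configurable constant. Everything in the sketch below (SILENCE_TIMEOUT_MS, the RMS threshold) is an assumption for illustration, not an existing setting.

```ts
// Illustrative sketch only: keep recording until a *long* pause, not a short one.

const SILENCE_TIMEOUT_MS = 3000; // hypothetical setting: ~3 s of quiet before cutting off
const SILENCE_THRESHOLD = 0.01;  // RMS level treated as "silence"

async function recordUntilLongPause(onDone: () => void): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  ctx.createMediaStreamSource(stream).connect(analyser);

  const buf = new Float32Array(analyser.fftSize);
  let lastLoud = performance.now();

  const tick = () => {
    analyser.getFloatTimeDomainData(buf);
    const rms = Math.sqrt(buf.reduce((s, x) => s + x * x, 0) / buf.length);
    if (rms > SILENCE_THRESHOLD) lastLoud = performance.now();

    if (performance.now() - lastLoud > SILENCE_TIMEOUT_MS) {
      stream.getTracks().forEach((t) => t.stop());
      void ctx.close();
      onDone(); // cut off only after the configured pause, not instantly
    } else {
      requestAnimationFrame(tick);
    }
  };
  tick();
}
```

Exposing a timeout like this as a user-facing setting would address the "time to think" problem directly.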
More details
I think the above pretty much covers it!
I'm possibly in the minority of LLM users who feel this way, but I find the idea of voice prompting potentially much more useful than having real-time chats with LLMs (i.e., simultaneous STT and TTS). It would be nice to have both, but if I had to choose, voice prompting is what would speed up my workflow the most!
Which components are impacted by your request?
General, UI
Pictures
No response
Code of Conduct
[X] I agree to follow this project's Code of Conduct