danny-avila / LibreChat

Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.
https://librechat.ai/
MIT License

Enhancement: "push to talk" and keyboard shortcuts for easier voice prompting (STT) #4807

Open danielrosehill opened 2 days ago

danielrosehill commented 2 days ago

What features would you like to see added?

Hey!

I would really love to be able to prompt by speech (i.e., using voice recognition).

If it would be of interest, I'd also like to contribute some documentation on the various STT features, as I couldn't find the parameters covered on the STT page.

Specifically: what do the "conversation mode" and "auto transcribe audio" toggles do?

I have a couple of ideas for this which I'm batching under one feature enhancement with the intention of looking into the feasibility of trying to work on these myself:

The second feature is really just a workaround for what I find to be the main frustration of STT, and one that is especially challenging when using it for prompting: the automatic cutoffs caused by pause detection. I don't know whether this is baked into the engine or is a parameter that can be adjusted, but it would be really helpful to increase the buffer time to a few seconds so that users have time to think about what they want to instruct.
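To make the pause-buffer idea concrete, here is a minimal sketch of a configurable silence timeout. It is not LibreChat's actual STT code: the `PauseDetector` class, its option names, and the 0 to 1 audio level scale are all hypothetical, and I'm assuming the recorder can supply a volume level and a timestamp for each audio frame.

```typescript
// Hypothetical sketch, not LibreChat's implementation.
// Assumes the recorder reports a normalized audio level (0..1) per frame.
interface PauseDetectorOptions {
  silenceThreshold: number; // levels below this count as silence
  maxPauseMs: number; // how long silence may last before recording stops
}

class PauseDetector {
  private silenceStartedAt: number | null = null;

  constructor(private readonly options: PauseDetectorOptions) {}

  // Feed one frame; returns true once silence has lasted at least maxPauseMs.
  shouldStop(level: number, nowMs: number): boolean {
    if (level >= this.options.silenceThreshold) {
      // Speech resumed, so reset the silence timer.
      this.silenceStartedAt = null;
      return false;
    }
    if (this.silenceStartedAt === null) {
      // First silent frame: start timing the pause.
      this.silenceStartedAt = nowMs;
    }
    return nowMs - this.silenceStartedAt >= this.options.maxPauseMs;
  }
}
```

With `maxPauseMs` set to something like 3000, a two-second pause to think would not end the recording; only three continuous seconds of silence would. Whether the underlying STT engine exposes anything equivalent is exactly what I'd need to find out.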

More details

I think the above pretty much covers it!

I'm possibly in the minority of LLM users who feel this way, but I find voice prompting potentially much more useful than having real-time chats with LLMs (i.e., simultaneous STT and TTS). It would be nice to have both, but if I had to choose, voice prompting would speed up my workflow the most!

Which components are impacted by your request?

General, UI

Pictures

No response