coleam00 / bolt.new-any-llm

Prompt, run, edit, and deploy full-stack web applications using any LLM you want!
https://bolt.new
MIT License
3.89k stars, 1.59k forks

Added speech to text capability #275

Open navyseal4000 opened 1 week ago

navyseal4000 commented 1 week ago

Verify that your system default microphone is the one you're testing with, as that's the primary limitation of this initial speech-to-text implementation for voice prompting.

chrismahoney commented 1 week ago

Awesome! Once we settle in on some provider work this is on my radar for when we've got room for feature adds. 👍

wonderwhy-er commented 1 week ago

I actually pulled it and merged it with the provider work. There are conflicts, but only minimal ones, so it can go in parallel.

Tested here and it seems to work. There is one thing I would fix before merging: https://www.youtube.com/watch?v=3Gc0yOgx-EQ

When the user submits, we should clear the ongoing text so that the next time they speak, it starts as new text.
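A minimal sketch of that clearing behavior, not taken from the PR. `VoicePromptBuffer` and its method names are hypothetical, and it assumes the recognizer delivers interim and final text chunks:

```typescript
// Hypothetical helper: accumulates recognized speech for one prompt and
// resets itself on submit, so the next utterance starts from empty text.
class VoicePromptBuffer {
  private finalText = "";
  private interimText = "";

  // Interim results overwrite each other; final results are appended.
  addResult(text: string, isFinal: boolean): void {
    if (isFinal) {
      this.finalText = (this.finalText + " " + text).trim();
      this.interimText = "";
    } else {
      this.interimText = text;
    }
  }

  // Text that would currently be shown in the prompt box.
  current(): string {
    return (this.finalText + " " + this.interimText).trim();
  }

  // Returns the text being submitted and clears all buffered speech.
  submit(): string {
    const submitted = this.current();
    this.finalText = "";
    this.interimText = "";
    return submitted;
  }
}
```

The point is only that `submit()` is the single place where buffered speech is reset; how that is wired into the chat input is left open here.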

There are some other potential UX changes. I would love to play with letting it handle commands, e.g. "stop/start/submit", so I can control it with my voice only.

Maybe use "Hello Google/Alexa"-style wake-up and go-to-sleep commands?
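One way the "stop/start/submit" idea could be sketched, purely as an illustration. `parseVoiceCommand` is a hypothetical name, and treating only a whole utterance as a command (so dictated text containing "stop" is not misread) is an assumption, not something decided in this thread:

```typescript
type VoiceCommand = "start" | "stop" | "submit";

const VOICE_COMMANDS: VoiceCommand[] = ["start", "stop", "submit"];

// Returns the command if the whole utterance is exactly one command word
// (ignoring case, surrounding whitespace, and trailing punctuation);
// otherwise returns null so the utterance is treated as dictated text.
function parseVoiceCommand(utterance: string): VoiceCommand | null {
  const normalized = utterance.trim().toLowerCase().replace(/[.!?,]+$/, "");
  for (const command of VOICE_COMMANDS) {
    if (command === normalized) {
      return command;
    }
  }
  return null;
}
```

Wake-up and go-to-sleep phrases could be added to the same list, at the cost of the recognizer having to keep listening in the background.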

Ugh so exciting, thanks for your great work @navyseal4000

milutinke commented 1 week ago

Oof, I started working on #281 before this was created.

milutinke commented 1 week ago

@wonderwhy-er Can we combine the two so it's not time wasted? Maybe use this when available and mine when the browser doesn't support this.
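A rough sketch of how that selection might look. It assumes this PR relies on the browser's built-in Web Speech API (exposed as `SpeechRecognition` or the prefixed `webkitSpeechRecognition`), which the thread does not confirm; `pickSpeechBackend` and the backend labels are hypothetical:

```typescript
// Shape of the globals we probe; in a browser this would be `window`.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

type SpeechBackend = "native" | "fallback";

// Chooses the browser's built-in recognition when it exists, otherwise
// the alternative implementation (assumed here to be the one from #281).
function pickSpeechBackend(globals: SpeechGlobals): SpeechBackend {
  const hasNative =
    typeof globals.SpeechRecognition !== "undefined" ||
    typeof globals.webkitSpeechRecognition !== "undefined";
  return hasNative ? "native" : "fallback";
}
```

In the app this would be called once with the browser's global object, and the result would decide which recording implementation backs the microphone button.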

wonderwhy-er commented 1 week ago

> Oof, I started working on #281 before this was created.

That is why I am in favor of posting PRs early, in draft mode, for early feedback. I shared in the group that my policy is to review the PRs that were opened earlier first.

navyseal4000 commented 1 week ago

Won't have time this morning to get the enhancements done. I'll try to get to them later tonight.

navyseal4000 commented 1 week ago

In fact if you'd like, feel free to finish this @milutinke and if you can, feel free to take the author role. Just lmk if you do so we don't both work on it tonight

milutinke commented 1 week ago

> In fact if you'd like, feel free to finish this @milutinke and if you can, feel free to take the author role. Just lmk if you do so we don't both work on it tonight

I had quite a busy day, sorry for the late reply. I won't have time to do anything until at least the 22nd of November. I wrote a proposal to Eduard in my PR and am waiting for his reply, but I'd say you can finish this, and if he agrees, I would pull your changes and then add my option as the fallback. Great job btw.

Edit: PS: Maybe add a better indicator for when the user is recording, just to make it clear to them. You could even grab my code and adapt it to use that animation while speaking.

chrismahoney commented 1 week ago

Voice support is an awesome feature to have. Full disclosure: I'm really biased toward it from a human-computer interface perspective.

If we can all work together towards integration of this and #281 I am more than happy to help out where I can. Cheers!

Quick Edit: I’ll qualify this by saying this feature may end up on the roadmap in a certain priority, so please don’t see that as a disappointment; just making sure all the wheels roll in the same direction. 🤓

wonderwhy-er commented 1 week ago

I am pretty excited for this feature and have already tested @milutinke's solution; it works pretty well, with one exception.

I am honestly willing to merge this and then add those improvements as separate PRs, so that the ball is rolling.

Maybe I will get to this in the evening.

And we can add other things in separate PRs.

navyseal4000 commented 1 day ago

@wonderwhy-er Ready for retesting. I got the fix working, but not the additional features with voice mode yet. Adding "enhance voice mode" as a goal in the readme might be a good idea, but I'll leave that determination up to you.