enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License
5.63k stars, 1.3k forks

Support Web Speech API #661

Open: zoollcar opened this pull request 1 month ago

zoollcar commented 1 month ago

New feature: [screenshot]

TODO:

Some previous questions:

I noticed the module is modules/browser; are there better alternatives?

The module has been moved to modules/browser/speech-synthesis.

The Web Speech API test sentence is from the web-speech-recommended-voices project and contains a placeholder {name}.

The {name} placeholder is now replaced with the name of the selected voice.
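A minimal TypeScript sketch of the placeholder substitution, plus speaking the result through the browser's Web Speech API. `fillVoiceSample` and `speakVoiceSample` are hypothetical names, not the code in this PR, and matching the voice by its `name` is an assumption about how the recommended-voices data identifies voices.

```typescript
// Hypothetical helper: assumes the sample sentence contains literal "{name}" placeholders.
function fillVoiceSample(template: string, voiceName: string): string {
  // A replacer function keeps "$&"-style sequences in voiceName from being interpreted.
  return template.replace(/\{name\}/g, () => voiceName);
}

// Speaks the filled-in sample with the matching voice when the Web Speech API exists;
// returns false when it is unavailable (e.g. outside a browser).
function speakVoiceSample(template: string, voiceName: string): boolean {
  const g = globalThis as any;
  if (!g.speechSynthesis || typeof g.SpeechSynthesisUtterance !== 'function') return false;
  const utterance = new g.SpeechSynthesisUtterance(fillVoiceSample(template, voiceName));
  // Assumption: voices are matched by their display name.
  const voice = g.speechSynthesis.getVoices().find((v: { name: string }) => v.name === voiceName);
  if (voice) utterance.voice = voice; // otherwise the browser's default voice is used
  g.speechSynthesis.speak(utterance);
  return true;
}
```

Replacing every occurrence (the `g` flag) is a cautious choice in case a sample sentence mentions the voice more than once.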

Using a Git submodule might not be the best solution.

en.json is now copied locally as Languages.json, and the voice list is fetched when a language is selected.
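One way to fetch a voice list only when a language is selected is to memoize a single request per language. This is a sketch under assumptions: `RecommendedVoice`, `createVoiceListLoader`, and the injected fetcher are hypothetical, and the thread does not describe the real per-language file layout.

```typescript
// Hypothetical shape of one entry in a per-language voice list.
interface RecommendedVoice {
  name: string;
  label?: string;
}

// The fetcher is injected so the loader does not hard-code a URL scheme.
type VoiceListFetcher = (langCode: string) => Promise<RecommendedVoice[]>;

// Caches one in-flight or settled request per language, so selecting the
// same language again does not trigger a second fetch.
function createVoiceListLoader(fetcher: VoiceListFetcher) {
  const cache = new Map<string, Promise<RecommendedVoice[]>>();
  return function loadVoices(langCode: string): Promise<RecommendedVoice[]> {
    let pending = cache.get(langCode);
    if (!pending) {
      pending = fetcher(langCode).catch((err) => {
        cache.delete(langCode); // forget failures so a later selection can retry
        throw err;
      });
      cache.set(langCode, pending);
    }
    return pending;
  };
}
```

In the browser, the fetcher could wrap `fetch()` against wherever the per-language JSON files are served; keeping it injectable also makes the loader easy to test.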

vercel[bot] commented 1 month ago

@zoollcar is attempting to deploy a commit to the Enrico Pro Team on Vercel.

A member of the Team first needs to authorize it.

enricoros commented 1 month ago

Thanks for the feature. This patch is now in a state where I can review it and potentially merge it.

vercel[bot] commented 1 month ago

The latest updates on your projects.

Name | Status | Preview | Updated (UTC)
big-agi-open-next | ✅ Ready | Visit Preview | Oct 20, 2024 9:20am

enricoros commented 1 month ago

Update: thanks for resubmitting the PR. This is definitely higher-quality code that considers the rest of the application (e.g. other modules).

I'm testing it on mobile and it hung a couple of times (I believe it's a stability issue caused by a React reference that keeps changing). It's possibly something I can fix, but it will take some time for me to check out and develop.

On the UX side, there are some rough edges: on my Android phone the High Quality list doesn't do much. No matter what one chooses, the experience doesn't change, and the same happens with the 4 available voices. So there's something I can look into to improve the UX. Why is this key? Because every feature in Big AGI gets the same scrutiny and attention to UX perfection.

Thanks again. I'll follow up when I have time to check this out, review it, and change what needs to be changed. In the meantime, let me know if anything can be improved on your side.

Screenshots: Screenshot_20241020_023710_Chrome.jpg, Screenshot_20241020_023626_Chrome.jpg, Screenshot_20241020_023702_Chrome.jpg

zoollcar commented 1 month ago

I'll add an abstraction (ISpeechSynthesis, under modules/tts/). The current plan is to model it on the llms module.

I'll push the refactored version in the next few days.
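A sketch of what an engine abstraction under modules/tts/ could look like, with the Web Speech API as one implementation. The `ISpeechSynthesis` name comes from the comment above, but its members, `TTSVoice`, and `WebSpeechEngine` are hypothetical; the interface actually committed in the PR may differ.

```typescript
// Hypothetical voice descriptor shared by all engines.
interface TTSVoice {
  id: string;
  name: string;
  language?: string;
}

// Hypothetical engine contract; the real ISpeechSynthesis may have other members.
interface ISpeechSynthesis {
  readonly id: string;    // stable key, e.g. for settings
  readonly label: string; // shown in the UI
  isAvailable(): boolean; // can this engine run in the current environment?
  listVoices(): Promise<TTSVoice[]>;
  speak(text: string, voiceId?: string): Promise<void>;
}

class WebSpeechEngine implements ISpeechSynthesis {
  readonly id: string = 'webspeech';
  readonly label: string = 'Web Speech API (browser)';

  private get synth(): any {
    return (globalThis as any).speechSynthesis;
  }

  isAvailable(): boolean {
    return typeof this.synth !== 'undefined'
      && typeof (globalThis as any).SpeechSynthesisUtterance === 'function';
  }

  async listVoices(): Promise<TTSVoice[]> {
    if (!this.isAvailable()) return [];
    // Note: some browsers return an empty list until 'voiceschanged' has fired.
    return this.synth.getVoices().map((v: { voiceURI: string; name: string; lang: string }) => ({
      id: v.voiceURI,
      name: v.name,
      language: v.lang,
    }));
  }

  speak(text: string, voiceId?: string): Promise<void> {
    if (!this.isAvailable()) return Promise.reject(new Error('Web Speech API not available'));
    return new Promise<void>((resolve, reject) => {
      const utterance = new (globalThis as any).SpeechSynthesisUtterance(text);
      const voice = this.synth.getVoices().find((v: { voiceURI: string }) => v.voiceURI === voiceId);
      if (voice) utterance.voice = voice;
      utterance.onend = () => resolve();
      utterance.onerror = (e: { error: string }) => reject(new Error(e.error));
      this.synth.speak(utterance);
    });
  }
}
```

Returning promises from `speak` lets callers treat local and cloud engines the same way, which is the main point of mirroring the llms module's vendor-style split.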

zoollcar commented 1 month ago

The abstraction is basically done. One UI change worth mentioning: [screenshot] engine selection is now a drop-down box, which leaves room for more options and works better on mobile.
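A drop-down like this can be driven by a plain engine registry. This sketch is hypothetical (`EngineDescriptor`, `toDropdownOptions`, and `resolveSelectedEngine` are illustrative names, not the PR's code); it shows one possible design: list every engine, disable the unavailable ones, and fall back to the first usable engine when the saved choice no longer works.

```typescript
// Minimal hypothetical engine descriptor, just enough to drive a selector.
interface EngineDescriptor {
  id: string;
  label: string;
  isAvailable: () => boolean;
}

interface DropdownOption {
  value: string;
  label: string;
  disabled: boolean;
}

// Every registered engine becomes an option; unavailable ones stay visible but disabled.
function toDropdownOptions(engines: readonly EngineDescriptor[]): DropdownOption[] {
  return engines.map((e) => ({ value: e.id, label: e.label, disabled: !e.isAvailable() }));
}

// Keep the saved choice if it is still usable, else fall back to the first usable engine.
function resolveSelectedEngine(engines: readonly EngineDescriptor[], savedId: string | null): string | null {
  const usable = engines.filter((e) => e.isAvailable());
  if (savedId && usable.some((e) => e.id === savedId)) return savedId;
  return usable[0]?.id ?? null;
}
```

Compared with a fixed set of toggle buttons, adding an engine here only means adding one registry entry, which is the "more options" benefit mentioned above.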

enricoros commented 1 month ago

Hi @zoollcar, just FYI: I won't have the time to merge this before the official V2 launch. I can't disclose dates, but I'll be very busy for a while. If you have a clean patch that doesn't require any work from my side, I'll see what I can do. In the meantime, enjoy the fact that you're the only person with a custom big-AGI that supports multiple TTS/ASR engines.