
enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

https://big-agi.com

MIT License

5.63k stars 1.3k forks

Support Web Speech API #661

[Open] zoollcar opened this 1 month ago

zoollcar commented 1 month ago

New feature:

TODO:

Some languages have very few voices, so enabling pre-select can leave the voice list empty. Perhaps we should hide the pre-select option in these cases.
When switching TTS engines, the ElevenLabs test voice keeps playing after I switch to the Web Speech API.
The upstream project has some broken JSON files; I'll fix them and open a PR there.
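
A minimal TypeScript sketch of the first TODO item, assuming "pre-select" means narrowing a language's voices to the recommended names from web-speech-recommended-voices. The `buildVoiceList` helper and its types are hypothetical, not code from this PR. `VoiceInfo` only mirrors the `name` and `lang` fields of the browser's `SpeechSynthesisVoice`, so voices returned by `speechSynthesis.getVoices()` fit it structurally. The option is hidden whenever it would empty the list or would not narrow it at all.

```typescript
// Minimal stand-in for the SpeechSynthesisVoice fields used here.
interface VoiceInfo {
  name: string;
  lang: string; // BCP 47 tag, e.g. "en-US"
}

interface VoiceListResult {
  voices: VoiceInfo[];
  showPreSelect: boolean; // whether the UI should offer the pre-select toggle
}

// Hypothetical helper: list the voices for one language and, if requested,
// narrow them to the recommended subset, without ever emptying the list.
function buildVoiceList(
  allVoices: VoiceInfo[],
  language: string, // primary subtag, e.g. "en"
  recommendedNames: Set<string>,
  preSelect: boolean,
): VoiceListResult {
  const lang = language.toLowerCase();
  const forLanguage = allVoices.filter((v) => {
    const tag = v.lang.toLowerCase();
    return tag === lang || tag.startsWith(lang + "-");
  });
  const recommended = forLanguage.filter((v) => recommendedNames.has(v.name));

  // Offer pre-select only when it narrows the list without emptying it.
  const showPreSelect = recommended.length > 0 && recommended.length < forLanguage.length;

  return {
    voices: preSelect && showPreSelect ? recommended : forLanguage,
    showPreSelect,
  };
}
```

With this shape the UI can render the pre-select toggle only when `showPreSelect` is true, so a language with a single unrecommended voice still shows that voice.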

Some previous questions:

I noticed the module lives in modules/browser; is there a better location for it?

The module has been moved to modules/browser/speech-synthesis.

The Web Speech API test sentence is from the web-speech-recommended-voices project and contains a placeholder {name}.

{name} is now replaced with the name of the voice.
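
For illustration, the substitution could look like the following TypeScript sketch. The helper name and the sample template are made up; only the `{name}` placeholder comes from the thread. A function replacer is used so that `$` sequences inside a voice name are not interpreted as special replacement patterns by `String.prototype.replace`.

```typescript
// Hypothetical helper: fill every {name} placeholder in a test sentence
// with the selected voice's name.
function fillTestSentence(template: string, voiceName: string): string {
  // Global regex replaces all occurrences; the function replacer inserts
  // voiceName literally, even if it contains "$&" or similar sequences.
  return template.replace(/\{name\}/g, () => voiceName);
}

console.log(fillTestSentence("Hello, my name is {name}.", "Samantha"));
```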

Using a Git submodule might not be the best solution.

Instead, en.json is now copied as a local file, Languages.json, and the voice list is fetched when a language is selected.
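
A rough TypeScript sketch of that approach: the language list ships locally, and each language's voice file is fetched only when that language is selected, then cached. The per-file URL layout (`<baseUrl>/<code>.json`), the JSON shapes, and the `VoiceCatalog` name are assumptions for illustration, not the PR's code. The fetcher is injected so the logic can run without a network.

```typescript
// Assumed shape of one entry in the local Languages.json.
interface LanguageEntry {
  code: string;  // e.g. "en", "fr"
  label: string; // display name for the language picker
}

// Assumed shape of a per-language voice file fetched on demand.
interface VoiceFile {
  voices: { name: string; lang: string }[];
}

type JsonFetcher = (url: string) => Promise<VoiceFile>;

// Hypothetical catalog: languages are bundled, voice lists are lazy.
class VoiceCatalog {
  private cache = new Map<string, Promise<VoiceFile>>();

  constructor(
    readonly languages: LanguageEntry[], // bundled locally (Languages.json)
    private readonly baseUrl: string,    // hypothetical upstream location
    private readonly fetchJson: JsonFetcher,
  ) {}

  // Returns the (possibly cached) voice list for a language code.
  voicesFor(code: string): Promise<VoiceFile> {
    let pending = this.cache.get(code);
    if (!pending) {
      pending = this.fetchJson(`${this.baseUrl}/${code}.json`);
      // Forget failed requests so a later selection can retry.
      pending.catch(() => this.cache.delete(code));
      this.cache.set(code, pending);
    }
    return pending;
  }
}
```

In a browser, the fetcher could be `(url) => fetch(url).then((r) => r.json())`. Caching the promise rather than the result also deduplicates requests when the user selects the same language twice before the first fetch finishes.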

vercel[bot] commented 1 month ago

@zoollcar is attempting to deploy a commit to the Enrico Pro Team on Vercel.

A member of the Team first needs to authorize it.

enricoros commented 1 month ago

Thanks for the feature. This patch is now in a state where I can review it and potentially merge it.

vercel[bot] commented 1 month ago

The latest updates on your projects.

Name: big-agi-open-next | Status: ✅ Ready | Updated (UTC): Oct 20, 2024 9:20am

enricoros commented 1 month ago

Update: thanks for resubmitting the PR. This is definitely higher-quality code that takes the rest of the application (e.g. other modules) into account.

I'm testing it on mobile and it has hung a couple of times (I believe it's a stability issue caused by a changing React reference). It's possibly something I can fix, but it will take me some time to check out and work on.

On the UX side, there are some rough edges: on my Android phone the High quality list doesn't do much. No matter which option I choose, the experience doesn't change, and the same happens with the 4 available voices. So that's something I can look into to improve the UX. Why is this key? Because every feature in Big AGI gets the same scrutiny and attention to UX polish.

Thanks again. I'll follow up when I have time to check this out, review it, and change what needs changing. In the meantime, let me know if anything can be improved on your side.

zoollcar commented 1 month ago

I'll make an abstraction (ISpeechSynthesis, under modules/tts/). The current plan is to use the llms module as a reference.

I'll push the refactored version in the next few days.

zoollcar commented 1 month ago

The abstraction is basically done. One UI change worth mentioning: engine selection is now a drop-down, which leaves room for more options and works better on mobile.
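
The actual ISpeechSynthesis definition is not shown in this thread, so the following is only a hypothetical TypeScript sketch of such an abstraction: a common engine interface plus a registry that feeds the engine drop-down. Having `select()` call `stop()` on the outgoing engine is one possible way to address the earlier TODO about the ElevenLabs test voice continuing after switching to the Web Speech API. The method names and the `SpeechEngineRegistry` class are assumptions.

```typescript
// Hypothetical engine-agnostic TTS interface (names are assumptions,
// not the PR's actual ISpeechSynthesis definition).
interface ISpeechSynthesis {
  readonly id: string;    // e.g. "elevenlabs", "webspeech"
  readonly label: string; // text shown in the engine drop-down
  speak(text: string, voiceId?: string): Promise<void>;
  stop(): void;           // must silence any in-flight or test playback
}

// Hypothetical registry of available engines.
class SpeechEngineRegistry {
  private engines = new Map<string, ISpeechSynthesis>();
  private activeId: string | null = null;

  register(engine: ISpeechSynthesis): void {
    this.engines.set(engine.id, engine);
  }

  // Options for the engine selection drop-down.
  options(): { value: string; label: string }[] {
    return Array.from(this.engines.values()).map((e) => ({ value: e.id, label: e.label }));
  }

  get active(): ISpeechSynthesis | undefined {
    return this.activeId ? this.engines.get(this.activeId) : undefined;
  }

  // Stop the outgoing engine before switching, so its playback
  // (e.g. a voice test) does not continue under the new engine.
  select(id: string): void {
    if (!this.engines.has(id)) throw new Error(`Unknown TTS engine: ${id}`);
    if (this.activeId && this.activeId !== id) this.active?.stop();
    this.activeId = id;
  }
}
```

In a browser, a Web Speech engine's `stop()` could call `window.speechSynthesis.cancel()`, which removes all queued utterances and stops the one currently being spoken.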

enricoros commented 1 month ago

Hi @zoollcar, just FYI: I won't have time to merge this before the official V2 launch. I can't disclose dates, but I'll be very busy for a while. If you have a clean patch that doesn't require any work on my side, I'll see what I can do. In the meantime, enjoy the fact that you're the only person with a custom big-AGI that supports multiple TTS/ASR engines.