Switch to microsoft-cognitiveservices-speech-sdk for SpeechSynthesis

compulim / web-speech-cognitive-services

Polyfill Web Speech API with Cognitive Services Bing Speech for both speech-to-text and text-to-speech service.

https://compulim.github.io/web-speech-cognitive-services

MIT License


Switch to microsoft-cognitiveservices-speech-sdk for SpeechSynthesis #173

Open sbiaudet opened 2 years ago

sbiaudet commented 2 years ago

Hello Compulim,

I use the web-speech-cognitive-services module through botframework-webchat. I have a fully customized UX with an animated character, and I need the onwordboundary event to synchronize the subtitles with the character animations.

Currently you use the REST API for SpeechSynthesis. If you used the SDK directly for SpeechSynthesis over WebSocket, we could have the onwordboundary event. We would also have access to the viseme event for lip sync.

Do you think you could use the SDK instead of the REST API?
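
For readers following along, here is a minimal sketch of how word-boundary and viseme events could be collected into a single timeline for subtitles and lip sync. It is not code from this repository. The `WordBoundaryLike` and `VisemeLike` interfaces are local stand-ins (an assumption) that mirror the fields the Speech SDK reports on its `wordBoundary` and `visemeReceived` events, and `CueTimeline` is a hypothetical helper name. The SDK reports `audioOffset` in 100-nanosecond ticks, so the sketch converts to milliseconds.

```typescript
// Local stand-ins for the SDK event arguments (assumption: only these fields are needed).
interface WordBoundaryLike {
  audioOffset: number; // in 100-nanosecond ticks
  text: string;
  textOffset: number;
  wordLength: number;
}

interface VisemeLike {
  audioOffset: number; // in 100-nanosecond ticks
  visemeId: number;
}

type Cue =
  | { kind: "word"; atMs: number; text: string; charIndex: number; charLength: number }
  | { kind: "viseme"; atMs: number; visemeId: number };

const TICKS_PER_MS = 10000;

function ticksToMs(ticks: number): number {
  return Math.round(ticks / TICKS_PER_MS);
}

// Hypothetical helper: gathers cues as events arrive and answers
// "which word is being spoken at this playback position?".
class CueTimeline {
  private cues: Cue[] = [];

  addWord(e: WordBoundaryLike): void {
    this.cues.push({
      kind: "word",
      atMs: ticksToMs(e.audioOffset),
      text: e.text,
      charIndex: e.textOffset,
      charLength: e.wordLength,
    });
  }

  addViseme(e: VisemeLike): void {
    this.cues.push({ kind: "viseme", atMs: ticksToMs(e.audioOffset), visemeId: e.visemeId });
  }

  // Returns a copy of the cues ordered by time.
  sorted(): Cue[] {
    return [...this.cues].sort((a, b) => a.atMs - b.atMs);
  }

  // Latest word whose start time is at or before the given position.
  wordAt(ms: number): string | undefined {
    let current: string | undefined;
    for (const cue of this.sorted()) {
      if (cue.atMs > ms) break;
      if (cue.kind === "word") current = cue.text;
    }
    return current;
  }
}
```

With the real SDK, the timeline would presumably be fed from the synthesizer's handlers, along the lines of `synthesizer.wordBoundary = (_s, e) => timeline.addWord(e)` and `synthesizer.visemeReceived = (_s, e) => timeline.addViseme(e)`; that wiring is untested here.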

sbiaudet commented 1 year ago

@compulim we've been working on replacing the REST calls for SpeechSynthesis with the WebSocket-based SDK.

Response times are much better and the onStart event is better synchronized. We've also added support for the onBoundary event with word and viseme types.

Are you open to us proposing a pull request?

sbiaudet commented 1 year ago

@compulim a quick bump on my request. We are ready to open a pull request. Is that OK with you?

vladmaraev commented 3 weeks ago

@sbiaudet I would be very interested in this. Do you want to collaborate on this change?

sbiaudet commented 3 weeks ago

@vladmaraev I never had a response from @compulim. We've forked the repo and published a package here: https://www.npmjs.com/package/@davi-ai/web-speech-cognitive-services-davi.

I'm ready to merge here; it's silly to maintain a fork just for this. @compulim, is that OK with you?

vladmaraev commented 2 weeks ago

@sbiaudet That's really nice! I tried your package, but unfortunately it fails to synthesise SSML (however I can see in the generated js code that SSML is still supported)... Maybe there are some caveats? I would be happy to contribute to either a PR here or to your fork (is it public?). Many thanks for your work!
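
One thing that may be worth ruling out when SSML input fails is a malformed document. The sketch below is an assumption about what well-formed input could look like, not a diagnosis of the fork: `buildSsml` and `escapeXml` are hypothetical helper names, and the voice name is only an example value.

```typescript
// Escape the five XML special characters so user text cannot break the SSML document.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps plain text in a minimal SSML document
// with the standard W3C synthesis namespace and a single voice element.
function buildSsml(text: string, voice: string, lang: string = "en-US"): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

// Example value only; substitute whichever voice your subscription offers.
const ssml = buildSsml("Tom & Jerry say <hi>", "en-US-JennyNeural");
```

Passing the resulting string to the synthesizer and comparing the behaviour with plain-text input may help narrow down whether the problem is in the input or in the package.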