Open valerauko opened 6 months ago
I've managed to reproduce the issue and after adding some logs to the WordBoundary event handler and around the result processing, I got the following:
2024-03-12T08:40:36.008Z Starting voice synthesis
2024-03-12T08:40:37.216Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.221Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.222Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.309Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.410Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.416Z Finishing word boundaries
2024-03-12T08:40:37.612Z Finished voice synthesis c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.517Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.615Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.619Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.621Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.622Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.622Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
The "Finishing word boundaries" log is fired after the result was checked to be SynthesizingAudioCompleted.
As the log shows, WordBoundary events are still being fired after the SDK reported the processing to be complete.
If this is expected behavior, is there some other event I could listen for that reliably signals that all processing (including firing all WordBoundary events) is done?
@samvaity could you please route this?
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @oscholz @robch.
This is a little tricky, but you need to wait for the SynthesisCompleted event to be triggered before you stop receiving the word boundary events.
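To illustrate the suggestion above, here is a minimal sketch of waiting on a completion callback instead of relying only on the returned result. `BoundaryCollector` and its method names are hypothetical and not part of the Speech SDK. The sketch also assumes, as the comment above suggests but this thread does not verify, that SynthesisCompleted is raised only after the last WordBoundary event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not an SDK class): gathers word boundary data and
 * lets the caller block until a separate "completed" callback has fired.
 */
final class BoundaryCollector {
    // Thread-safe list, since SDK callbacks may run on another thread.
    private final List<String> boundaries = new CopyOnWriteArrayList<>();
    private final CountDownLatch completed = new CountDownLatch(1);

    /** Intended to be called from the WordBoundary event handler. */
    void onWordBoundary(String text) {
        boundaries.add(text);
    }

    /** Intended to be called from the SynthesisCompleted event handler. */
    void onSynthesisCompleted() {
        completed.countDown();
    }

    /**
     * Blocks until onSynthesisCompleted() was called or the timeout elapsed,
     * then returns a snapshot of the boundaries collected so far.
     */
    List<String> awaitAll(long timeout, TimeUnit unit) {
        try {
            if (!completed.await(timeout, unit)) {
                throw new IllegalStateException("completion was not signalled in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new ArrayList<>(boundaries);
    }
}
```

In real code the two `on...` methods would presumably be called from listeners registered on the synthesizer's WordBoundary and SynthesisCompleted events, and `awaitAll` would be called after SpeakText returns, before the collected boundaries are processed. The timeout is there so that a missing completion event fails loudly instead of hanging.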
Does the SpeakText result return SynthesizingAudioCompleted before that event is triggered? I ask because I currently check the result for that value.
It seems to mainly occur in extremely CPU-starved environments.
This is really unexpected, considering that on the client this work is not CPU-bound at all (from what I could see, it is mostly just waiting for websocket messages).
Would something like providing an Executor(Service) to the Client possibly improve this? That way I could pass in a Java 21 virtual-thread based Executor.
Describe the bug: Using the sample code for extracting timing information from speech synthesis, I notice that occasionally WordBoundary events appear not to fire. For short text, sometimes there are none at all. For longer text, the events sometimes cut off in the middle.
Exception or Stack Trace: n/a
To Reproduce: Not reliably reproducible, so I assume it's a race condition somewhere. I couldn't reproduce it in a local env with debug logging (yet), but I've confirmed malformed output resulting from this in production environments.
Code Snippet: The sample code: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/daeea5e24a5171a18cd40b94ca24a9ee5d597690/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechSynthesisSamples.java#L704
Expected behavior: Once SpeechSynthesisResult.getReason returns SynthesizingAudioCompleted, I expect all WordBoundary events to have been fired as well.
Screenshots: n/a