Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

[BUG] WordBoundary events not firing reliably #39172

Open valerauko opened 6 months ago

valerauko commented 6 months ago

Describe the bug Using the sample code for extracting timing information from speech synthesis, I notice that occasionally WordBoundary events appear not to fire. For short text sometimes there are none at all. For longer text sometimes it cuts off in the middle.

Exception or Stack Trace n/a

To Reproduce Not reliably reproducible, so I assume it's some race condition somewhere. I couldn't reproduce it in a local env with debug logging (yet), but I've confirmed malformed output resulting from this in production environments.

Code Snippet The sample code: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/daeea5e24a5171a18cd40b94ca24a9ee5d597690/samples/java/jre/console/src/com/microsoft/cognitiveservices/speech/samples/console/SpeechSynthesisSamples.java#L704

Expected behavior Once SpeechSynthesisResult.getReason returns SynthesizingAudioCompleted, I expect all WordBoundary events to have fired as well.

Screenshots n/a

valerauko commented 6 months ago

I've managed to reproduce the issue and after adding some logs to the WordBoundary event handler and around the result processing, I got the following:

2024-03-12T08:40:36.008Z Starting voice synthesis
2024-03-12T08:40:37.216Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.221Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.222Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.309Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.410Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.416Z Finishing word boundaries
2024-03-12T08:40:37.612Z Finished voice synthesis c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.517Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.615Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.619Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.621Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.622Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9
2024-03-12T08:40:37.622Z Word boundary event c447b7d247bc45a49b49ddfd4793c2f9

The "Finishing word boundaries" line is logged after the result's reason was checked to be SynthesizingAudioCompleted.

As the log shows, WordBoundary events are still being fired after the SDK has reported the synthesis as complete.

If this is expected behavior, is there some other event I could listen for that reliably signals that all processing (including firing all WordBoundary events) is done?

joshfree commented 6 months ago

@samvaity could you please route this?

github-actions[bot] commented 6 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @oscholz @robch.

yulin-li commented 6 months ago

This is a little tricky, but you need to wait for the SynthesisCompleted event to be triggered before you stop receiving word boundary events.
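
A minimal sketch of that waiting pattern, in case it helps. It deliberately does not use the Speech SDK: FakeSynthesizer is a hypothetical stand-in that delivers its events on a background thread and returns from speak() early, which is roughly the behavior seen in the log above. In real code the two handlers would be attached to the synthesizer's WordBoundary and SynthesisCompleted events instead. The class names, the method names, and the 10-second timeout are all assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in: events are delivered on a background thread. */
class FakeSynthesizer {
    private Consumer<String> boundaryHandler = w -> {};
    private Runnable completedHandler = () -> {};

    void onWordBoundary(Consumer<String> handler) { this.boundaryHandler = handler; }
    void onCompleted(Runnable handler) { this.completedHandler = handler; }

    /** Returns immediately, while word boundary events are still arriving. */
    String speak(String text) {
        Thread worker = new Thread(() -> {
            for (String word : text.split(" ")) {
                pause(5);
                boundaryHandler.accept(word);
            }
            completedHandler.run(); // fired only after the last boundary event
        });
        worker.setDaemon(true);
        worker.start();
        return "SynthesizingAudioCompleted";
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class BoundaryCollector {
    /** Collects boundary events and blocks until the completion event has fired. */
    static List<String> collect(FakeSynthesizer synth, String text) {
        List<String> words = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        synth.onWordBoundary(words::add);
        synth.onCompleted(done::countDown);

        // The returned reason alone does not mean every event has been seen.
        synth.speak(text);

        try {
            // Wait for the completion event rather than trusting the result.
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Timed out waiting for completion");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return words;
    }
}
```

Calling BoundaryCollector.collect(new FakeSynthesizer(), "one two three four") only returns after the completion callback has run, so the list holds every word even though speak() itself came back earlier.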

valerauko commented 6 months ago

Does the SpeakText result return SynthesizingAudioCompleted before that event is triggered? I ask because I currently check the result for that value.

valerauko commented 6 months ago

It seems to mainly occur in extremely CPU-starved environments.

This is really unexpected considering how on the client this is not CPU-bound at all (mostly just waiting for websocket messages from what I could see).

Would something like providing an Executor(Service) to the client possibly improve this, so that I could pass in a Java 21 virtual-thread-based Executor?