Azure-Samples / Cognitive-Speech-STT-Android

Android SDK for the Microsoft Speech-to-Text API, part of Cognitive Services
https://www.microsoft.com/cognitive-services/en-us/speech-api
Other
102 stars 58 forks source link

DataRecognitionClient is slow with results (6 secs or greater) #61

Open BeanStalka opened 6 years ago

BeanStalka commented 6 years ago

When using the DataRecognitionClient it is taking quite a long time to get any results. On Average 6 seconds.

The first call is always slow.

If I make another call in rapid succession it will return in around a second.

If I wait more than 30 seconds between calls they will both average 6 seconds from when I call SendAudio to when OnFinalResponseReceived is called.

Steps to reproduce:

1.) Instanatiate a DataRecognitionClient.
dataRecognitionClient = SpeechRecognitionServiceFactory.CreateDataClient(SpeechRecognitionMode.ShortPhrase, Locale, this, ApiKey);

2.) Supply it an audio format through the DataRecognitionClient via SendAudioFormat var audioFormat = SpeechAudioFormat.Create16BitPCMFormat(16000); dataRecognitionClient.SendAudioFormat(audioFormat);

3.) Send raw PCM data through the DataRecognitionClient via SendAudio dataRecognitionClient.SendAudio(rawData, rawData.Length);

4.) Time how long it takes to hit the OnFinalResponseReceived call back.

If I had to guess its an issues opening and closing web sockets. This is just a guess though.

BeanStalka commented 6 years ago

here are the logcats: one from a slow request and one from a fast request.

of particular interest is the section of the logcat in the slow file: It wastes 5 secs after 'sending audio stream end point'

10-24 16:47:59.463 Samsung SM-T713 Info 17248 NLPLAT ConnectionStateChanged 10-24 16:47:59.462 Samsung SM-T713 Info 17248 NLPLAT CU Client connected 10-24 16:47:59.462 Samsung SM-T713 Info 17248 NLPLAT Web socket handshake completed 10-24 16:47:59.462 Samsung SM-T713 Info 17248 NLPLAT Upgrade request returned with HTTP status code: 101. 10-24 16:47:59.340 Samsung SM-T713 Info 17248 NLPLAT
10-24 16:47:54.246 Samsung SM-T713 Info 17248 NLPLAT Sent audio stream endpoint,
requestId='S' 10-24 16:47:54.246 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.246 Samsung SM-T713 Info 17248 NLPLAT Sending audio stream endpoint, requestId='S' 10-24 16:47:54.246 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.244 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.243 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.242 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.240 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.238 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.237 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.235 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.233 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.231 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.230 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.228 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.226 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.224 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.223 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.221 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.219 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.218 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.214 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.212 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.210 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.208 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.206 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.205 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion. 10-24 16:47:54.203 Samsung SM-T713 Info 17248 NLPLAT Waiting for connection/send completion.

fast.txt slow.txt

raaaar commented 6 years ago

How long is the audio? What's the delay to the first partial result?

The speech service is designed and optimized for the recognition of audio spoken by humans “in real time” (“in real time” = recognizing audio that is streamed to the service at the rate that a human can speak that audio). The fastest it can process the bulk audio is ca. 2x the playback speed. If audio size if ~ 12 seconds, the final results will not be fired sooner than 6 seconds after the audio has been uploaded.

raaaar commented 6 years ago

Looking closer, it takes 6 seconds to receive the Upgrade from the service. Weird.

The code snippet above does not look like valid java. Could you try the unmodified sample and verify that you still experience the delay?

BeanStalka commented 6 years ago

The audio is approx 5 secs long.

Previously I was able to get sub second responses from the service with no problem.

That code snipit was not meant to be valid java. Rather it was an example of how I am implementing the SDK.

I have downloaded the sample and ran it with the same results.

Can you do the same and verify that there is an issue?

Do you have any idea why the upgrade request would take so long? I am using the speechrecognition-1.2.2.aar just like the demo.

I was reading the web sockets doc and a couple things stood out. Does the client SDK handle the requirements laid out for web sockets in this doc? [(https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol)]

Specifically 1.) Does it supply a UUID connection ID for the WebSocket Handshake request (upgrade request) 2.) Does it use TLS 1.2? 3.) Does it support cookies?
4.) Does it support Http redirection? 5.) Does it properly report connection errors to the service?

Is there some way to crank up the logging coming from the NPLAT in logcat? I am trying to verify that all of the above criteria is met to zero in on the bug.

If the sdk supports cookies, how is it doing so. Do I need some other permission in my app in order to store the cookie?

I'm pretty sure that second faster attempt mentioned in my initial post is being considered a retry from the web sockets point of view and is using the first, slower initializing connection.

raaaar commented 6 years ago

Documented requirements apply to the new protocol. The SDK is built on top of a legacy protocol.

Second retry is faster, because it uses the same WS connection (which expires in 30 seconds).

I cannot repro the issue. Looking at the service logs.

BeanStalka commented 6 years ago

Please keep in mind that while the demo app works when downloaded and the keys are replaced, the issue we are attempting to solve is the slow connection/response/translation.

IOW, pulling down the android demo app, and entering my app key then running it will get you a very slow response when you click the speech to text button

BeanStalka commented 6 years ago

@raaaar any progress on this issue?

raaaar commented 6 years ago

Nothing yet... It's definitely a service issue, will escalate.

BeanStalka commented 6 years ago

Thank you for the update.

Once its fixed, let me know and I will be happy to test. ;)

Thanks Again

raaaar commented 6 years ago

@BeanStalka Could you please try this JavaScript sample and see if you can reproduce the issue? If you do, the log from the browser console would help a lot to get to the bottom of this.

BeanStalka commented 6 years ago

The javascript version of the web sockets is much faster. { "RecognitionStatus": "Success", "DisplayText": "What's the weather like in Philadelphia Pennsylvania?", "Offset": 2700000, "Duration": 27400000 }

that looks and feels like a 400ms round trip.

The speeds on my android tablet have gotten better. they have gone from 6 seconds every time to around 4 seconds.

Attached please find the latest logcats for a fast and slow response

newslow.txt newfast.txt

raaaar commented 6 years ago

Turns out, there's a bug in how the non-windows sdk code converts guids to strings and prints them out. In place of all the 'S' symbols in the logcat output ("NLPLAT ImpressionId: S", "Response request id: 'S'", etc.) there should've been proper guids. As a result, the Android doesn't log metadata it receives from the server and whatever it logs is not logged on the server side (at least, I wasn't able to get anywhere based on the logcat info).

I know that this is getting beyond ridiculous, but can you try the windows sample (https://github.com/Azure-Samples/Cognitive-Speech-STT-Windows), which does log server replies and if you see a delay there, post the Response Impression Id?

BeanStalka commented 6 years ago

I'm confused.

The newslow.txt I attached contains this:

10-27 12:09:41.957 Samsung SM-T713 Info 7679 NLPLAT Create ImpressionId: b6221717220a668d61182ca5d4ed1087

It would seem you have an impression id to investigate the issues I am seeing with the Android SDK.

P.S. - I downloaded and tried the windows demo with no issues. The translation was snappy and there was hardly any wait.

Please advise

raaaar commented 6 years ago

10-27 12:09:41.957 Samsung SM-T713 Info 7679 NLPLAT Create ImpressionId: b6221717220a668d61182ca5d4ed1087

This is a request impression id, (guid is generated locally by the client). For some obscure historical reason, this is not logged by the server side (or, not in a way to make the lookup straight-forward). Server does log the response metainfo (impression id, response id, etc). But on the client side, there's a bug in the parsing procedure, thereby all id info is rendered useless ('S' vs proper guid).

The whole thing is, unfortunately, confusing. At any rate, raised a ticket, hoping that the service team will be able to triage based on the available info.

What timezone are you based in?

BeanStalka commented 6 years ago

Raaaar,

I am in the Eastern United States time zone.

When you get a ticket going, how can I track its progress?

Thanks,

Bean

BeanStalka commented 6 years ago

Raaaar, I am seeing this issue again. Any update to the SDK?

autumnleaf666 commented 5 years ago

Hi, @BeanStalka did you solve this issue? I am experiencing the same issue here but for more than 20s. Thank you for any input.