nssidhu closed this issue 11 months ago
I'm having the same problem running the basic and chat samples for the avatar. Speech to text and the other features are fine; the failure is specific to launching the RTCPeerConnection.
Redacted log from the browser:
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.027Z | peer connection: got local SDP. | privName: peer connection: got local SDP. | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.027Z | privEventType: 1 | privMetadata: {}
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.028Z | SynthesisTriggeredEvent | privName: SynthesisTriggeredEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.028Z | privEventType: 1 | privMetadata: {} | privRequestId: redacted | privSessionAudioDestinationId: <NULL> | privTurnAudioDestinationId: <NULL>
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.028Z | ConnectingToSynthesisServiceEvent | privName: ConnectingToSynthesisServiceEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.028Z | privEventType: 1 | privMetadata: {} | privRequestId: redacted | privAuthFetchEventId: redacted
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.029Z | ConnectionStartEvent | privName: ConnectionStartEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.029Z | privEventType: 1 | privMetadata: {} | privConnectionId: redacted | privUri: wss://westus2.voice.speech.microsoft.com/cognitiveservices/websocket/v1?enableTalkingAvatar=true&Ocp-Apim-Subscription-Key=redacted&X-ConnectionId=redacted | privHeaders: <NULL>
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.738Z | ConnectionEstablishedEvent | privName: ConnectionEstablishedEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.738Z | privEventType: 1 | privMetadata: {} | privConnectionId: redacted
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.738Z | SynthesisStartedEvent | privName: SynthesisStartedEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.738Z | privEventType: 1 | privMetadata: {} | privRequestId: redacted | privAuthFetchEventId: redacted
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.739Z | ConnectionMessageSentEvent | privName: ConnectionMessageSentEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.739Z | privEventType: 1 | privMetadata: {} | privConnectionId: redacted | privNetworkSentTime: 2023-11-28T15:51:26.739Z | privMessage: {"privBody":"{\"context\":{\"system\":{\"name\":\"SpeechSDK\",\"version\":\"1.33.1\",\"build\":\"JavaScript\",\"lang\":\"JavaScript\"},\"os\":{\"platform\":\"Browser/Win32\",\"name\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36\",\"version\":\"5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36\"},\"synthesis\":{\"video\":{\"format\":{\"bitrate\":2000000,\"codec\":\"H264\",\"crop\":{\"bottomRight\":{\"x\":1920,\"y\":1080},\"topLeft\":{\"x\":0,\"y\":0}},\"resolution\":{\"height\":1080,\"width\":1920}},\"protocol\":{\"name\":\"WebRTC\",\"webrtcConfig\":{\"clientDescription\":\"<redacted>\",\"iceServers\":[{\"credential\":\"redacted\",\"urls\":[\"turn:relay.communication.microsoft.com:3478\"],\"username\":\"redacted\"}]}},\"talkingAvatar\":{\"background\":{\"color\":\"#FFFFFFFF\"},\"character\":\"lisa\",\"customized\":false,\"style\":\"casual-sitting\"}}}}}","privMessageType":0,"privHeaders":{"Path":"speech.config","X-RequestId":"redacted","X-Timestamp":"2023-11-28T15:51:26.738Z","Content-Type":"application/json"},"privId":"redacted","privSize":9642,"privPath":"speech.config","privRequestId":"redacted","privContentType":"application/json"}
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.739Z | ConnectionMessageSentEvent | privName: ConnectionMessageSentEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.739Z | privEventType: 1 | privMetadata: {} | privConnectionId: redacted | privNetworkSentTime: 2023-11-28T15:51:26.739Z | privMessage: {"privBody":"{\"synthesis\":{\"audio\":{\"metadataOptions\":{\"bookmarkEnabled\":false,\"sessionEndEnabled\":true,\"visemeEnabled\":false},\"outputFormat\":\"raw-24khz-16bit-mono-pcm\"},\"language\":{}}}","privMessageType":0,"privHeaders":{"Path":"synthesis.context","X-RequestId":"redacted","X-Timestamp":"2023-11-28T15:51:26.739Z","Content-Type":"application/json"},"privId":"redacted","privSize":172,"privPath":"synthesis.context","privRequestId":"redacted","privContentType":"application/json"}
ConsoleLoggingListener.ts:43 2023-11-28T15:51:26.740Z | ConnectionMessageSentEvent | privName: ConnectionMessageSentEvent | privEventId: redacted | privEventTime: 2023-11-28T15:51:26.740Z | privEventType: 1 | privMetadata: {} | privConnectionId: redacted | privNetworkSentTime: 2023-11-28T15:51:26.740Z | privMessage: {"privBody":"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xmlns:emo='http://www.w3.org/2009/10/emotionml' xml:lang='en-US'><voice name='en-US-JennyNeural'></voice></speak>","privMessageType":0,"privHeaders":{"Path":"ssml","X-RequestId":"redacted","X-Timestamp":"2023-11-28T15:51:26.740Z","Content-Type":"application/ssml+xml"},"privId":"redacted","privSize":221,"privPath":"ssml","privRequestId":"redacted","privContentType":"application/ssml+xml"}
basic.js:106 [2023-11-28T15:51:28.085Z] Avatar failed to start. Error: InvalidCharacterError: Failed to execute 'atob' on 'Window': The string to be decoded is not correctly encoded.
The OP's error is reported against the "yinhew" branch, but I hit the same problem on the master branch of the repo, using samples/js/browser/avatar/basic.html in both Chrome and Edge on Windows.
The cause appears to be that the avatar feature is not available outside the S0 tier. I dug a little and found this message: "Protocols.Core.BadClientRequestException: Avatar is currently only available on Standard S0 resource websocket error code: 1011"
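For anyone else debugging this: the `1011` in that message is a WebSocket close code (RFC 6455 defines it as an internal server error), and the text in front of it is the close reason sent by the service. Below is a minimal sketch of turning the two into a readable message. This is not the Speech SDK's own code; `describeClose` and the `socket` variable in the usage comment are hypothetical names.

```javascript
// Hypothetical helper: turn a WebSocket close code and reason into a readable message.
// Code meanings follow RFC 6455, section 7.4.1.
const CLOSE_CODES = {
  1000: "normal closure",
  1001: "going away",
  1002: "protocol error",
  1003: "unsupported data",
  1007: "invalid payload data",
  1008: "policy violation",
  1009: "message too big",
  1011: "internal server error",
};

function describeClose(code, reason) {
  // Fall back to a generic label for codes not listed above.
  const meaning = CLOSE_CODES[code] || "unknown close code";
  // Only append the reason when the server actually sent one.
  const detail = reason ? `: ${reason}` : "";
  return `WebSocket closed with ${code} (${meaning})${detail}`;
}

// Usage with a standard browser WebSocket (assumed to exist as `socket`):
// socket.addEventListener("close", (e) => console.error(describeClose(e.code, e.reason)));
```

Logging the close reason like this would have shown the "only available on Standard S0" text directly, instead of the later `atob` failure.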
@Azure people: not loving the fact that I had to dig this hard to find the cause of the error. Any chance you could do a better job of surfacing the error here? I had to put a breakpoint in one of the packed SDK libs, trace an async call, and then inspect a response that was missing a property just to find this message.
I think we can all agree this is kind of a crap user experience. Even just a very prominent note saying "you'll need to be off the free tier to get this to work" would be welcome.
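Until the SDK surfaces this properly, the `atob` failure can at least be made readable on the client side. `atob` throws `InvalidCharacterError` whenever its input is not valid base64, which fits with the response missing the property that was expected. Here is a rough sketch, assuming the value is meant to be base64-encoded JSON; `decodeBase64Json` and the label passed to it are hypothetical and not part of the Speech SDK.

```javascript
// Hypothetical helper: decode a base64 string that should contain JSON,
// failing with a descriptive error instead of a bare InvalidCharacterError.
function decodeBase64Json(value, label) {
  // A missing or empty field is reported explicitly rather than passed to atob.
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`${label} is missing from the service response; check the WebSocket close reason`);
  }
  let text;
  try {
    text = atob(value); // throws if value is not valid base64
  } catch (err) {
    throw new Error(`${label} is not valid base64: ${err.message}`);
  }
  // JSON.parse can still throw a SyntaxError if the decoded text is not JSON.
  return JSON.parse(text);
}

// Example call (the field name is an assumption):
// const answer = decodeBase64Json(response.remoteSdp, "remote SDP");
```

With a check like this, the log would say that a field is missing and point at the close reason, rather than blaming the encoding.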
Can confirm this works, but needs the following extra details for the speech service:
Thanks for reporting this issue. The error message is not accurate for now. Avatar support in the SDK is still at an experimental stage, and we will update it in upcoming releases.
I am running the S0 tier (not the free one) and still encountering the error.
I am trying out the browser version of the sample at https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/yinhew/avatar/samples/js/browser/avatar and getting the following error:
[2023-11-28T14:19:14.306Z] Avatar failed to start. Error: InvalidCharacterError: Failed to execute 'atob' on 'Window': The string to be decoded is not correctly encoded.