Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License

[QUERY] Setting Up Text To Speech Service With Private Endpoint Results With 404 #28530

Closed ObakeFilter closed 2 years ago

ObakeFilter commented 2 years ago

Query/Question Hi all, I am trying to deploy a Java application that uses the Speech SDK for TTS conversion. Working with a common Azure region (US West) works great, but switching to a private endpoint results in HTTP error 404 being returned from the remote Speech Service I've opened in my Azure account.

I'm setting up the SpeechConfig instance using the following pretty straightforward method.

    private static final String ENDPOINT_TEMPLATE = "wss://%s.cognitiveservices.azure.com";

    private SpeechSynthesizer setupSpeechSynthesizer(SpeechServicesProperties.Engine engine, String key, String customDomainName,
                                                     SpeechSynthesisOutputFormat outputFormat) {
        log.info("Creating speech synthesizer for language {} with speaker {}...", engine.getLanguage(), engine.getVoiceName());
        SpeechConfig speechConfig = SpeechConfig.fromHost(URI.create(String.format(ENDPOINT_TEMPLATE, customDomainName)), key);
        speechConfig.setSpeechSynthesisLanguage(engine.getLanguage());
        speechConfig.setSpeechSynthesisVoiceName(engine.getVoiceName());
        speechConfig.setSpeechSynthesisOutputFormat(outputFormat);
        return new SpeechSynthesizer(speechConfig, null);
    }

But the conversion fails and the following error log is printed.

[600859]: 8380ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:169 SSML sent to TTS cognitive service: <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xmlns:emo='http://www.w3.org/2009/10/emotionml' xml:lang='he-IL'><voice name='he-IL-HilaNeural'>שיבוטה</voice></speak>
[600859]: 8380ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:447 CSpxUspTtsEngineAdapter::UspInitialize: this=0x0000000001491BA0
[600859]: 8380ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:364 ISpxNamedProperties::GetStringValue: this=0x00000000014A7C08; name='SPEECH-SubscriptionKey'; value='********************************'
[600859]: 8380ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:364 ISpxNamedProperties::GetStringValue: this=0x0000000001491C08; name='AZAC-SDK-PROGRAMMING-LANGUAGE'; value='Java'
[600859]: 8380ms SPX_DBG_TRACE_VERBOSE:  resource_manager.cpp:95 Created 'CSpxUspCallbackWrapper' as '978711522'
[600859]: 8380ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:364 ISpxNamedProperties::GetStringValue: this=0x00000000014A7C08; name='SPEECH-Host'; value='wss://XXX.cognitiveservices.azure.com'
[600859]: 8380ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:572 CSpxUspTtsEngineAdapter::SetUspEndpoint: Using custom host: wss://XXX.cognitiveservices.azure.com
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:364 ISpxNamedProperties::GetStringValue: this=0x00000000014A7C08; name='SPEECH-ProxyHostBypass'; value=''
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  resource_manager.cpp:95 Created 'CSpxUspConnection' as '792041254'
[600859]: 8381ms SPX_TRACE_INFO:  usp_connection.cpp:455 Microsoft::CognitiveServices::Speech::USP::CSpxUspConnection::Connect: entering...
[600859]: 8381ms SPX_TRACE_INFO:  usp_connection.cpp:472 Adding subscription key headers
[600859]: 8381ms SPX_TRACE_INFO:  usp_connection.cpp:507 Set a user defined HTTP header 'User-agent':'SpeechSDK-Java/1.20.0 Windows Client 10'
[600859]: 8381ms SPX_TRACE_INFO:  usp_connection.cpp:513 Set an underlying io option 'tcp_nodelay'
[600859]: 8381ms SPX_TRACE_INFO:  usp_connection.cpp:522 connectionUrl=wss://XXX.cognitiveservices.azure.com/cognitiveservices/websocket/v1
[600859]: 8381ms SPX_DBG_TRACE_SCOPE_ENTER:  web_socket.cpp:165 CSpxWebSocket::CSpxWebSocket
[600859]: 8381ms SPX_DBG_TRACE_SCOPE_EXIT:  web_socket.cpp:165 CSpxWebSocket::CSpxWebSocket
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  resource_manager.cpp:95 Created 'CSpxWebSocket' as '482598724'
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:364 ISpxPropertyBagImpl::SetStringValue: this=0x00000000014A7C08; name='SPEECH-ConnectionUrl'; value='wss://XXX.cognitiveservices.azure.com/cognitiveservices/websocket/v1'
[933744]: 8381ms SPX_TRACE_INFO:  web_socket.cpp:765 CSpxWebSocket::DoWork: open transport.
[933744]: 8381ms SPX_TRACE_INFO:  web_socket.cpp:508 Start to open websocket. WebSocket: 0x1474f90, wsio handle: 0x143fbe0
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:340 speech.config {"context":{"system":{"version":"1.20.0","name":"SpeechSDK","build":"Windows-x64"},"os":{"platform":"Windows","name":"Client","version":"10"}}}
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:387 speech.config='{"context":{"system":{"version":"1.20.0","name":"SpeechSDK","build":"Windows-x64"},"os":{"platform":"Windows","name":"Client","version":"10"}}}'
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:387 synthesis.context='{"synthesis":{"audio":{"outputFormat":"raw-8khz-8bit-mono-alaw","metadataOptions":{"visemeEnabled":false,"bookmarkEnabled":false,"wordBoundaryEnabled":false,"sentenceBoundaryEnabled":false}},"language":{"autoDetection":false}}}'
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:372 ssml <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xmlns:emo='http://www.w3.org/2009/10/emotionml' xml:lang='he-IL'><voice name='he-IL-HilaNeural'>שיבוטה</voice></speak>
[600859]: 8381ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:387 ssml='<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xmlns:emo='http://www.w3.org/2009/10/emotionml' xml:lang='he-IL'><voice name='he-IL-HilaNeural'>שיבוטה</voice></speak>'
[933744]: 8457ms SPX_TRACE_INFO:  usp_connection.cpp:756 Create requestId  for messageType 0
[933744]: 8458ms SPX_DBG_TRACE_SCOPE_ENTER:  web_socket.cpp:170 CSpxWebSocket::~CSpxWebSocket
[933744]: 8458ms SPX_DBG_TRACE_SCOPE_EXIT:  web_socket.cpp:170 CSpxWebSocket::~CSpxWebSocket
[933744]: 8721ms SPX_TRACE_ERROR: AZ_LOG_ERROR:  uws_client.c:1239 Bad status (404) received in WebSocket Upgrade response
[933744]: 8721ms SPX_TRACE_ERROR:  trace_message.cpp:207 Error: File:D:\a\_work\1\s\external\azure-c-shared-utility\src\uws_client.c Func:on_underlying_io_bytes_received Line:1239 
[933744]: 8721ms SPX_TRACE_ERROR:  web_socket.cpp:868 WS open operation failed with result=14(WS_OPEN_ERROR_BAD_RESPONSE_STATUS), code=404[0x00000194]
[933744]: 8721ms SPX_TRACE_INFO:  usp_connection.cpp:902 TS:340, TransportError: connection:0x145ed30, code=7, string=WebSocket upgrade failed: Internal service error (404). Please check request details.
[933744]: 8721ms SPX_DBG_TRACE_VERBOSE:  usp_tts_engine_adapter.cpp:767 Response: On Error: Code:7, Message: WebSocket upgrade failed: Internal service error (404). Please check request details..
[933744]: 8721ms SPX_DBG_TRACE_VERBOSE:  create_object_helpers.h:78 SpxTerm: ptr=0x00000000014EDE48
[933744]: 8721ms SPX_DBG_TRACE_SCOPE_ENTER:  usp_connection.cpp:139 Microsoft::CognitiveServices::Speech::USP::CSpxUspConnection::~CSpxUspConnection
[933744]: 8721ms SPX_DBG_TRACE_SCOPE_EXIT:  usp_connection.cpp:139 Microsoft::CognitiveServices::Speech::USP::CSpxUspConnection::~CSpxUspConnection
[600859]: 8721ms SPX_DBG_TRACE_FUNCTION:  synthesis_result.cpp:25 CSpxSynthesisResult::CSpxSynthesisResult
[600859]: 8721ms SPX_DBG_TRACE_VERBOSE:  resource_manager.cpp:95 Created 'CSpxSynthesisResult' as '3874248'
[600859]: 8721ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:364 ISpxPropertyBagImpl::SetStringValue: this=0x00000000014FC010; name='CancellationDetails_ReasonDetailedText'; value='WebSocket upgrade failed: Internal service error (404). Please check request details. USP state: 2. Received audio size: 0 bytes.'
[600859]: 8721ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:364 ISpxPropertyBagImpl::SetStringValue: this=0x00000000014FC010; name='CancellationDetails_ReasonDetailedText'; value='WebSocket upgrade failed: Internal service error (404). Please check request details. USP state: 2. Received audio size: 0 bytes.'
[600859]: 8721ms SPX_DBG_TRACE_FUNCTION:  synthesis_result.cpp:30 CSpxSynthesisResult::~CSpxSynthesisResult
[600859]: 8721ms SPX_TRACE_ERROR:  usp_tts_engine_adapter.cpp:116 Synthesis cancelled without data received, retrying.

The same thing happens when I explicitly set the endpoint to wss://XXX.cognitiveservices.azure.com/tts/websocket/v1, which I took from following this guide. The original US West endpoint I used as a standard sample was wss://westus.tts.speech.microsoft.com/cognitiveservices/websocket/v1

Can you kindly direct me on how I should troubleshoot the matter? Are there any configurations I may need to look at in my Azure Speech Service to make this work?

Edward


ghost commented 2 years ago

Thank you for your feedback. This has been routed to the support team for assistance.

joshfree commented 2 years ago

/cc @samvaity

yulin-li commented 2 years ago

As the doc says, the URL is {your custom name}.cognitiveservices.azure.com/{speech service offering}/{URL path}.

So you need to use the fromEndpoint method, and the endpoint is wss://XXX.cognitiveservices.azure.com/tts/cognitiveservices/websocket/v1
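
For anyone landing here later, a minimal sketch of the above. The log in the question shows that fromHost appended /cognitiveservices/websocket/v1 straight after the host, so the /tts/ offering segment was missing from the connection URL. The class and method names below (PrivateEndpointUri, ttsEndpoint) are hypothetical helpers, not part of the Speech SDK; the only SDK call involved is SpeechConfig.fromEndpoint(URI, String), shown in a comment so the snippet runs without the SDK on the classpath.

```java
import java.net.URI;

// Sketch only: PrivateEndpointUri and ttsEndpoint are hypothetical names, not SDK API.
final class PrivateEndpointUri {
    // Pattern from the reply above:
    // {your custom name}.cognitiveservices.azure.com/{speech service offering}/{URL path}
    private static final String TEMPLATE = "wss://%s.cognitiveservices.azure.com/%s/%s";

    // Builds the full TTS WebSocket endpoint for a private-endpoint-enabled resource.
    static URI ttsEndpoint(String customDomainName) {
        if (customDomainName == null || customDomainName.trim().isEmpty()) {
            throw new IllegalArgumentException("customDomainName must not be blank");
        }
        return URI.create(String.format(
                TEMPLATE, customDomainName.trim(), "tts", "cognitiveservices/websocket/v1"));
    }

    public static void main(String[] args) {
        URI endpoint = ttsEndpoint("XXX");
        System.out.println(endpoint);
        // With the Speech SDK available, pass the full URI to fromEndpoint
        // instead of the fromHost call used in the question:
        //   SpeechConfig speechConfig = SpeechConfig.fromEndpoint(endpoint, key);
    }
}
```

The rest of setupSpeechSynthesizer from the question (language, voice name, output format) should be able to stay as it is; only the way the SpeechConfig is created changes.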

ObakeFilter commented 2 years ago

I probably missed the part that says 'cognitiveservices' is still part of the URL path. I thought it had moved into the hostname and should be removed from the path.

Thanks.