Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License
2.85k stars 1.83k forks source link

certificate verify failed AWS Lambda #2529

Closed Niaxor closed 2 months ago

Niaxor commented 2 months ago

When using the SDK using the .NET 8 runtime in AWS Lambda, the following error occurs.

AZ_LOG_INFO: tlsio_openssl.c:1882 CRL check enabled. [144128]: 300ms SPX_TRACE_SCOPE_EXIT: uws_web_socket.cpp:149 Open [144128]: 300ms SPX_TRACE_INFO: usp_connection.cpp:787 Create requestId for messageType 0 [144128]: 311ms SPX_TRACE_ERROR: AZ_LOG_ERROR: tlsio_openssl.c:691 error:0A000086:SSL routines::certificate verify failed [144128]: 311ms SPX_TRACE_ERROR: AZ_LOG_ERROR: tlsio_openssl.c:2464 FORCE-Closing tlsio instance. [144128]: 311ms SPX_TRACE_SCOPE_ENTER: uws_web_socket.cpp:247 OnWebSocketOpened [144128]: 312ms SPX_TRACE_ERROR: web_socket.cpp:902 WS open operation failed with result=1(WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED), code=2573[0x00000a0d], time=2024-07-31T15:19:58.9696914Z

No issues in the Mock Lambda test tool, or using it locally in a console app. I suspect because the certificate validation issue is related to the context of the lambda image.

Would you be able to provide any details about the authentication flow under the hood and why the SSL might fail in this context?

I tried manually opening a web socket connection (via .NET) to the underlying wss:// uri which had no certificate issues on Lambda.

I tried dumping the cert chain, uploading it with the function and forcing open SSL to use it on Lambda, then initialising the SDK, to be greeted with the same error.

Niaxor commented 2 months ago

I believe the issue lies in the bootstrap script for the Lambda container for the .NET 8 runtime - it has the following code:

# .NET on Linux uses OpenSSL to handle certificates. The .NET runtime will load the certs by first reading
# the default cert bundle file which can be overriden by the SSL_CERT_FILE env var. Then it will load the
# certs in the default cert directory which can be overriden by the SSL_CERT_DIR env var. On AL2023
# The default cert bundle file, via symbolic links, resolves to being in a file under the default cert directory.
# This means the default cert bundle file is double loaded causing a cold start performance hit. This logic
# sets the SSL_CERT_FILE to an empty file if SSL_CERT_FILE hasn't been explicitly
# set. This avoid the double load of the default cert bundle file.
if [ -z "${SSL_CERT_FILE}" ]; then
  export SSL_CERT_FILE="/var/runtime/empty-certificates.crt"
fi

Without diving into the openssl source, if I was to guess, there is a bug in this code that assumes that if SSL_CERT_FILE is set to an empty certificates file, the open ssl libraries will still check the default certificate directories. However, i suspect actually what happens is that in the case SSL_CERT_FILE is set and SSL_CERT_DIR is unset, it will only load SSL_CERT_FILE and not load the files in the hard-coded default directories. This is only a guess.

Setting SSL_CERT_FILE to a valid value like /etc/pki/tls/certs/ca-bundle.crt from the lambda function config fixes this issue. Alternatively setting SSL_CERT_DIR to a valid value like /etc/ssl/certs will also fix this issue.

As it seems this is not a bug with Azure cognitive services, I will close this issue with this comment