Niaxor closed this issue 2 months ago
I believe the issue lies in the bootstrap script for the Lambda container for the .NET 8 runtime - it has the following code:
# .NET on Linux uses OpenSSL to handle certificates. The .NET runtime will load the certs by first reading
# the default cert bundle file which can be overriden by the SSL_CERT_FILE env var. Then it will load the
# certs in the default cert directory which can be overriden by the SSL_CERT_DIR env var. On AL2023
# The default cert bundle file, via symbolic links, resolves to being in a file under the default cert directory.
# This means the default cert bundle file is double loaded causing a cold start performance hit. This logic
# sets the SSL_CERT_FILE to an empty file if SSL_CERT_FILE hasn't been explicitly
# set. This avoid the double load of the default cert bundle file.
if [ -z "${SSL_CERT_FILE}" ]; then
export SSL_CERT_FILE="/var/runtime/empty-certificates.crt"
fi
Without diving into the OpenSSL source, my guess is that this code has a bug: it assumes that when SSL_CERT_FILE points to an empty certificates file, the OpenSSL libraries will still check the default certificate directories. However, I suspect that when SSL_CERT_FILE is set and SSL_CERT_DIR is unset, only SSL_CERT_FILE is loaded, and the files in the hard-coded default directories are not.
This is only a guess.
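One way to check whether a given environment is in the state described above (SSL_CERT_FILE pointing at an empty file while SSL_CERT_DIR is unset) is a small diagnostic like the following. This is only a sketch: describe_cert_env is a hypothetical helper name, not part of the Lambda bootstrap script, and it reports the environment variables only. It does not show what OpenSSL actually loads.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (an assumption for illustration, not part of
# the Lambda bootstrap script). Prints two lines describing the
# certificate-related environment variables as this shell sees them.
describe_cert_env() {
    if [ -z "${SSL_CERT_FILE:-}" ]; then
        echo "SSL_CERT_FILE: unset or empty"
    elif [ ! -e "${SSL_CERT_FILE}" ]; then
        echo "SSL_CERT_FILE: missing file"
    elif [ ! -s "${SSL_CERT_FILE}" ]; then
        # The file exists but has zero bytes, as with empty-certificates.crt.
        echo "SSL_CERT_FILE: empty file"
    else
        echo "SSL_CERT_FILE: non-empty file"
    fi

    if [ -z "${SSL_CERT_DIR:-}" ]; then
        echo "SSL_CERT_DIR: unset or empty"
    else
        echo "SSL_CERT_DIR: set"
    fi
}

describe_cert_env
```

If the output is "SSL_CERT_FILE: empty file" together with "SSL_CERT_DIR: unset or empty", the environment matches the combination I suspect is causing the failure.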
Setting SSL_CERT_FILE to a valid value such as /etc/pki/tls/certs/ca-bundle.crt in the Lambda function configuration fixes this issue. Alternatively, setting SSL_CERT_DIR to a valid value such as /etc/ssl/certs also fixes it.
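For environments where the variable is set from a startup script rather than from the function configuration, the same workaround could be sketched roughly as below. resolve_cert_file is a hypothetical helper name, and the bundle path is the one that worked for me on the .NET 8 Lambda image. Other images may keep their bundle elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration): print the CA bundle
# path to use. Keeps the current SSL_CERT_FILE when it points at a non-empty
# file; otherwise prints the first candidate argument that exists and is
# non-empty. Returns 1 and prints nothing when no usable file is found.
resolve_cert_file() {
    if [ -s "${SSL_CERT_FILE:-}" ]; then
        printf '%s\n' "${SSL_CERT_FILE}"
        return 0
    fi
    for candidate in "$@"; do
        if [ -s "${candidate}" ]; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}

# Only override SSL_CERT_FILE when a usable bundle was actually found.
if bundle=$(resolve_cert_file /etc/pki/tls/certs/ca-bundle.crt); then
    export SSL_CERT_FILE="${bundle}"
fi
```

Unlike the bootstrap logic quoted above, this treats an empty file the same as an unset variable, so the empty placeholder would be replaced by a real bundle.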
As this does not appear to be a bug in Azure Cognitive Services, I will close this issue with this comment.
When using the SDK on the .NET 8 runtime in AWS Lambda, the following error occurs:
AZ_LOG_INFO: tlsio_openssl.c:1882 CRL check enabled.
[144128]: 300ms SPX_TRACE_SCOPE_EXIT: uws_web_socket.cpp:149 Open
[144128]: 300ms SPX_TRACE_INFO: usp_connection.cpp:787 Create requestId for messageType 0
[144128]: 311ms SPX_TRACE_ERROR: AZ_LOG_ERROR: tlsio_openssl.c:691 error:0A000086:SSL routines::certificate verify failed
[144128]: 311ms SPX_TRACE_ERROR: AZ_LOG_ERROR: tlsio_openssl.c:2464 FORCE-Closing tlsio instance.
[144128]: 311ms SPX_TRACE_SCOPE_ENTER: uws_web_socket.cpp:247 OnWebSocketOpened
[144128]: 312ms SPX_TRACE_ERROR: web_socket.cpp:902 WS open operation failed with result=1(WS_OPEN_ERROR_UNDERLYING_IO_OPEN_FAILED), code=2573[0x00000a0d], time=2024-07-31T15:19:58.9696914Z
There are no issues in the Mock Lambda test tool or when running it locally in a console app. I suspect this is because the certificate validation issue is tied to the environment of the Lambda image.
Would you be able to provide any details about the authentication flow under the hood and why the SSL might fail in this context?
I tried manually opening a web socket connection (via .NET) to the underlying wss:// URI, which had no certificate issues on Lambda.
I tried dumping the cert chain, uploading it with the function and forcing OpenSSL to use it on Lambda, then initialising the SDK, only to be greeted with the same error.