envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

SSL Connection Error/SSL Handshake #24775

Closed egkelly closed 1 year ago

egkelly commented 1 year ago

Title: SSL Connection Error/SSL Handshake

Description: I am using AWS App Mesh with Envoy and have configured mTLS with a SPIFFE/SPIRE setup. Everything appears to be working as expected: I have TLS STRICT mode enabled and can navigate to my application without issue. However, the sidecar Envoy logs and admin server stats show a number of SSL/TLS errors:

listener.0.0.0.0_15000.ssl.connection_error: 434478
listener.0.0.0.0_15000.ssl.curves.X25519: 13
listener.0.0.0.0_15000.ssl.fail_verify_cert_hash: 0
listener.0.0.0.0_15000.ssl.fail_verify_error: 7
listener.0.0.0.0_15000.ssl.fail_verify_no_cert: 0
listener.0.0.0.0_15000.ssl.fail_verify_san: 0
listener.0.0.0.0_15000.ssl.handshake: 13
listener.0.0.0.0_15000.ssl.no_certificate: 0
listener.0.0.0.0_15000.ssl.ocsp_staple_failed: 0
listener.0.0.0.0_15000.ssl.ocsp_staple_omitted: 0
listener.0.0.0.0_15000.ssl.ocsp_staple_requests: 0
listener.0.0.0.0_15000.ssl.ocsp_staple_responses: 0
listener.0.0.0.0_15000.ssl.session_reused: 6
listener.0.0.0.0_15000.ssl.sigalgs.ecdsa_secp256r1_sha256: 13
listener.0.0.0.0_15000.ssl.versions.TLSv1.2: 13
[2023-01-05 16:18:43.669][30][trace][connection] [source/common/network/connection_impl.cc:563] [C832197] socket event: 3
[2023-01-05 16:18:43.669][30][trace][connection] [source/common/network/connection_impl.cc:674] [C832197] write ready
[2023-01-05 16:18:43.669][30][trace][connection] [source/common/network/connection_impl.cc:603] [C832197] read ready. dispatch_buffered_data=0
[2023-01-05 16:18:43.669][30][trace][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:88] [C832197] ssl read returns: 104
[2023-01-05 16:18:43.669][30][trace][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:88] [C832197] ssl read returns: -1
[2023-01-05 16:18:43.669][30][trace][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:128] [C832197] ssl error occurred while read: WANT_READ
[2023-01-05 16:18:43.669][30][trace][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:164] [C832197] ssl read 104 bytes
[2023-01-05 16:18:43.669][30][trace][http] [source/common/http/http1/codec_impl.cc:579] [C832197] parsing 104 bytes
[2023-01-05 16:18:43.669][30][trace][http] [source/common/http/http1/codec_impl.cc:865] [C832197] message begin
[2023-01-05 16:18:43.669][30][debug][http] [source/common/http/conn_manager_impl.cc:299] [C832197] new stream
[2023-01-05 16:18:43.669][30][trace][misc] [source/common/event/scaled_range_timer_manager_impl.cc:60] enableTimer called on 0x17887f59d5e0 for 300000ms, min is 300000ms
[2023-01-05 16:18:43.669][30][trace][http] [source/common/http/http1/codec_impl.cc:492] [C832197] completed header: key=host value=myapp.production1.svc.cluster.local:3000
[2023-01-05 16:18:43.669][30][trace][http] [source/common/http/http1/codec_impl.cc:716] [C832197] onHeadersCompleteBase
[2023-01-05 16:18:43.669][30][trace][http] [source/common/http/http1/codec_impl.cc:492] [C832197] completed header: key=user-agent value=Envoy/HC
[2023-01-05 16:18:43.669][30][trace][http] [source/common/http/http1/codec_impl.cc:1053] [C832197] Server: onHeadersComplete size=2
[2023-01-05 16:18:43.669][30][trace][http] [source/common/http/http1/codec_impl.cc:843] [C832197] message complete
[2023-01-05 16:18:43.669][30][debug][http] [source/common/http/conn_manager_impl.cc:904] [C832197][S5285646162669019163] request headers complete (end_stream=true):
':authority', 'myapp.production1.svc.cluster.local:3000'
':path', '/healthcheck'
':method', 'GET'
'user-agent', 'Envoy/HC'

As you can see, I have 13 handshakes and more than 400k SSL connection errors (connection_error: 434478), as well as an example log where Envoy reports an SSL error. Is this expected behavior? As I said, everything appears to be working correctly; I'm just concerned by the number of connection errors.
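For anyone comparing these counters on their own sidecar, here is a minimal sketch that parses the `name: value` stats text shown above and puts the handshake and connection_error counters side by side. The function name `parse_ssl_stats` and the default prefix are illustrative assumptions, not part of Envoy; the sample values are copied from this issue.

```python
def parse_ssl_stats(text, prefix="listener.0.0.0.0_15000.ssl."):
    """Parse 'name: value' stats lines and return the counters under prefix.

    The prefix default matches the listener in this issue; it is an
    assumption and will differ for other listeners.
    """
    stats = {}
    for line in text.splitlines():
        # Split on the last ': ' so stat names containing dots stay intact.
        name, sep, value = line.strip().rpartition(": ")
        if not sep or not name.startswith(prefix):
            continue
        try:
            stats[name[len(prefix):]] = int(value)
        except ValueError:
            continue  # skip non-integer values
    return stats


# Sample lines copied from the stats dump in this issue.
sample = """\
listener.0.0.0.0_15000.ssl.connection_error: 434478
listener.0.0.0.0_15000.ssl.fail_verify_error: 7
listener.0.0.0.0_15000.ssl.handshake: 13
listener.0.0.0.0_15000.ssl.session_reused: 6
"""

stats = parse_ssl_stats(sample)
errors_per_handshake = stats["connection_error"] / stats["handshake"]
print(f"handshakes={stats['handshake']} "
      f"connection_errors={stats['connection_error']} "
      f"errors per handshake={errors_per_handshake:.0f}")
```

Sampling the counters twice, a known interval apart, and diffing the two results would show how fast connection_error is growing, which may help correlate it with a periodic source such as health checks.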

htuch commented 1 year ago

It's hard to say whether this is an AWS App Mesh or an Envoy issue based on what is presented so far, i.e. the fail_verify_error stats. The "ssl error" lines on reads are normal at trace level. This may be a bit confusing, but they just reflect the return code and the fact that no data is available yet. @ggreenway maybe we should change these on the happy path?
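To illustrate why WANT_READ is not a failure, here is a small sketch using Python's standard `ssl` module with in-memory BIOs. It is not Envoy code (Envoy's TLS transport socket is C++), and the hostname `example.invalid` is a placeholder; the point is only that a non-blocking TLS engine reports "want read" whenever it needs bytes from the peer that have not arrived yet.

```python
import ssl

# Client-side TLS engine that is not attached to any socket: it reads
# from `incoming` and writes to `outgoing`, so no network is involved.
context = ssl.create_default_context()
incoming = ssl.MemoryBIO()  # bytes received from the peer (none yet)
outgoing = ssl.MemoryBIO()  # bytes the TLS engine wants to send
tls = context.wrap_bio(incoming, outgoing, server_hostname="example.invalid")

want_read = False
try:
    # The engine queues its first handshake message, then needs the
    # peer's reply. Nothing has arrived, so it signals "want read".
    tls.do_handshake()
except ssl.SSLWantReadError:
    want_read = True  # not a failure: retry once more data is available

print("want_read:", want_read)
print("bytes queued for the peer:", outgoing.pending)
```

A caller is expected to treat this signal as "come back when the socket is readable", which matches the trace lines above where a read of 104 bytes succeeds and the next read reports WANT_READ.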

ggreenway commented 1 year ago

It's unclear what's going on with the information provided. When the connection_error stat is incremented, it should log the reason here. I recommend trying to find those log messages and see if they shed any light on what the failures are, or post them in this issue.
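One way to search for those messages is to parse the log format shown earlier in this issue and keep only lines emitted by the TLS transport socket above trace level, grouped by connection ID. This is a rough sketch: the helper names are hypothetical, and it assumes the failure reason is logged at debug level or higher, which this thread does not confirm.

```python
import re
from collections import defaultdict

# Matches the line layout seen in this issue:
# [timestamp][thread][level][component] [source:line] [C<id>] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<thread>\d+)\]\[(?P<level>\w+)\]"
    r"\[(?P<component>\w+)\] \[(?P<source>[^\]]+)\] "
    r"(?:\[C(?P<conn>\d+)\])?(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def tls_lines_above_trace(lines):
    """Group TLS transport socket messages above trace level by connection ID."""
    by_conn = defaultdict(list)
    for line in lines:
        fields = parse_line(line)
        if fields is None:
            continue  # e.g. header dump lines such as ':path', '/healthcheck'
        if "transport_sockets/tls/" not in fields["source"]:
            continue
        if fields["level"] == "trace":
            continue  # includes the routine WANT_READ lines
        by_conn[fields["conn"]].append(fields["message"].strip())
    return dict(by_conn)


# Two lines copied from the log excerpt in this issue.
sample = [
    "[2023-01-05 16:18:43.669][30][trace][connection] [source/extensions/transport_sockets/tls/ssl_socket.cc:128] [C832197] ssl error occurred while read: WANT_READ",
    "[2023-01-05 16:18:43.669][30][debug][http] [source/common/http/conn_manager_impl.cc:299] [C832197] new stream",
]
print(tls_lines_above_trace(sample))
```

On the excerpt posted here the result is empty, since every TLS line is at trace level, which is consistent with the excerpt not containing the actual failure reason.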

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.