Closed cboitel closed 2 years ago
After (v1.21.1), the ClientHello may take more than 50 ms to be sent, and the total time to establish the TLS tunnel is 75 ms.
As you've pointed out, it seems this is the change that's resulting in your timeouts.
Does this delay happen for every connection, or only sometimes?
Can you enable trace-level logging and post the logs for one connection?
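For readers following along, one way to raise the log level on a running Envoy is its admin interface, which accepts `POST /logging?level=<level>` (starting Envoy with `-l trace` is the other common route). The sketch below assumes the admin listener is reachable at `http://127.0.0.1:9901`. That address and the helper names `logging_url` and `set_log_level` are illustrative assumptions, not details from this thread.

```python
import urllib.parse
import urllib.request


def logging_url(admin_base, level="trace"):
    """Build the admin URL that changes the log level of all loggers."""
    query = urllib.parse.urlencode({"level": level})
    return f"{admin_base.rstrip('/')}/logging?{query}"


def set_log_level(admin_base, level="trace", timeout_s=5.0):
    """POST to the admin endpoint and return the response body as text.

    admin_base is an assumption (for example "http://127.0.0.1:9901");
    use whatever address the admin listener is bound to in your setup.
    """
    request = urllib.request.Request(
        logging_url(admin_base, level), data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read().decode()
```

Calling `set_log_level("http://127.0.0.1:9901")` before reproducing the failure, then setting the level back to `info` afterwards, keeps the trace output limited to the connections of interest.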
Envoy error log at trace level and tcpdump extract in text mode:
In these files:
I have isolated inside the tcpdump packets related to that first connection:
These match C4/C5 in the error log.
It's possible this is the same as #19717. The fix for that was just merged. Can you test the latest from main
to see if that fixes it?
I successfully recompiled Envoy: it took some time and resources, but was worth it.
Using the compiled binary, I can see our tests are now passing with connection_timeout set back to 0.1s. Looking at the tcpdump/... extracts, I can see the client hello is now sent immediately, as it was previously.
Will it be backported to v1.21.2, or will it wait for the upcoming v1.22.0, or both? I believe this has a major impact, and it currently prevents us from upgrading.
Yeah we should backport this. Will mark it as such.
Going to close this and we can track the backport status in the merged PR. Thank you!
Title: upstream connection failure since upgrade to v1.21.1
Description:
We were using v1.20.0 to connect to a series of upstream services over TLS as part of our non-regression tests. Upgrading to v1.21.1 led to failing tests due to intermittent errors (not always on the same service), all of which had in common
We increased our default connection timeout to local services (inside a docker-compose) from 0.1s to 0.25s, and failures are no longer reported.
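To see how much of a connection budget the TLS handshake uses, the TCP connect and the TLS handshake can be timed separately from a plain client. The sketch below is a minimal illustration: it measures a direct client-to-upstream connection, not Envoy's own behaviour, so it only gives a baseline to compare against the tcpdump timings. The helper names `time_tls_connect` and `over_budget`, and the host and port passed to them, are hypothetical.

```python
import socket
import ssl
import time


def time_tls_connect(host, port, timeout_s=5.0, verify=True):
    """Return (tcp_seconds, tls_seconds) for one connection to host:port."""
    context = ssl.create_default_context()
    if not verify:
        # For local test services with self-signed certificates only.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    start = time.perf_counter()
    raw = socket.create_connection((host, port), timeout=timeout_s)
    tcp_done = time.perf_counter()
    try:
        # wrap_socket performs the TLS handshake on the connected socket.
        tls = context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
    tls_done = time.perf_counter()
    tls.close()
    return tcp_done - start, tls_done - tcp_done


def over_budget(samples, budget_s):
    """Return the (tcp, tls) samples whose total time exceeds budget_s."""
    return [s for s in samples if s[0] + s[1] > budget_s]
```

For example, collecting `[time_tls_connect("my-upstream", 8443, verify=False) for _ in range(50)]` and passing the result to `over_budget(samples, 0.1)` lists the connections that would not have fit in a 0.1s budget.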
Repro steps: We use:
Admin and Stats Output: N/A
Config: N/A
Logs:
I have a tcpdump of the TLS negotiation for both versions of Envoy, and it clearly shows that extra latency is added at TLS negotiation time:
access_log:
error_log:
Call Stack: N/A