Closed Elias481 closed 4 years ago
Hi,
Prepare some way to let a TLS handshake time out (the TCP connection itself must be established first).
@Elias481 do you know a way to provoke a TLS handshake timeout? What causes the timeout in your environment?
Kind regards Michael
The simplest way to simulate such a situation is, for example:
nc -w 3600 -v -l 5665
Observe the logs during that. You can see that before the stub server is restarted there are quite a few reconnect attempts failing with connection refused. Afterwards it does nothing once the connection attempt has started. By the way, if the connection is shut down fairly cleanly, the log states information/ApiListener: Finished reconnecting to endpoint 'xxxnn.subdom.priv.nmop.de' via host '10.101.15.209' and port '5665'
without any information that the connection was closed. In production, by contrast, we got a connection reset because the infrastructure somehow terminated the connection.
The reason is unclear. It happened directly after a reboot, for example, or after restarting the agent. One time I was called to look at it myself, and I ensured the agent was completely stopped (no stale process and no TCP connection on that port visible on the agent side). Anyway, such things can happen if there is a real-world network between the servers.
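If nc is not available, the same stub can be sketched in a few lines of Python. This is only an illustration of the idea (accept the TCP connection, then stay silent so the peer's TLS handshake never progresses); the function name start_silent_listener is made up for this sketch and is not part of Icinga 2.

```python
import socket
import threading


def start_silent_listener(host="127.0.0.1", port=0):
    """Accept TCP connections and keep them open without sending any data.

    Rough stand-in for `nc -l`: the TCP connect succeeds, but a TLS client
    never receives a reply to its ClientHello.
    Returns (listening_socket, bound_port). Port 0 picks a free port.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    held = []  # keep references so accepted sockets stay open

    def accept_loop():
        while True:
            try:
                conn, _addr = server.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)  # never read, never write

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, server.getsockname()[1]
```

To mimic the nc call above, one would call start_silent_listener("0.0.0.0", 5665) on the stopped agent and keep the process alive while watching the master's log.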
# sudo docker run -ti -h deb10i2m1 -p 5665:5665 debian:buster /bin/bash
# apt-get update && apt-get upgrade -y && apt-get install wget gnupg2 ca-certificates vim apt-transport-https -y && echo "deb https://packages.icinga.com/debian/ icinga-buster-snapshots main" > /etc/apt/sources.list.d/icinga.list && wget -O - https://packages.icinga.com/icinga.key | apt-key add - && apt-get update && apt-get install icinga2 monitoring-plugins -y && /usr/lib/icinga2/prepare-dirs
# icinga2 node wizard
Welcome to the Icinga 2 Setup Wizard!
We will guide you through all required configuration details.
Please specify if this is an agent/satellite setup ('n' installs a master setup) [Y/n]: n
Starting the Master setup routine...
Please specify the common name (CN) [deb10i2m1]:
Reconfiguring Icinga...
Checking for existing certificates for common name 'deb10i2m1'...
Certificates not yet generated. Running 'api setup' now.
Generating master configuration for Icinga 2.
Enabling feature api. Make sure to restart Icinga 2 for these changes to take effect.
Master zone name [master]:
Default global zones: global-templates director-global
Do you want to specify additional global zones? [y/N]:
Please specify the API bind host/port (optional):
Bind Host []:
Bind Port []:
Do you want to disable the inclusion of the conf.d directory [Y/n]:
Disabling the inclusion of the conf.d directory...
Checking if the api-users.conf file exists...
Done.
Now restart your Icinga 2 daemon to finish the installation!
# vim /etc/icinga2/zones.conf
object Endpoint "deb10i2m1" {
}
object Zone "master" {
endpoints = [ "deb10i2m1" ]
}
object Zone "global-templates" {
global = true
}
object Zone "director-global" {
global = true
}
object Endpoint "deb10i2a1" {
host = "172.17.0.3"
}
object Zone "deb10i2a1" {
parent = "master"
endpoints = [ "deb10i2a1" ]
}
# sudo docker run -ti -h deb10i2a1 debian:buster /bin/bash
# apt-get update && apt-get upgrade -y && apt-get install wget gnupg2 ca-certificates vim apt-transport-https -y && echo "deb https://packages.icinga.com/debian/ icinga-buster-snapshots main" > /etc/apt/sources.list.d/icinga.list && wget -O - https://packages.icinga.com/icinga.key | apt-key add - && apt-get update && apt-get install icinga2 monitoring-plugins netcat -y && /usr/lib/icinga2/prepare-dirs
# icinga2 node wizard
Welcome to the Icinga 2 Setup Wizard!
We will guide you through all required configuration details.
Please specify if this is an agent/satellite setup ('n' installs a master setup) [Y/n]:
Starting the Agent/Satellite setup routine...
Please specify the common name (CN) [deb10i2a1]:
Please specify the parent endpoint(s) (master or satellite) where this node should connect to:
Master/Satellite Common Name (CN from your master/satellite node): deb10i2m1
Do you want to establish a connection to the parent node from this node? [Y/n]:
Please specify the master/satellite connection information:
Master/Satellite endpoint host (IP address or FQDN): 172.17.0.2
Master/Satellite endpoint port [5665]:
Add more master/satellite endpoints? [y/N]:
Parent certificate information:
Subject: CN = deb10i2m1
Issuer: CN = Icinga CA
Valid From: Feb 4 19:12:57 2020 GMT
Valid Until: Jan 31 19:12:57 2035 GMT
Fingerprint: 0F 31 F7 30 C3 9E E3 73 56 9E D0 CC 41 16 A6 14 77 05 37 55
Is this information correct? [y/N]: y
Please specify the request ticket generated on your Icinga 2 master (optional).
(Hint: # icinga2 pki ticket --cn 'deb10i2a1'):
No ticket was specified. Please approve the certificate signing request manually
on the master (see 'icinga2 ca list' and 'icinga2 ca sign --help' for details).
Please specify the API bind host/port (optional):
Bind Host []:
Bind Port []:
Accept config from parent node? [y/N]:
Accept commands from parent node? [y/N]:
Reconfiguring Icinga...
Disabling feature notification. Make sure to restart Icinga 2 for these changes to take effect.
Enabling feature api. Make sure to restart Icinga 2 for these changes to take effect.
Local zone name [deb10i2a1]:
Parent zone name [master]:
Default global zones: global-templates director-global
Do you want to specify additional global zones? [y/N]:
Do you want to disable the inclusion of the conf.d directory [Y/n]:
Disabling the inclusion of the conf.d directory...
Done.
Now restart your Icinga 2 daemon to finish the installation!
Now sign the certificate on the master.
# icinga2 ca list
Fingerprint | Timestamp | Signed | Subject
-----------------------------------------------------------------|--------------------------|--------|--------
4b536c1309b2161f18b6b2993a273b3ea9c84a097bb131b0df2dca4170487701 | Feb 4 19:14:41 2020 GMT | | CN = deb10i2a1
# icinga2 ca sign 4b536c1309b2161f18b6b2993a273b3ea9c84a097bb131b0df2dca4170487701
Restart Icinga 2 on master and agent.
Ensure that master and agent communicate. Stop Icinga 2 on the agent and run nc.
nc -w 3600 -v -l -p 5665
The master tries to reconnect but the connection should run into a timeout since the TLS handshake never succeeds. In the log on the master you'll see the following line.
[2020-02-04 19:33:58 +0000] information/ApiListener: Reconnecting to endpoint 'deb10i2a1' via host '172.17.0.3' and port '5665'
The default timeout of 10 seconds is never hit.
As @Elias481 already mentioned, it seems like the TLS timeout check was not implemented in the network stack rewrite.
Yep, that's a known bug, thanks for reporting. AFAIK Boost ASIO doesn't provide such a timeout interface, which is why it wasn't implemented during the rewrite. If you can spot a patch, much appreciated.
Edit: The stream timeout exists, but requires all subsequent operations to complete in that timeout window afterwards. For read/write operations, this will cause problems in our stack. https://www.boost.org/doc/libs/1_70_0/libs/beast/doc/html/beast/using_io/timeouts.html Likely we will need our own timeout handling with timers, as done here.
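To illustrate the "own timeout handling with timers" idea independently of Boost, here is a minimal Python asyncio sketch. It is not the Icinga 2 implementation; connect_tls and its parameters are hypothetical names, and certificate verification is switched off only so the sketch can be pointed at any test host. The deadline covers the connect and handshake phase only, so later reads and writes are not affected by it.

```python
import asyncio
import ssl


async def connect_tls(host, port, handshake_timeout=10.0):
    """Open a TLS connection, giving up if it is not ready within the deadline.

    The timer spans TCP connect plus TLS handshake only; reads and writes on
    the returned (reader, writer) pair are not limited by it.
    """
    ctx = ssl.create_default_context()
    # Assumption for this sketch only: no certificate checks.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        return await asyncio.wait_for(
            asyncio.open_connection(host, port, ssl=ctx),
            timeout=handshake_timeout,
        )
    except asyncio.TimeoutError:
        # wait_for cancelled the pending connect/handshake for us.
        raise ConnectionError(
            f"TLS handshake with {host}:{port} timed out "
            f"after {handshake_timeout}s"
        ) from None
```

Against a listener that accepts the TCP connection but never answers (like the nc stub), this raises ConnectionError once the deadline has passed instead of waiting indefinitely.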
From what I understood from the documentation, it is possible to prolong the timer (set a new timeout) for the next operations or disable it for the following ones (stream.expires_never();).
But anyway I'm fine with any solution. Thanks.
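The pattern of setting a deadline for the handshake and then switching it off for later operations can be shown with a plain blocking socket in Python. This is only an analogy to expires_after / expires_never, not Boost.Beast code, and the helper name is invented for this sketch. Note that a Python socket timeout applies to each blocking socket operation during the handshake, not to the handshake as a whole.

```python
import socket
import ssl


def tls_connect_with_handshake_timeout(host, port, context, handshake_timeout=10.0):
    """Connect and do the TLS handshake under a timeout, then lift the timeout."""
    # The timeout passed here stays set on the socket after connecting.
    sock = socket.create_connection((host, port), timeout=handshake_timeout)
    try:
        # wrap_socket performs the handshake; a silent peer makes it fail
        # with socket.timeout instead of blocking forever.
        tls = context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    # Handshake finished: remove the timeout so that idle periods on the
    # established connection are not treated as errors.
    tls.settimeout(None)
    return tls
```

With a context from ssl.create_default_context() and a peer that never answers, the call fails with socket.timeout after roughly handshake_timeout seconds.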
Is there a recommendation on which Boost version to use with 2.11.3?
Describe the bug
It looks like (at least) the TLS handshake timeout got lost with the new 2.11 networking stack. We sometimes have restarted machines that do not come online again, for example after a restart of the agent. From the logs I see that a "starting reconnect" is logged and about 2 hours later a connection reset is received during the TLS handshake. Immediately after that error the connection is set up successfully.
To Reproduce
Expected behavior
A meaningful timeout (the old default of 10 seconds is fine, for example)
Your Environment
Include as many relevant details about the environment you experienced the problem in
icinga2 --version: 2.11.2-1
Additional context
I can find TlsHandshakeTimeout-related code in the old tlsstream.cpp, but nothing in the current version. There should be something like
get_lowest_layer(stream).expires_after(std::chrono::seconds(10));
in NewClientHandlerInternal before sslConn.async_handshake, I assume.