Closed cubic3d closed 5 years ago
We are experiencing the same problem with the Icinga output plugin for Logstash. After some working API calls, the log shows "[2018-11-26 11:39:09 +0100] information/HttpServerConnection: Unable to disconnect Http client, I/O thread busy". After this, all client TLS handshakes fail: "Error: Timeout was reached (10) during TLS handshake".
While debugging hanging connection issues on our icinga2 setup, my finding is that with "Connection: close", it makes a difference whether the client sends the headers and body in one go or not.
If the client sends them in one socket operation, everything is fine: they end up in the same TCP packet and are parsed by Icinga at the same time. If it sends the headers and body in two separate operations, they will probably end up in two TCP packets which, depending on latency and Icinga's reaction time, are parsed in two steps. Icinga seems to read and parse the header packet containing the "Connection: close" entry and tries to close the connection while data, the body packet, is still pending. This results in a hanging connection and threads being stuck.
With a freshly started Icinga and one hanging connection, I did a quick gdb backtrace and got two suspicious threads with these backtraces:
Thread 23 (Thread 0x7f3f8b9d1700 (LWP 179709)):
#0 0x00007f3fd0fd326d in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f3fd0fcce42 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00000000006ba69f in icinga::HttpServerConnection::DataAvailableHandler() ()
#3 0x00000000006488ee in ?? ()
#4 0x00000000005d4178 in icinga::Stream::SignalDataAvailable() ()
#5 0x000000000062c1c8 in icinga::TlsStream::OnEvent(int) ()
#6 0x00000000005f82cb in icinga::SocketEventEngineEpoll::ThreadProc(int) ()
#7 0x00007f3fd0209c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f3fd0fca6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007f3fd1e9b41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 14 (Thread 0x7f3fcc501700 (LWP 179700)):
#0 0x00007f3fd0fd0709 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000005d89ea in icinga::SocketEventEngine::WakeUpThread(int, bool) ()
#2 0x00000000005d9325 in icinga::SocketEventEngineEpoll::Unregister(icinga::SocketEvents*) ()
#3 0x000000000061cf09 in icinga::TlsStream::CloseInternal(bool) ()
#4 0x00000000006a679a in icinga::HttpServerConnection::Disconnect() ()
#5 0x00000000006ba661 in icinga::HttpServerConnection::DataAvailableHandler() ()
#6 0x00000000006ba9c1 in icinga::HttpServerConnection::Start() ()
#7 0x0000000000679162 in icinga::ApiListener::NewClientHandlerInternal(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) ()
#8 0x000000000067a430 in icinga::ApiListener::NewClientHandler(boost::intrusive_ptr<icinga::Socket> const&, icinga::String const&, icinga::ConnectionRole) ()
#9 0x000000000062edb6 in icinga::ThreadPool::WorkerThread::ThreadProc(icinga::ThreadPool::Queue&) ()
#10 0x00007f3fd197b5d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#11 0x00007f3fd0fca6ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#12 0x00007f3fd1e9b41d in clone () from /lib/x86_64-linux-gnu/libc.so.6
The first one is blocked trying to acquire m_DataHandlerMutex in icinga::HttpServerConnection::DataAvailableHandler(), which the second one has already acquired in icinga::HttpServerConnection::Disconnect(); the second is trying to close the connection, looping to wake up a (the other?) thread.
I used the 2.10.2-1.xenial package from packages.icinga.com. Without debug info I cannot be sure whether the threads are in fact blocked on the same m_DataHandlerMutex, but I'm pretty certain they are.
Reverting 13239c3172b625f615b663cd809530b1441149dc (which moved the disconnect path to icinga::HttpServerConnection::DataAvailableHandler) fixes our hanging connection issue.
In our case the system is stable as long as there are no timeouts (especially TLS handshake timeouts). But as soon as the first such message appears in the log files, our cluster starts to become unstable: the cluster, ido and icinga checks go critical or unknown. The icinga daemon, however, is still running.
This is an example of the debug-log from our master2:
Deployment with the director:
On our master1 we configured the director for automatic deployment. If the daemon is in such an unstable state, it is not able to reload correctly. It seems that it gets forked but can't stop the old children.
This is example output while icinga2 is reloading. It looks to us as if icinga crashes at the step "information/ExternalCommandListener: 'command' stopped.", because that is the last output and systemd still reports "reload" as the state:
[2018-11-20 20:43:01 +0100] information/WorkQueue: #120 (HttpServerConnection) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2018-11-20 20:43:15 +0100] information/Checkable: Checking for configured notifications for object 'zmon-master1.ooe.gv.at!cluster'
[2018-11-20 20:43:28 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 1, rate: 32.8667/s (1972/min 10025/5min 29932/15min);
[2018-11-20 20:43:38 +0100] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 1, rate: 33.0167/s (1981/min 10025/5min 29932/15min);
[2018-11-20 20:43:56 +0100] information/WorkQueue: #153 (HttpServerConnection) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2018-11-20 20:44:06 +0100] information/Application: Got reload command: Starting new instance.
[2018-11-20 20:44:17 +0100] information/Application: Reload requested, letting new process take over.
[2018-11-20 20:44:17 +0100] information/ApiListener: 'api' stopped.
[2018-11-20 20:44:17 +0100] information/CheckerComponent: 'checker' stopped.
[2018-11-20 20:44:17 +0100] information/ExternalCommandListener: 'command' stopped.
Sometimes, if master1 is hanging, a few seconds later the icinga daemon on master2 gets the same problem.
State of the icinga2 daemon
Very often we find lines like this in the systemd-log after a restart/reload:
Problems with acknowledgements and setting downtimes in icingaweb2
Acknowledging via icingaweb2 is not possible; it fails with a timeout error. Icinga log: This is the message in icingaweb2: Result: the acknowledgment is not set.
The same happens when setting a downtime.
Icinga Log:
Icingaweb2 message:
But in contrast to the acknowledgement, the downtime is set.
Logstash-Plugin "logstash-output-icinga"
We also use the Logstash plugin "logstash-output-icinga". If Logstash sends something to Icinga via the API, the connection is reset by Icinga. This is the Icinga log entry from the request:
These are possible return messages in the Logstash log:
I also tested this with an old Icinga setup (2.8.4-1). There, Icinga can handle the API request.
Because of this problem I switched to the Logstash plugin "logstash-output-nagios", which uses the command pipe. With this everything is fine and our cluster stays stable. We also don't see TLS errors in the log files.
In the end it is the same behaviour we see when Nessus scans the server - see
https://github.com/Icinga/icinga2/issues/6559. That probably means the Icinga Logstash output plugin and Nessus send API requests that Icinga 2.10.x can't handle.
Thus far, I'm not able to reproduce it with my previous analysis in https://github.com/Icinga/icinga2/issues/6514#issuecomment-428155731
If someone could provide a simple reproducer script, or the number of objects/IO required for these tests, that would be much appreciated. I'm still wondering whether the requests need to run fully in parallel, or whether this goes more in the direction of overloaded socket events.
What I came up with is this snippet, which uses openssl s_client and sends the header and body with a sleep in between. If you comment out the sleep, it works; with the sleep, it results in Icinga threads blocking. There's no parallelism needed, just one request with the sleep in between.
{
cat <<-EOF
GET /v1/objects/hosts HTTP/1.1
Authorization: Basic $BASE64_ENCODED_USERNAME_PASSWORD
Content-Type: application/json
Connection: close
Accept: application/json
Content-Length: 80
EOF
sleep 1
cat <<-EOF
{"filter": "host.state!=0 && host.acknowledgement!=1 && host.downtime_depth==0"}
EOF
} | openssl s_client -quiet -connect $IP:5665
Thanks, that was the missing piece in my brain.
Script, with root:icinga base64-encoded:
vim conn_close.sh
#!/bin/bash
BASE64_ENCODED_USERNAME_PASSWORD="cm9vdDppY2luZ2E="
IP="127.0.0.1"
{
cat <<-EOF
GET /v1/objects/hosts HTTP/1.1
Authorization: Basic $BASE64_ENCODED_USERNAME_PASSWORD
Content-Type: application/json
Connection: close
Accept: application/json
Content-Length: 80
EOF
sleep 1
cat <<-EOF
{"filter": "host.state!=0 && host.acknowledgement!=1 && host.downtime_depth==0"}
EOF
} | openssl s_client -quiet -connect $IP:5665
chmod +x conn_close.sh
michi@mbmif ~/coding/icinga/icinga2 (feature/development-docs=) $ ./conn_close.sh
depth=1 CN = Icinga CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
[2018-12-03 13:39:38 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53215 (no client certificate)
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53215), user: root)
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: Unable to disconnect Http client, I/O thread busy
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:53215)
[2018-12-03 13:39:38 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:53215)
The doubled log line is misleading at best, so I've gone the extra mile and debugged this below.
While reverting the suggested patch, there's a certain possibility that the "corked" functionality previously hid this deadlock.
<<<<<<< HEAD
/* Request finished, decide whether to explicitly close the connection. */
if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
m_CurrentRequest.Headers->Get("connection") == "close") {
m_Stream->Shutdown();
close = true;
}
=======
m_RequestQueue.Enqueue(std::bind(&Stream::SetCorked, m_Stream, false));
>>>>>>> parent of 13239c317... Ensure that HTTP/1.0 or Connection: close headers are properly disconnecting the client
michi@mbmif ~/coding/icinga/icinga2 (bugfix/conn-close-stall) $ ./conn_close.sh
depth=1 CN = Icinga CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
Server: Icinga/v2.10.2-91-g9afe7b234
206b4
{"results":[{"attrs"
...
ta":{},"name":"mbmif","type":"Host"}]}
0
[2018-12-03 13:51:38 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53414 (no client certificate)
[2018-12-03 13:51:38 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53414), user: root)
[2018-12-03 13:51:40 +0100] information/HttpServerConnection: HTTP client disconnected (from [::ffff:127.0.0.1]:53414)
mbmif /usr/local/icinga2 (master *) # lldb -- /usr/local/icinga2/lib/icinga2/sbin/icinga2 daemon
(lldb) target create "/usr/local/icinga2/lib/icinga2/sbin/icinga2"
Current executable set to '/usr/local/icinga2/lib/icinga2/sbin/icinga2' (x86_64).
(lldb) settings set -- target.run-args "daemon"
(lldb) b icinga::HttpServerConnection::Disconnect()
Breakpoint 1: where = icinga2`icinga::HttpServerConnection::Disconnect() + 19 at httpserverconnection.cpp:91, address = 0x0000000100629653
(lldb) r
Two threads call Disconnect():
[2018-12-03 14:04:52 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53570 (no client certificate)
[2018-12-03 14:04:52 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53570), user: root)
Process 31665 stopped
* thread #12, stop reason = breakpoint 1.1
frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
88
89 void HttpServerConnection::Disconnect()
90 {
-> 91 boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
92 if (!lock.owns_lock()) {
93 Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
94 return;
thread #27, stop reason = breakpoint 1.1
frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
88
89 void HttpServerConnection::Disconnect()
90 {
-> 91 boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
92 if (!lock.owns_lock()) {
93 Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
94 return;
Target 0: (icinga2) stopped.
This thread handles the main connection and calls HttpServerConnection::DataAvailableHandler() once in the Start() method.
(lldb) thr sel 12
* thread #12, stop reason = breakpoint 1.1
frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
88
89 void HttpServerConnection::Disconnect()
90 {
-> 91 boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
92 if (!lock.owns_lock()) {
93 Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
94 return;
(lldb) bt
* thread #12, stop reason = breakpoint 1.1
* frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
frame #1: 0x000000010062955c icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x000000010382a400) at httpserverconnection.cpp:376
frame #2: 0x00000001006290ff icinga2`icinga::HttpServerConnection::Start(this=0x000000010382a400) at httpserverconnection.cpp:76
frame #3: 0x00000001004f97fb icinga2`icinga::ApiListener::NewClientHandlerInternal(this=0x0000000104804400, client=0x000000010672e360, hostname=0x000000010672e368, role=RoleServer) at apilistener.cpp:606
frame #4: 0x00000001004f60fe icinga2`icinga::ApiListener::NewClientHandler(this=0x0000000104804400, client=0x000000010672e360, hostname=0x000000010672e368, role=RoleServer) at apilistener.cpp:441
.....
frame #16: 0x000000010034f94e icinga2`boost::detail::thread_data<std::__1::__bind<void (icinga::ThreadPool::WorkerThread::*)(icinga::ThreadPool::Queue&), std::__1::reference_wrapper<icinga::ThreadPool::WorkerThread>, std::__1::reference_wrapper<icinga::ThreadPool::Queue> > >::run(this=0x0000000106655100) at thread.hpp:117
frame #17: 0x00000001030882c8 libboost_thread-mt.dylib`boost::(anonymous namespace)::thread_proxy(void*) + 136
frame #18: 0x00007fff53d52661 libsystem_pthread.dylib`_pthread_body + 340
frame #19: 0x00007fff53d5250d libsystem_pthread.dylib`_pthread_start + 377
frame #20: 0x00007fff53d51bf9 libsystem_pthread.dylib`thread_start + 13
(lldb)
This thread is doing the socket event polling and gets woken up in TlsStream::OnEvent, which then results in DataAvailableHandler() calling Disconnect().
Actually this thread is correctly closing the connection.
(lldb) thr sel 27
* thread #27, stop reason = breakpoint 1.1
frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
88
89 void HttpServerConnection::Disconnect()
90 {
-> 91 boost::recursive_mutex::scoped_try_lock lock(m_DataHandlerMutex);
92 if (!lock.owns_lock()) {
93 Log(LogInformation, "HttpServerConnection", "Unable to disconnect Http client, I/O thread busy");
94 return;
(lldb) bt
* thread #27, stop reason = breakpoint 1.1
* frame #0: 0x0000000100629653 icinga2`icinga::HttpServerConnection::Disconnect(this=0x000000010382a400) at httpserverconnection.cpp:91
frame #1: 0x000000010062955c icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x000000010382a400) at httpserverconnection.cpp:376
....
frame #21: 0x00000001002e5325 icinga2`icinga::Stream::SignalDataAvailable(this=0x0000000107036cc0) at stream.cpp:55
frame #22: 0x000000010035f9f3 icinga2`icinga::TlsStream::OnEvent(this=0x0000000107036cc0, revents=1) at tlsstream.cpp:289
frame #23: 0x00000001002dd39f icinga2`icinga::SocketEventEnginePoll::ThreadProc(this=0x0000000103890600, tid=0) at socketevents-poll.cpp:121
(lldb) br del
About to delete all breakpoints, do you want to do that?: [Y/n] y
All breakpoints removed. (1 breakpoint)
(lldb) b icinga::HttpServerConnection::DataAvailableHandler
Breakpoint 2: where = icinga2`icinga::HttpServerConnection::DataAvailableHandler() + 33 at httpserverconnection.cpp:351, address = 0x00000001006291b1
[2018-12-03 14:17:44 +0100] information/ApiListener: New client connection from [::ffff:127.0.0.1]:53666 (no client certificate)
Process 32506 stopped
* thread #12, stop reason = breakpoint 2.1
frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
348
349 void HttpServerConnection::DataAvailableHandler()
350 {
-> 351 bool close = false;
352
353 if (!m_Stream->IsEof()) {
354 boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
thread #26, stop reason = breakpoint 2.1
frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
348
349 void HttpServerConnection::DataAvailableHandler()
350 {
-> 351 bool close = false;
352
353 if (!m_Stream->IsEof()) {
354 boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
Target 0: (icinga2) stopped.
Dig deeper into the first thread being called just once via DataAvailableHandler():
(lldb) thr sel 12
* thread #12, stop reason = breakpoint 2.1
frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
348
349 void HttpServerConnection::DataAvailableHandler()
350 {
-> 351 bool close = false;
352
353 if (!m_Stream->IsEof()) {
354 boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
(lldb) n
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x00000001006291b5 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:353
350 {
351 bool close = false;
352
-> 353 if (!m_Stream->IsEof()) {
354 boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
355
356 try {
thread #26, stop reason = breakpoint 2.1
frame #0: 0x00000001006291b1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:351
348
349 void HttpServerConnection::DataAvailableHandler()
350 {
-> 351 bool close = false;
352
353 if (!m_Stream->IsEof()) {
354 boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x00000001006291dc icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:354
351 bool close = false;
352
353 if (!m_Stream->IsEof()) {
-> 354 boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
355
356 try {
357 while (ProcessMessage())
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:10 +0100] information/WorkQueue: #5 (IdoMysqlConnection, ido-mysql) items: 0, rate: 1182.28/s (70937/min 70937/5min 70937/15min);
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x00000001006291f1 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:357
354 boost::recursive_mutex::scoped_lock lock(m_DataHandlerMutex);
355
356 try {
-> 357 while (ProcessMessage())
358 ; /* empty loop body */
359 } catch (const std::exception& ex) {
360 Log(LogWarning, "HttpServerConnection")
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:10 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2018-12-03 14:18:13 +0100] information/Application: We jumped forward in time: 30.2703 seconds
[2018-12-03 14:18:13 +0100] warning/ThreadPool: Event call took 2.97695s
[2018-12-03 14:18:13 +0100] warning/ThreadPool: Event call took 2.97716s
[2018-12-03 14:18:15 +0100] warning/ThreadPool: Event call took 4.50566s
[2018-12-03 14:18:15 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 23.9/s (1434/min 1434/5min 1434/15min);
[2018-12-03 14:18:15 +0100] information/HttpServerConnection: Request: GET /v1/objects/hosts (from [::ffff:127.0.0.1]:53666), user: root)
[2018-12-03 14:18:15 +0100] warning/ThreadPool: Event call took 4.58007s
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x000000010062944e icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:359
356 try {
357 while (ProcessMessage())
358 ; /* empty loop body */
-> 359 } catch (const std::exception& ex) {
360 Log(LogWarning, "HttpServerConnection")
361 << "Error while reading Http request: " << DiagnosticInformation(ex);
362
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x000000010062931e icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:367
364 }
365
366 /* Request finished, decide whether to explicitly close the connection. */
-> 367 if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
368 m_CurrentRequest.Headers->Get("connection") == "close") {
369 m_Stream->Shutdown();
370 close = true;
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x0000000100629345 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:368
365
366 /* Request finished, decide whether to explicitly close the connection. */
367 if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
-> 368 m_CurrentRequest.Headers->Get("connection") == "close") {
369 m_Stream->Shutdown();
370 close = true;
371 }
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x00000001006293c2 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:367
364 }
365
366 /* Request finished, decide whether to explicitly close the connection. */
-> 367 if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
368 m_CurrentRequest.Headers->Get("connection") == "close") {
369 m_Stream->Shutdown();
370 close = true;
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:22 +0100] warning/ThreadPool: Event call took 7.63664s
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x0000000100629425 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:369
366 /* Request finished, decide whether to explicitly close the connection. */
367 if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
368 m_CurrentRequest.Headers->Get("connection") == "close") {
-> 369 m_Stream->Shutdown();
370 close = true;
371 }
372 } else
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x0000000100629445 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:370
367 if (m_CurrentRequest.ProtocolVersion == HttpVersion10 ||
368 m_CurrentRequest.Headers->Get("connection") == "close") {
369 m_Stream->Shutdown();
-> 370 close = true;
371 }
372 } else
373 close = true;
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x0000000100629449 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:371
368 m_CurrentRequest.Headers->Get("connection") == "close") {
369 m_Stream->Shutdown();
370 close = true;
-> 371 }
372 } else
373 close = true;
374
Target 0: (icinga2) stopped.
(lldb) p close
(bool) $0 = true
(lldb) n
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x000000010062952a icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:372
369 m_Stream->Shutdown();
370 close = true;
371 }
-> 372 } else
373 close = true;
374
375 if (close)
Target 0: (icinga2) stopped.
(lldb)
[2018-12-03 14:18:35 +0100] warning/ThreadPool: Event call took 19.78s
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x0000000100629546 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:375
372 } else
373 close = true;
374
-> 375 if (close)
376 Disconnect();
377 }
378
Target 0: (icinga2) stopped.
(lldb)
Process 32506 stopped
* thread #12, stop reason = step over
frame #0: 0x0000000100629550 icinga2`icinga::HttpServerConnection::DataAvailableHandler(this=0x0000000104832e00) at httpserverconnection.cpp:376
373 close = true;
374
375 if (close)
-> 376 Disconnect();
377 }
378
379 void HttpServerConnection::CheckLiveness()
Target 0: (icinga2) stopped.
Actually the request body is not parsed at this point, so @swegener is 100% correct with his analysis.
Therefore closing the connection here makes absolutely no sense, and I did not think about this when fixing it for 2.10.0.
Older versions of Icinga likely did not suffer from Connection: close being parsed incorrectly, but from the enqueuing of that "corked" stuff, meaning that future packets were not handled properly.
With the revert of the "corked" stuff in 2.10.2 and #6738, such state-machine-like connections with wait times in between work properly again.
This is a continuation from #6514.
Using sequential curl requests with the "Connection: close" header seems to work, but firing up our API client, which uses threaded job handling, results in some errors again. If I don't send the "Connection: close" header, everything is fine (at least with 2.10.2; on some 2.8.x versions it was the exact opposite, which is why we changed the behaviour).
Here is the debug log:
Sometimes there are two entries for a disconnect with the same source port:
Log from client side:
I see some strange behaviour there:
Unfortunately the client is not open source, but the only interaction it has with Icinga is pushing check results and fetching a host's services via the API in worker threads (written in Go using http.Client).
Environment: