aggarwalShivani opened this issue 1 year ago
Thanks for reporting! I will check it.

> Please let us know if there are any additional configs in the syslog plugin that could help us achieve retry/buffering properly in this case too.

I don't know how effective it is, but what about `keep_alive`, `keep_alive_idle`, `keep_alive_cnt`, `keep_alive_intvl`?
Hi @daipom Yes, I have already tried adding the following parameters, but that didn't help.

```
timeout 10
timeout_exception true
keep_alive true
keep_alive_cnt 1
keep_alive_intvl 5
keep_alive_idle 5
```
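For reference, these plugin parameters map onto the standard TCP keepalive socket options, which can be set directly on a Ruby socket. This is only an illustrative sketch against a throwaway local server (the values mirror the config above; the `TCP_KEEP*` constants are Linux-specific):

```ruby
require "socket"

# Throwaway local server so the connect succeeds.
server = TCPServer.new("127.0.0.1", 0)
sock   = TCPSocket.new("127.0.0.1", server.addr[1])

# Enable TCP keepalive probing on this connection.
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)

# Idle seconds before the first probe, number of probes, and probe interval
# (Linux option names: TCP_KEEPIDLE / TCP_KEEPCNT / TCP_KEEPINTVL).
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, 5)
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, 1)
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, 5)

# Read one option back to confirm it was applied.
idle = sock.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE).int
puts idle  # => 5

sock.close
server.close
```

Note that keepalive probes only detect a dead peer on an *idle* connection; they do not help when the kernel is still retransmitting unacknowledged data, which is the case discussed below.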
Hi @daipom Latest update: Observation - On further checking, we found that after around ~15 minutes (of syslog being down), the connectivity error is detected in the fluentd logs, and fluentd then tries to re-create the connection to the syslog endpoint. It then starts retaining chunks in the buffer, as the destination is not reachable. Once the connection is re-established, it retries sending the data present in its buffer at that time. But the concern remains: the data generated during this 15-minute window is lost, as it was not buffered by the plugin.
Analysis - We believe this 15-minute timeout is a TCP-level issue. It could be a result of the default value of `tcp_retries2` (on Linux), which is 15. TCP retransmits an unacknowledged packet up to `tcp_retries2` times; after that, it gives up and notifies the application that the connection is lost. For ref - https://tech.instacart.com/the-vanishing-thread-and-postgresql-tcp-connection-parameters-93afc0e1208c
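A back-of-the-envelope calculation supports the ~15-minute figure. Assuming the Linux defaults of a 200 ms minimum retransmission timeout (RTO) and a 120 s maximum (these kernel constants are an assumption, not stated in the thread), the exponential backoff over `tcp_retries2 = 15` retransmissions plus the final wait before the kernel gives up sums to roughly 924.6 seconds:

```ruby
# Exponential retransmission backoff: the RTO starts at TCP_RTO_MIN
# (200 ms on Linux) and doubles on each retry, capped at TCP_RTO_MAX (120 s).
rto_min = 0.2
rto_max = 120.0

total = 0.0
rto = rto_min
# 15 retransmissions (tcp_retries2) plus the final timeout period before
# the kernel declares the connection dead: 16 backoff periods in total.
16.times do
  total += rto
  rto = [rto * 2, rto_max].min
end

puts total.round(1)        # => 924.6 (seconds)
puts (total / 60).round(1) # => 15.4 (minutes)
```

This matches the ~15 minutes observed before fluentd notices the dead connection.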
To solve this, we tried setting the TCP socket option TCP_USER_TIMEOUT at socket-creation time for the syslog connection, so that the timeout happens sooner. The change (to set the socket option) is made in the dependent ruby gem, i.e. in the tcp_sender.rb file of the remote_syslog_sender gem, right after the socket creation. https://github.com/reproio/remote_syslog_sender/blob/master/lib/remote_syslog_sender/tcp_sender.rb#L57

```ruby
@tcp_socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_USER_TIMEOUT, 6000)
```
With this change, we observed that in case of syslog going down, after 6 seconds, the connection loss was detected by fluentd and buffering/retry worked as expected.
Does this look like the right approach to solve this issue? Can we expect a fix in the plugin on these lines?
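As a standalone illustration of the change described above (using a throwaway local server instead of the real syslog endpoint; the 6000 ms value mirrors the patch, and the option is Linux-only):

```ruby
require "socket"

# Throwaway local server so the connect succeeds; in the real change the
# peer is the remote syslog endpoint.
server = TCPServer.new("127.0.0.1", 0)
sock   = TCPSocket.new("127.0.0.1", server.addr[1])

# Linux-only: abort the connection if transmitted data remains
# unacknowledged for longer than 6000 ms.
sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_USER_TIMEOUT, 6000)

# Read the option back to confirm the kernel accepted it.
applied = sock.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_USER_TIMEOUT).int
puts applied  # => 6000

sock.close
server.close
```

With this in place, a write whose data is never ACKed fails after ~6 s instead of after the `tcp_retries2` backoff (~15 min).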
Thanks for finding it out!
I also find an interesting behavior of TCPSocket.
After stopping a TCP server, a client can send the data once without error. Of course, this data does NOT reach the server. (I can reproduce this when I launch a remote TCP server.)
Server: I used Fluentd as a TCP server with the following config:

```
<source>
  @type tcp
  tag test
  <parse>
    @type none
  </parse>
  <transport tcp>
    linger_timeout 5
  </transport>
</source>
<match test.**>
  @type stdout
</match>
```
Client: irb

```ruby
require "socket"
socket = TCPSocket.new("IP", 5170)

# The server is alive.
socket.write("fuga\n")
=> 5

# After stopping the server.
socket.write("fuga\n")
=> 5 # Surprisingly, the first one succeeds!!
socket.write("fuga\n")
in `write': Broken pipe (Errno::EPIPE)
        from (irb):43:in `<main>'
        from /home/daipom/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/irb-1.6.2/exe/irb:11:in `<top (required)>'
        from /home/daipom/.rbenv/versions/3.2.0/bin/irb:25:in `load'
        from /home/daipom/.rbenv/versions/3.2.0/bin/irb:25:in `<main>'
```
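The same one-successful-write behavior can be reproduced without Fluentd, using a plain `TCPServer` in the same process. This is a sketch; the short sleeps are only there to let the FIN and RST packets arrive over loopback:

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
conn = server.accept

# Server side closes normally: a FIN is sent (default SO_LINGER behavior).
conn.close
server.close
sleep 0.1  # let the FIN reach the client

first = client.write("fuga\n")  # succeeds: the kernel buffers the 5 bytes
sleep 0.1                       # the peer answers the stray data with an RST

second_error =
  begin
    client.write("fuga\n")
    nil
  rescue Errno::EPIPE, Errno::ECONNRESET => e
    e.class
  end

puts first         # => 5
puts second_error  # an Errno class: the second write fails
```

The first write only hands the bytes to the local kernel, which is why it reports success even though the data never reaches an application.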
If I remove the setting `linger_timeout` as follows, an error occurs on the first send after stopping the server.

```
<source>
  @type tcp
  tag test
  <parse>
    @type none
  </parse>
</source>
```
```ruby
require "socket"
socket = TCPSocket.new("IP", 5170)

# The server is alive.
socket.write("fuga\n")
=> 5

# After stopping the server.
socket.write("fuga\n")
in `write': Connection reset by peer (Errno::ECONNRESET)
        from (irb):31:in `<main>'
        from /home/daipom/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/irb-1.6.2/exe/irb:11:in `<top (required)>'
        from /home/daipom/.rbenv/versions/3.2.0/bin/irb:25:in `load'
        from /home/daipom/.rbenv/versions/3.2.0/bin/irb:25:in `<main>'
```
This difference is related to the `SO_LINGER` option.

- If this value is larger than zero, FIN is sent on closing.
- If this value is zero, RST is sent on closing. (However, Windows always sends FIN regardless of this option.)
- If FIN is sent, the connection remains for a while, and the other side can still send data.

I'm thinking this issue may have something to do with this specification of TCP.
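The RST-on-close case can be forced directly with `SO_LINGER` (a sketch with a throwaway local server; `Socket::Option.linger(true, 0)` sets `l_onoff = 1`, `l_linger = 0`, which makes `close` send RST instead of FIN):

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
conn = server.accept

# SO_LINGER with l_onoff = 1 and l_linger = 0: close() discards the
# connection immediately and sends RST instead of FIN.
conn.setsockopt(Socket::Option.linger(true, 0))
conn.close
server.close
sleep 0.1  # let the RST reach the client

error =
  begin
    client.write("fuga\n")
    nil
  rescue Errno::ECONNRESET, Errno::EPIPE => e
    e.class
  end

puts error  # the very first write fails, matching the no-linger_timeout case
```

This mirrors the difference between the two Fluentd configs above: with a graceful FIN the client gets one "successful" write, while an RST makes the very first write fail.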
The problem that sending data to a closed TCP socket doesn't fail seems to be a fairly well-known problem:
https://stackoverflow.com/questions/11436013/writing-to-a-closed-local-tcp-socket-not-failing
This is what I found so far.
Thanks for the information about `TCP_USER_TIMEOUT`!
> To solve this, we tried setting the TCP socket option TCP_USER_TIMEOUT at the time of socket creation with syslog so that the timeouts happen sooner. With this change, we observed that in case of syslog going down, after 6 seconds, the connection loss was detected by fluentd and buffering/retry worked as expected. Does this look like the right approach to solve this issue? Can we expect a fix in the plugin on these lines?
Certainly, this could be a valid solution. But I think the ideal solution would be that writing to a closed socket should be a firm error. Please let me look into this a little more.
I now understand the mechanics of this problem.

There are 2 points we need to consider.

1. `TCP_USER_TIMEOUT`, as @aggarwalShivani says. The client can detect that the server is down by it not returning ACK for a certain period of time, i.e. `TCP_USER_TIMEOUT`.

@aggarwalShivani Could you please confirm the following two points?

1. How the server side responds when it goes down, e.g. in a packet capture filtered with `tcp.srcport == 514 || tcp.dstport == 514`.
2. When using `TCP_USER_TIMEOUT`, is at least one record lost?

Hi @daipom Apologies for the delayed reply.
Response when the syslog server goes down:

a) 0 replicas - When all syslog pods are deleted, i.e. there are no pods (endpoints) associated with the k8s service of the syslog server, and TCP communication is attempted to the syslog service, I see the following in tcpdump: the request gets routed to the kube-dns service to resolve the DNS of the syslog svc, which later responds with a port-unreachable error. It does not respond with FIN.

```
08:35:53.461686 IP k8s-4-control-worker-edge-1 > k8s-4-control-worker-edge-1: ICMP rsyslog.shiv.svc.cluster.local tcp port facsys-ntp unreachable, length 60
```

b) syslog process is not running - However, when the syslog pod is running but the syslog process inside the pod is killed, it responds with a connection reset (RST) when there's an attempt to contact the syslog service.

```
19:42:43.993059 IP k8s-4-control-worker-edge-1.9136 > 192-168-165-1.rsyslog.shiv.svc.cluster.local.shell: Flags [S], seq 2061445071, win 26880, options [mss 8960,nop,nop,sackOK,nop,wscale 7], length 0
19:42:43.993076 IP 192-168-165-1.rsyslog.shiv.svc.cluster.local.shell > k8s-4-control-worker-edge-1.9136: Flags [R.], seq 0, ack 2061445072, win 0, length 0
```

So perhaps that is general k8s behaviour: if there is no endpoint for a service, kube-dns will be unable to resolve it and will respond with port unreachable.
Sorry for my delay, and thanks so much for your reply. I have not been able to take the time for this. I would like to examine this issue some more.
Hi @daipom,
Have you been able to conclude on an alternative approach to fix this issue, or is it okay to go ahead with the `TCP_USER_TIMEOUT` approach?
Version details:
Background:
Issue: Fluentd successfully connects to the configured syslog endpoint and keeps pushing records as per the flush interval. But when the k8s service of the syslog server goes down (i.e. when the syslog pod gets deleted or goes down to 0/1), fluentd does not detect any failure in the connection to syslog. It also keeps flushing all the chunks from the file buffer and retains nothing in the buffer, in spite of the destination being unreachable. Please note that the syslog service is genuinely unreachable (as seen by other clients like logger and curl).
Fluentd logs: there are no errors in fluentd. New chunks keep getting created and keep getting cleared from the file buffer location. Trace-level logging is enabled.
So why are the chunks getting flushed from the buffer when the destination is unreachable?
Observation:

1) An interesting observation - when the syslog server is not running as a k8s pod but as a standalone service on Linux (i.e. managed through systemctl), and the service is stopped (using `systemctl stop rsyslog`), we immediately see the following error in the fluentd logs when it tries to flush the next chunk from its buffer to the syslog endpoint. As the flush fails due to connectivity, it retains the chunk in the file buffer and keeps retrying the flush (as per the configuration).
Later, when the rsyslog service is restarted and up again, it connects and successfully pushes all buffered chunks to the destination without any loss of data. For ex.

```
2023-03-21 10:23:24 +0000 [warn]: #0 retry succeeded. chunk_id="5f7664fef1ec6c112220332f2732de46"
```

2) For syslog running on Kubernetes, we see that fluentd detects a connectivity issue with syslog initially, when fluentd starts up, and there retry/buffering works fine as expected. But once fluentd has successfully established a connection with syslog, and the syslog destination then becomes unreachable, fluentd fails to detect connection errors.

Concern: Why is the loss of connectivity not identified in the case of the rsyslog server running as a Kubernetes service? Please let us know if there are any additional configs in the syslog plugin that could help us achieve retry/buffering properly in this case too.
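For context, the output configuration under discussion is of roughly this shape. This is a hypothetical sketch: the plugin name and buffer path are assumptions (the host appears in the tcpdump output above), and the `<buffer>` parameters are standard Fluentd core options:

```
<match **>
  @type remote_syslog             # assumed plugin; not confirmed in the thread
  host rsyslog.shiv.svc.cluster.local
  port 514
  protocol tcp
  <buffer>
    @type file
    path /var/log/fluentd/buffer  # hypothetical path
    flush_interval 10s
    retry_type exponential_backoff
    retry_forever true            # keep chunks until the destination recovers
  </buffer>
</match>
```

Note that these buffer settings only govern what happens once a flush *fails*; they cannot help while the TCP writes still appear to succeed, which is the crux of this issue.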