Open karthiknagraj opened 2 months ago
Hi, Can anyone please help me out with the issue that I am facing ?
Looks like we need more information to investigate this.
out_forward
plugin?out_forward
)?@daipom Thanks for responding. The above Out_forward configuration is on the client side. Server side uses in_forward to receive it. When the server sends reset packet to forcefully close the connection, Sockets on client-side(fluentd) are going into CLOSE_WAIT states.
I see the below logs along with the logs that I have already shared.
2024-09-09 16:38:08 +0000 [debug]: #7 connect new socket #<struct Fluent::Plugin::ForwardOutput::SocketCache::TimedSocket timeout=2024-09-09 16:43:08.309568639 +0000, key=["11.25.252.222", <PORT>, "server.com"], sock=#<Fluent::PluginHelper::Socket::WrappedSocket::TCP:fd 91, AF_INET, 192.155.22.114, 46508>>
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `initialize'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `new'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `socket_create_tcp'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:410:in `create_transfer_socket'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `call'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `connect'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:807:in `connect'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:676:in `send_data'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `block in write'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:46:in `block in select_healthy_node'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `times'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `select_healthy_node'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery/manager.rb:108:in `select_service'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery.rb:82:in `service_discovery_select_service'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `write'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:353:in `try_flush'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 failed to send heartbeat packet host="127.0.0.1" port=<PORT> heartbeat_type=:transport error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"127.0.0.1\" port <PORT>"
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 taking back chunk for errors. chunk="621b2d3a10bfccc53dc08c7c6d4019ff"
2024-09-09 17:09:59 +0000 [warn]: #2 failed to flush the buffer. retry_times=0 next_retry_time=2024-09-09 17:10:01 +0000 chunk="621b2d3a10bfccc53dc08c7c6d4019ff" error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"127.0.0.1\" port <PORT>"
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `initialize'
2024-09-09 17:09:59 +0000 [debug]: #2 taking back chunk for errors. chunk="621b2d3a10c9e08c09e407b1216c81b2"
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `new'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `socket_create_tcp'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:410:in `create_transfer_socket'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `call'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `connect'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:807:in `connect'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:676:in `send_data'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `block in write'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:46:in `block in select_healthy_node'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `times'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `select_healthy_node'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery/manager.rb:108:in `select_service'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery.rb:82:in `service_discovery_select_service'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `write'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:353:in `try_flush'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
I am not sure if these logs will help in analyzing better. Please let me know if you are looking for some specific keyword in the logs, I can try to get the output of those logs.
Thanks! We can know how the client works from these logs. We need to investigate if this behavior is intended or not.
One more question.
When the server sends reset packet to forcefully close the connection
Why does the server send reset packet? Could some issues occur on the server side as well?
Could you attach server-side (in_forward
) logs?
@daipom, I do not have access to the server-side logs. I was just checking the output plugin documentation, If we make the tcp connection as long-lived/persistent i.e., my setting keepalive: true
, Then the fluentd doesn't handle closing of the the socket? Is my understanding correct?
https://docs.fluentd.org/plugin-helper-overview/api-plugin-helper-socket
Lines from the documentation: "The socket plugin helper does not manage the lifecycle of the socket. User must close the socket when it is no longer needed."
Could this be the reason why fluentd is not able to clean-up connections in close_wait state. I even tried by adding the below parameters to output forward plugin, Even this didn't help
send_keepalive_packet true
keepalive_timeout 300s
If we make the tcp connection as long-lived/persistent i.e., my setting keepalive: true , Then the fluentd doesn't handle closing of the the socket? Is my understanding correct?
No, out_forward
plugin handles the socket closing even if using keepalive
.
https://docs.fluentd.org/plugin-helper-overview/api-plugin-helper-socket Lines from the documentation: "The socket plugin helper does not manage the lifecycle of the socket. User must close the socket when it is no longer needed." Could this be the reason why fluentd is not able to clean-up connections in close_wait state. I even tried by adding the below parameters to output forward plugin, Even this didn't help
plugin-helper
is a feature for developers of Fluentd plugins, not for Fluentd users.
This is about implementing the plugins, not how to use them.
Here are the documents for plugin users.
@daipom, ok sure, But I don't see CLOSE_WAIT connections getting closed. Am I missing any configuration that should be added?
The out_forward
plugin has its own logic to handle the sockets, so we need to investigate the implementation.
However, the problem is that I can't reproduce this issue.
In my test, out_forward
can close the socket and re-send data to in_forward
correctly if in_forward
revives.
I found out one thing.
On non-Windows, in_forward
sends RST rather than FIN when closing, by default.
It appears that it closes the socket forcefully.
So, by default, I can't reproduce CLOSE_WAIT state sockets that remain.
However, when setting linger_timeout to in_forward
, it sends FIN when closing.
This generates some CLOSE_WAIT state sockets.
<source>
@type forward
<transport tcp>
linger_timeout 1
</transport>
</source>
However, in my testing, this was not a problem.
After in_forward
revives, out_forward
can reopen the socket and send data correctly in my testing.
So, I'm not sure what is causing this issue phenomenon...
At least, the close_wait socket should be closed after keepalive_timeout
elapses.
In my testing, they are closed afer keepalive_timeout
.
I need more information to reproduce the phenomenon...
If you can, please try a more recent version.
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days
Hi, Configuring keepalive_timeout helped me manage sockets in the CLOSE_WAIT state. However, I noticed that after setting keepalive_timeout, not only are inactive or CLOSE_WAIT sockets cleared, but all sockets are being closed. For example, with keepalive_timeout set to 300 seconds, all sockets are terminated after 300 seconds.
Expected Behavior: Since keepalive is enabled and keepalive_timeout is set to 300 seconds, the expectation is that the tcp connections stays persistent and just inactive or CLOSE_WAIT connections would be checked and closed at the 300-second mark. If all sockets are closed, it results in the loss of in-transit data, requiring new connections to be established.
Could you please advise on how I can limit the cleanup to only inactive or CLOSE_WAIT sockets?
Maybe, this issue will be solved by https://github.com/fluent/fluentd/pull/3711 So, could you please update your Fluentd version?
Thanks @Watson1978, @daipom, I tested with Fluentd version 1.16.6 and observed that the available/active sockets are not being closed, which is a good. However, I suspect there may be a bug in Fluentd's handling of CLOSE_WAIT connections. When the server closes the connection (e.g., by calling socket.close), Fluentd receives a FIN packet but does not respond with its own FIN packet to gracefully close the connection. As a result, the socket remains in a CLOSE_WAIT state on the Fluentd side. Ideally, we do not want to rely on the keepalive_timeout to manage the closure of the socket; instead, Fluentd should close the socket on its end when the server closes it on its end.
Please note, I have observed this when keepalive is set to true.
Can anyone please help me out resolve the close_wait connection. Please let me know if any support is required to reproduce the issue. We can try to reproduce the issue together. Everytime server executes socket.close() this issue is seen.
However, I suspect there may be a bug in Fluentd's handling of CLOSE_WAIT connections. When the server closes the connection (e.g., by calling socket.close), Fluentd receives a FIN packet but does not respond with its own FIN packet to gracefully close the connection. As a result, the socket remains in a CLOSE_WAIT state on the Fluentd side.
Is this reproducible with Fluentd 1.16.6?
(Does CLOSE_WAIT
socket remain after the in_forward
stop?)
@daipom, Yes. It is reproducible with Fluentd 1.16.6
Is there any update on this issue or Can anyone help be resolve this issue?
@daipom, were you able to reproduce the issue?
Describe the bug
I have fluentd deployed as a deamonset in kubernetes cluster which is sending logs using output forward plugin. I have shared the config below. In the netstat output of fluentd, I see alot of close_wait connections. How can I resolve this ?
tcp 1 0 fluentd-operator-:<port> 1.1.1.1:<port> CLOSE_WAIT
To Reproduce
This issue is seen when the remote side of the connection has closed the connection.
Expected behavior
Ideally fluentd should handle closing the connection and relive the socket.
Your Environment
Your Configuration
Your Error Log
Additional context
No response