apache / hertzbeat

Apache HertzBeat(incubating) is a real-time monitoring system with agentless, performance cluster, prometheus-compatible, custom monitoring and status page building capabilities.
https://hertzbeat.apache.org/
Apache License 2.0
5.46k stars, 947 forks

[improve] modify ssh client common config #2403

Closed Aias00 closed 1 month ago

Aias00 commented 1 month ago

What's changed?

Modify the SSH client common properties: replace HEARTBEAT_REPLY_WAIT with HEARTBEAT_NO_REPLY_MAX. See issue #2397.

Meaning of the HEARTBEAT_NO_REPLY_MAX property: it defines the maximum number of consecutive heartbeat responses that can be lost before the SSH client considers the connection to be disconnected. As a specific example, assume the following configuration:

HEARTBEAT_INTERVAL is set to 10 seconds (10,000 milliseconds)
HEARTBEAT_NO_REPLY_MAX is set to 3

Scenario description:

The SSH client sends a heartbeat request to the server every 10 seconds. Under normal circumstances:

The client sends a heartbeat
The server responds to the heartbeat
The connection is considered active

Now, let's assume some network issues occur:

0 seconds: The client sends the first heartbeat; no response is received
10 seconds: The client sends the second heartbeat; still no response
20 seconds: The client sends the third heartbeat; again no response
30 seconds: The client prepares to send the fourth heartbeat

At this point, because there have been 3 consecutive heartbeats without a response, reaching the HEARTBEAT_NO_REPLY_MAX value:

The SSH client will consider the connection to be disconnected
The client may attempt to re-establish the connection or notify the application layer that the connection has been lost

If instead the network recovers after the third heartbeat has been sent:

The server responds to the third heartbeat
The counter is reset
The connection remains active
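The counting behaviour described above can be sketched in a few lines of Java. This is only an illustration of the semantics, not the SSH client's actual implementation: the `HeartbeatTracker` class and its method names are hypothetical, and `noReplyMax` stands in for HEARTBEAT_NO_REPLY_MAX.

```java
// Hypothetical illustration of the HEARTBEAT_NO_REPLY_MAX semantics.
// Not the SSH library's real code; all names here are made up for the sketch.
final class HeartbeatTracker {
    private final int noReplyMax;   // stands in for HEARTBEAT_NO_REPLY_MAX
    private int unanswered;         // consecutive heartbeats without a reply
    private boolean disconnected;

    HeartbeatTracker(int noReplyMax) {
        if (noReplyMax < 1) {
            throw new IllegalArgumentException("noReplyMax must be >= 1");
        }
        this.noReplyMax = noReplyMax;
    }

    /** Record that a heartbeat went unanswered. */
    void onNoReply() {
        if (disconnected) {
            return;
        }
        unanswered++;
        if (unanswered >= noReplyMax) {
            disconnected = true;    // limit reached: treat the connection as lost
        }
    }

    /** Record that the server answered a heartbeat; the counter starts over. */
    void onReply() {
        if (!disconnected) {
            unanswered = 0;
        }
    }

    boolean isDisconnected() {
        return disconnected;
    }

    int unansweredCount() {
        return unanswered;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(3);
        tracker.onNoReply();        // first heartbeat lost
        tracker.onNoReply();        // second heartbeat lost
        tracker.onReply();          // third heartbeat answered: counter resets
        tracker.onNoReply();
        tracker.onNoReply();
        System.out.println(tracker.isDisconnected()); // false
        tracker.onNoReply();        // third consecutive loss
        System.out.println(tracker.isDisconnected()); // true
    }
}
```

Only consecutive losses count in this sketch, which is why a single reply is enough to keep the connection alive.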

The benefits of this mechanism are:

It can promptly detect network issues or server unresponsiveness.
It provides a degree of fault tolerance, not immediately disconnecting due to occasional network jitters.
Parameters can be adjusted based on network environment and application requirements to balance quick response and connection stability.

Checklist

Add or update API

Aias00 commented 1 month ago

HEARTBEAT_NO_REPLY_MAX is now set to 30, which means the SSH client will consider the connection disconnected after 30 consecutive heartbeats without a response.
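To get a feel for what a limit of 30 means in wall-clock time, the rough detection window is the heartbeat interval multiplied by the limit, following the timeline in the PR description. The sketch below is a hypothetical helper, not project code, and the 10-second interval is borrowed from the example scenario as an assumption; the interval actually configured may differ.

```java
import java.time.Duration;

// Hypothetical helper: estimates how long the client keeps waiting, counted
// from the first unanswered heartbeat, before it gives up on the connection.
final class HeartbeatWindow {
    static Duration detectionWindow(Duration heartbeatInterval, int noReplyMax) {
        return heartbeatInterval.multipliedBy(noReplyMax);
    }

    public static void main(String[] args) {
        // Assumed interval of 10 s (taken from the example scenario) and a limit of 30.
        Duration window = detectionWindow(Duration.ofSeconds(10), 30);
        System.out.println(window.getSeconds()); // 300
    }
}
```

Under that assumed interval, a dead connection would be reported after about five minutes, so the limit and the interval are best tuned together.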