etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
46.81k stars 9.65k forks

Reduce log spam on missing member #18238

Open davhdavh opened 1 week ago

davhdavh commented 1 week ago

What would you like to be added?

{"level":"warn","ts":"2024-06-27T07:54:41.820665Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"e91c799c9231b7d2","from":"e91c799c9231b7d2","remote-peer-id":"d17db337c2371a6d","remote-peer-name":"pipeline","remote-peer-active":false}
{"level":"warn","ts":"2024-06-27T07:54:41.831868Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"e91c799c9231b7d2","from":"e91c799c9231b7d2","remote-peer-id":"d17db337c2371a6d","remote-peer-name":"pipeline","remote-peer-active":false}
{"level":"warn","ts":"2024-06-27T07:54:41.831966Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"e91c799c9231b7d2","from":"e91c799c9231b7d2","remote-peer-id":"d17db337c2371a6d","remote-peer-name":"pipeline","remote-peer-active":false}
{"level":"warn","ts":"2024-06-27T07:54:41.931594Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"e91c799c9231b7d2","from":"e91c799c9231b7d2","remote-peer-id":"d17db337c2371a6d","remote-peer-name":"pipeline","remote-peer-active":false}
{"level":"warn","ts":"2024-06-27T07:54:41.981508Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"e91c799c9231b7d2","from":"e91c799c9231b7d2","remote-peer-id":"d17db337c2371a6d","remote-peer-name":"pipeline","remote-peer-active":false}
{"level":"warn","ts":"2024-06-27T07:54:42.130708Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"e91c799c9231b7d2","from":"e91c799c9231b7d2","remote-peer-id":"d17db337c2371a6d","remote-peer-name":"pipeline","remote-peer-active":false}
{"level":"warn","ts":"2024-06-27T07:54:42.13223Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"e91c799c9231b7d2","from":"e91c799c9231b7d2","remote-peer-id":"d17db337c2371a6d","remote-peer-name":"pipeline","remote-peer-active":false}

I'm fairly certain that giving me the exact same log message hundreds of times per second provides no actual benefit.

Why is this needed?

Because having to write tens of megabytes of logs per minute on every single control-plane node while a member is temporarily missing is insane.

serathius commented 1 week ago

Hmm, guessing without looking up the documentation: do we have command-line flags for configuring log sampling in zap?

jmhbnz commented 1 week ago

> Hmm, guessing without looking up the documentation: do we have command-line flags for configuring log sampling in zap?

Unless I am missing something, there is currently nothing in the server logging config for zap sampling rates: https://etcd.io/docs/v3.6/op-guide/configuration/#logging

--logger 'zap'
  Currently only supports 'zap' for structured logging.
--log-outputs 'default'
  Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd, or list of comma separated output targets.
--log-level 'info'
  Configures log level. Only supports debug, info, warn, error, panic, or fatal.
--enable-log-rotation 'false'
  Enable log rotation of a single log-outputs file target.
--log-rotation-config-json '{"maxsize": 100, "maxage": 0, "maxbackups": 0, "localtime": false, "compress": false}'
  Configures log rotation if enabled with a JSON logger config. MaxSize(MB), MaxAge(days,0=no limit), MaxBackups(0=no limit), LocalTime(use computers local time), Compress(gzip)".

We could consider adding https://pkg.go.dev/go.uber.org/zap#SamplingConfig for rafthttp?

serathius commented 1 week ago

I would be careful about overcomplicating logging; my question came from wondering whether this can be solved with what we have now. If we have a problem with a single log line, we should look into that first, before we start supporting a new feature.