etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.73k stars 9.76k forks source link

Receiving error logs while trying to perform etcd operations #17424

Open nitishfy opened 8 months ago

nitishfy commented 8 months ago

Bug report criteria

What happened?

While trying to write test case for the #17329, I encountered that there were a lot of error logs that were present. Logs are used to provide useful debug information to users, however, if mislabeled they could cause user to lose trust and ignore errors. For error logs to be useful they should only be emitted where there is something wrong.

We should prevent issues like https://github.com/etcd-io/etcd/issues/17245 where a change in code started generating a large number of error logs.

Proposed in https://github.com/etcd-io/etcd/pull/17249#issuecomment-1893399469 with agreement between maintainers.

What did you expect to happen?

In an ideal scenario, there should be no error logs since everything is running perfectly well.

How can we reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

No response

Etcd version (please run commands below)

```console $ etcd --version etcd Version: 3.4.27 Git SHA: c92fb80f3 Go Version: go1.19.10 Go OS/Arch: linux/amd64 $ etcdctl version etcdctl version: 3.4.27 API version: 3.4 ```

Etcd configuration (command line flags or environment variables)

# paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

```console $ etcdctl member list -w table # paste output here $ etcdctl --endpoints= endpoint status -w table # paste output here ```

Relevant log output

56 = {string} "{"level":"info","ts":"2024-02-13T23:12:29.872018+0530","caller":"osutil/interrupt_unix.go:65","msg":"received signal; shutting down","signal":"terminated"}\r\n"
57 = {string} "{"level":"info","ts":"2024-02-13T23:12:29.872078+0530","caller":"embed/etcd.go:393","msg":"closing etcd server","name":"TestErrorLogs-test-0","data-dir":"/tmp/TestErrorLogs875755182/002","advertise-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"]}\r\n"
58 = {string} "{"level":"warn","ts":"2024-02-13T23:12:29.87219+0530","caller":"embed/serve.go:161","msg":"stopping insecure grpc server due to error","error":"accept tcp 127.0.0.1:20000: use of closed network connection"}\r\n"
59 = {string} "{"level":"error","ts":"2024-02-13T23:12:29.872322+0530","caller":"embed/etcd.go:857","msg":"setting up serving from embedded etcd failed.","error":"http: Server closed","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:857\ngo.etcd.io/etcd/server/v3/embed.(*serveCtx).serve.func3\n\tgo.etcd.io/etcd/server/v3/embed/serve.go:176"}\r\n"
60 = {string} "{"level":"warn","ts":"2024-02-13T23:12:29.872357+0530","caller":"embed/serve.go:163","msg":"stopped insecure grpc server due to error","error":"accept tcp 127.0.0.1:20000: use of closed network connection"}\r\n"
61 = {string} "{"level":"error","ts":"2024-02-13T23:12:29.87243+0530","caller":"embed/etcd.go:857","msg":"setting up serving from embedded etcd failed.","error":"accept tcp 127.0.0.1:20000: use of closed network connection","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:857\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).serveClients.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:770"}\r\n"
62 = {string} "{"level":"info","ts":"2024-02-13T23:12:29.872478+0530","caller":"etcdserver/server.go:1251","msg":"skipped leadership transfer for single voting member cluster","local-member-id":"ca50e9357181d758","current-leader-member-id":"ca50e9357181d758"}\r\n"
63 = {string} "{"level":"info","ts":"2024-02-13T23:12:29.872542+0530","caller":"etcdserver/server.go:2247","msg":"server has stopped; stopping storage version's monitor"}\r\n"
64 = {string} "{"level":"info","ts":"2024-02-13T23:12:29.872555+0530","caller":"etcdserver/server.go:2224","msg":"server has stopped; stopping cluster version's monitor"}\r\n"
65 = {string} "{"level":"info","ts":"2024-02-13T23:12:29.886106+0530","caller":"embed/etcd.go:587","msg":"stopping serving peer traffic","address":"127.0.0.1:20001"}\r\n"
66 = {string} "{"level":"error","ts":"2024-02-13T23:12:29.886214+0530","caller":"embed/etcd.go:857","msg":"setting up serving from embedded etcd failed.","error":"accept tcp 127.0.0.1:20001: use of closed network connection","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:857\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func3\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:609"}\r\n"
67 = {string} "{"level":"info","ts":"2024-02-13T23:12:29.886309+0530","caller":"embed/etcd.go:592","msg":"stopped serving peer traffic","address":"127.0.0.1:20001"}\r\n"
ahrtr commented 8 months ago

Thanks for raising this followup https://github.com/etcd-io/etcd/pull/17423#discussion_r1489264573

jmhbnz commented 2 months ago

Related to: https://github.com/etcd-io/etcd/issues/17329

ghouscht commented 1 day ago

There was a discussion related to the setting up serving from embedded etcd failed. error in one of the PRs I made. Linking the discussion here for future reference: https://github.com/etcd-io/etcd/pull/18806#discussion_r1822106873

serathius commented 1 day ago

Think we can explore idea by @ghouscht to not log some errors that are expected during shutdown presented in https://github.com/etcd-io/etcd/pull/18806#discussion_r1822155553. For example ErrServerClosed implies that server "Shutdown" or "Close" was called, meaning it was expected https://cs.opensource.google/go/go/+/refs/tags/go1.23.2:src/net/http/server.go;l=3286-3288;drc=773767def0e0f29584a69bd760430167b7479d7d.

ahrtr commented 1 day ago

For example ErrServerClosed implies that server "Shutdown" or "Close" was called, meaning it was expected

The problem is it's uncertain why the server or connection is closed. It might be normal or abnormal cases. It'd be better to log a message if err != nil, at least we should log a warning message for such cases.

serathius commented 23 hours ago

I think we can read the code and follow why Shutdown was called.