etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0
47.73k stars 9.76k forks source link

Add an e2e tests that detects error logs during etcd bootstrap #17329

Open serathius opened 9 months ago

serathius commented 9 months ago

What would you like to be added?

Etcd should not write any error logs during normal operation. We could add a test that starts etcd cluster, does some simple operations and validates if there is any error level log written.

This could be further improved by enabling this check in all e2e tests when cluster/server is being closed. As some test intentionally inject errors, we could make it configurable option that is enabled by default and disabled in all tests we expect error to be written.

Why is this needed?

Logs are used to provide useful debug information to users, however if mislabeled they could cause user to lose trust and ignore errors. For error logs to be useful they should only be emitted where there is something wrong.

We should prevent issues like https://github.com/etcd-io/etcd/issues/17245 where a change in code started generating a large number of error logs.

Proposed in https://github.com/etcd-io/etcd/pull/17249#issuecomment-1893399469 with agreement between maintainers.

nitishfy commented 9 months ago

Hi, I'd like to take this up!

serathius commented 9 months ago

Thanks, feel free to reach out to me on slack (get an invite) if you need any help /assign @nitishfy

serathius commented 8 months ago

https://github.com/etcd-io/etcd/pull/17423 covered a single node cluster, we should also cover:

nitishfy commented 8 months ago

17423 covered a single node cluster, we should also cover:

  • 3 node cluster
  • different TLS configurations.

we can raise a new PR for it. WDYT?

serathius commented 8 months ago

SG, thanks

serathius commented 8 months ago

Also, when we are done, we should consider backporting the test.

serathius commented 6 days ago

Still remaining to be completed:

ghouscht commented 3 days ago

I think I can take this.

/assign

ghouscht commented 1 day ago

done 🙂 I think this can be closed now.

serathius commented 1 day ago

Awesome, thanks @ghouscht !

ahrtr commented 1 day ago

It looks like the test doesn't enable TLS by default. Should we also extend the test to support the case of enabling TLS?

serathius commented 1 day ago

Good point, I missed https://github.com/etcd-io/etcd/issues/17329#issuecomment-1943954845.

ghouscht commented 1 day ago

Oh I missed that as well, sorry! I'll have a look later on and will prepare another PR.

ghouscht commented 8 hours ago

Something like this https://github.com/etcd-io/etcd/pull/18819? Or did you have something different in your mind?