Panic: index out of range[0] with length 0 at v3.5.11 go.etcd.io/etcd/client/v3/client.go:302

hanxuejian commented 5 months ago

Bug report criteria

[X] This bug report is not security related, security issues should be disclosed privately via etcd maintainers.
[X] This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
[X] You have read the etcd bug reporting guidelines.
[X] Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.

What happened?

We have a test scenario that resets all etcd services at the same time. In this scenario, the etcd client occasionally (with low probability) panics.

The panic stack is as follows:

panic: runtime error: index out of range[0] with length 0

goroutine 195602 [running]:
go.etcd.io/etcd/client/v3.(*Client).dial(***)
	go.etcd.io/etcd/client/v3/client.go:302

What did you expect to happen?

The panic should not occur. The client code could be hardened, for example by checking that the endpoint list passed in is not nil or empty before SetEndpoints is executed.
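To make the suggested hardening concrete, here is a minimal sketch of such a check. It is not the etcd implementation: `setEndpointsChecked`, `endpointSetter`, `errNoEndpoints`, and `fakeClient` are hypothetical names introduced only for illustration, and the only thing assumed about the real client is that it exposes `SetEndpoints(eps ...string)`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoEndpoints is a hypothetical sentinel error used only in this sketch.
var errNoEndpoints = errors.New("refusing to apply an empty endpoint list")

// endpointSetter is a hypothetical interface with the single method this
// sketch needs; the etcd v3 client has a SetEndpoints method of this shape.
type endpointSetter interface {
	SetEndpoints(eps ...string)
}

// setEndpointsChecked forwards eps to the setter only when eps is non-empty.
// On an empty or nil list it returns an error and leaves the previously
// configured endpoints untouched.
func setEndpointsChecked(s endpointSetter, eps []string) error {
	if len(eps) == 0 {
		return errNoEndpoints
	}
	s.SetEndpoints(eps...)
	return nil
}

// fakeClient stands in for the real client so the sketch runs without etcd.
type fakeClient struct{ endpoints []string }

func (f *fakeClient) SetEndpoints(eps ...string) { f.endpoints = eps }

func main() {
	c := &fakeClient{endpoints: []string{"10.0.0.1:2379"}}

	// An empty update is rejected; the old endpoint list is kept.
	fmt.Println(setEndpointsChecked(c, nil), c.endpoints)

	// A non-empty update is applied as usual.
	fmt.Println(setEndpointsChecked(c, []string{"10.0.0.2:2379"}), c.endpoints)
}
```

Whether an empty update should be dropped silently, logged, or returned as an error is a design choice for the maintainers; the sketch returns an error so the caller can decide.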

How can we reproduce it (as minimally and precisely as possible)?

I do not know the specific scenario in which the panic occurs. However, from reading the code, when the autoSync method in etcd/client/v3/client.go runs periodically and MemberList returns an empty result, Endpoints may be set to an empty list.
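The suspected failure path can be illustrated with a small standalone sketch. It does not use the etcd client. It assumes, without confirmation from this report, that the synced endpoint list is built by concatenating each member's client URLs and that a later code path indexes the first endpoint. The `member` type and both helper functions are hypothetical.

```go
package main

import "fmt"

// member is a simplified, hypothetical stand-in for one MemberList entry.
type member struct {
	Name       string
	ClientURLs []string
}

// collectClientURLs flattens the client URLs of all members into one slice.
// If no member reports a client URL, the result is nil (length 0).
func collectClientURLs(members []member) []string {
	var eps []string
	for _, m := range members {
		eps = append(eps, m.ClientURLs...)
	}
	return eps
}

// firstEndpoint mimics code that assumes at least one endpoint exists.
// The deferred recover turns the runtime panic into an error so that the
// demonstration can print it instead of crashing.
func firstEndpoint(eps []string) (ep string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return eps[0], nil
}

func main() {
	// Hypothetical response: members are listed but none has client URLs.
	members := []member{{Name: "etcd-0"}, {Name: "etcd-1"}, {Name: "etcd-2"}}

	eps := collectClientURLs(members)
	fmt.Println("endpoints after sync:", len(eps))

	_, err := firstEndpoint(eps)
	fmt.Println(err)
}
```

If this is the actual path, the panic would only appear when a sync lands in the short window in which the member list carries no client URLs, which would be consistent with the low reproduction rate.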

Anything else we need to know?

No response

Etcd version (please run commands below)

v3.5.11

Etcd configuration (command line flags or environment variables)

# paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

```console
$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints= endpoint status -w table
# paste output here
```

Relevant log output

No response

ahrtr commented 5 months ago

> We have a test scenario to reset all etcd services at the same time

Can you explain how you "reset all etcd services"?

hanxuejian commented 4 months ago

In my test environment, there are three etcd nodes. I use the kill -9 command to kill three etcd processes at the same time. The three processes can be restarted normally and form a cluster. However, there is a low probability that the client panics.

ahrtr commented 4 months ago

> In my test environment, there are three etcd nodes. I use the kill -9 command to kill three etcd processes at the same time. The three processes can be restarted normally and form a cluster. However, there is a low probability that the client panics.

Thanks for the feedback, but it's weird. The endpoints shouldn't be empty. Please read https://github.com/etcd-io/etcd/pull/18220

Did you ever change the source code or manually reset/clear the config.Endpoints? Please try to create an e2e test to reproduce this issue.

jmhbnz commented 4 months ago

Hey @hanxuejian - Following up on this bug: do you have any feedback on the question from @ahrtr above, and do you have a script or test that would help us reproduce this on our end?
