etcd-io / website

Etcd snapshot save/restore documentation needs enhancement #671

Open PenelopeFudd opened 1 year ago

PenelopeFudd commented 1 year ago

I have a three-node etcd cluster (used with Patroni), and one node decided to break.

I read through https://etcd.io/docs/v3.5/op-guide/recovery/ but wasn't able to get the node to work.

Errors:

2023-04-21 14:41:13.820783 I | raft: 7088d2ed19f2af13 became follower at term 97778
2023-04-21 14:41:13.820946 C | raft: tocommit(28704400) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
panic: tocommit(28704400) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?

I did a bunch of googling and found lots of results related to Kubernetes, but only a few for standalone clusters and/or older versions.

Finally figured out my problems (etcd-related, anyway):

Environment: Ubuntu 22.04, etcd 3.2.26

Thanks for a great program!

jmhbnz commented 1 year ago

Thanks for raising this detailed report @PenelopeFudd. Reviewing your notes, I agree there are some areas where we could expand the recovery documentation, namely making sure people are aware of the directory creation and ownership requirements.

I also like the suggestion of a more meaningful error message for snapshot restore permission issues; that would need to be done in the main etcd repo. However, I do note you're using etcd 3.2.26, which is quite old, so we would need to verify whether that error has already been improved in later releases.

spzala commented 1 year ago

Thanks for reporting @PenelopeFudd and +1 @jmhbnz - v3.2 is not supported, so it would be great if you could try v3.5 (the version of the doc you are using) or the main branch.