etcd-io / bbolt

An embedded key/value database for Go.
https://go.etcd.io/bbolt
MIT License

Find new memory pages when node drops disk #834

Closed yingchunliu-zte closed 2 months ago

yingchunliu-zte commented 2 months ago

After the system is powered off and reset, containerd panics with the following error: panic: freepages: failed to get all reachable pages (key[0]=(hex) on leaf page(1229) needs to be < than key of the next element in ancestor(hex). Pages stack: [1974,1229])

On every commit, all nodes on the paths touched by the operation (from the tree root down to the leaves) are written to disk. If each of those nodes is always written to a newly allocated page rather than to its existing page, then even if a power failure interrupts the commit, the tree roots referenced by both meta pages remain healthy, because the commit only wrote to new pages.

Each node involved in the commit records its original pgid, and tx.db.freelist.Free is called on those pages after the commit.
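
The idea above can be illustrated with a minimal, self-contained sketch. It is not bbolt's code: the `store` type, `writeNode`, and `commit` are hypothetical names, and a map stands in for the database file. It only shows that writing each changed node to a fresh page leaves the old tree untouched until the new root is published, and that the old pgids are collected for freeing after the commit.

```go
package main

import "fmt"

// pgid identifies a page in the simulated database file.
type pgid uint64

// store is a toy stand-in for the database file (hypothetical, not bbolt's layout).
type store struct {
	pages   map[pgid]string // page id -> page contents
	next    pgid            // next unused page id
	pending []pgid          // original pages to free once the commit is done
	root    pgid            // root page referenced by the current meta
}

// writeNode stores new contents for the node that currently lives at old.
// It never overwrites old in place: it allocates a fresh page and records
// the original pgid so that it can be freed after the commit.
func (s *store) writeNode(old pgid, data string) pgid {
	id := s.next
	s.next++
	s.pages[id] = data
	s.pending = append(s.pending, old)
	return id
}

// commit publishes the new root and returns the original pages that may
// now be handed to the freelist.
func (s *store) commit(newRoot pgid) []pgid {
	s.root = newRoot
	freed := s.pending
	s.pending = nil
	return freed
}

func main() {
	s := &store{
		pages: map[pgid]string{2: "root-v1", 3: "leaf-v1"},
		next:  4,
		root:  2,
	}

	// Write the modified leaf and root to newly allocated pages.
	newLeaf := s.writeNode(3, "leaf-v2")
	newRoot := s.writeNode(2, "root-v2")

	// If power were lost here, the old root and leaf would still be intact.
	fmt.Println("old tree:", s.pages[s.root], s.pages[3])

	freed := s.commit(newRoot)
	fmt.Println("new root:", s.root, "new leaf:", newLeaf, "freed:", freed)
}
```

In this sketch the freed pages are only reported, not reused, so the previous tree stays readable after the commit as well.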

This means the database file can always be recovered through the meta pages, for example with "./bbolt surgery revert-meta-page ./db.db --output ./new.db".
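
To show why two healthy meta pages make recovery possible, here is a rough sketch of choosing between them. It assumes each meta carries a transaction id and some validity check; the `meta` type, its `valid` field (a stand-in for real validation such as a checksum), and `pickMeta` are hypothetical and do not reproduce bbolt's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// meta is a simplified, hypothetical view of a meta page.
type meta struct {
	txid  uint64 // transaction that wrote this meta
	root  uint64 // root page of the tree at that transaction
	valid bool   // stand-in for real validation of the page
}

// pickMeta prefers the valid meta with the higher txid and falls back to
// the other one, which points at the tree of the previous transaction.
func pickMeta(m0, m1 meta) (meta, error) {
	newer, older := m0, m1
	if m1.txid > m0.txid {
		newer, older = m1, m0
	}
	if newer.valid {
		return newer, nil
	}
	if older.valid {
		return older, nil
	}
	return meta{}, errors.New("both meta pages are invalid")
}

func main() {
	m0 := meta{txid: 10, root: 2, valid: true}
	m1 := meta{txid: 11, root: 5, valid: false} // damaged by the power failure

	m, err := pickMeta(m0, m1)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using txid", m.txid, "root", m.root)
}
```

Falling back to the older meta is only safe if the pages reachable from its root were never overwritten, which is the property the proposed change aims to guarantee.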

k8s-ci-robot commented 2 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yingchunliu-zte

Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/etcd-io/bbolt/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.