etcd-io / bbolt

An embedded key/value database for Go.
https://go.etcd.io/bbolt
MIT License

using primary and slave DBs to solve the panic problem caused by DB c… #830

Closed yingchunliu-zte closed 2 months ago

yingchunliu-zte commented 2 months ago

When the system is powered off and reset, containerd experiences a panic: panic: freepages: failed to get all reachable pages (key[0]=(hex) on leaf page(1229) needs to be < than key of the next element in ancestor(hex). Pages stack: [1974,1229])

The main steps for using primary and slave DBs to work around the panic caused by DB conflicts are as follows:

  1. Try to open the db. If that succeeds, it is used as the master db and copied to create the slave db. Otherwise, the slave db is opened as the master db and copied to create a new slave db.
  2. Open the slave db as DB.slave.
  3. Add slave members to the DB, Tx, and Bucket objects and override their write methods: after an operation on the main object succeeds, the slave object performs the same operation.

If power is lost while the master db is being written and this causes a master db conflict, containerd falls back to the slave db after the panic. If power is lost while the slave db is being written, containerd creates a new slave db.

kind: bug

k8s-ci-robot commented 2 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: yingchunliu-zte
Once this PR has been reviewed and has the lgtm label, please assign ptabor for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/etcd-io/bbolt/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

tjungblu commented 2 months ago

Thanks @yingchunliu-zte - that approach already exists (in smaller scale) internally via the meta pages: https://github.com/etcd-io/bbolt/blob/main/db.go#L1123-L1144

You can see in the recent investigation by @ahrtr that simply rolling back the page helps in recovering: https://github.com/etcd-io/bbolt/issues/778#issuecomment-2329016959

There's another scenario where we believe some pages didn't persist properly during power-down events on virtualized filesystems. I don't think that copying an entire bucket or database file is helping here at all.
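The meta-page mechanism mentioned in this comment can be illustrated with a small sketch: keep two copies of the meta record, and at open time use the valid copy with the highest transaction id, so a torn write of the newest meta rolls back to the previous one. The `meta` type, its fields, and `pickMeta` below are simplified assumptions for illustration and do not reproduce bbolt's on-disk layout or its actual code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/fnv"
)

// meta is a simplified stand-in for a meta page (hypothetical layout).
type meta struct {
	txid     uint64 // transaction that wrote this meta
	root     uint64 // root page id for that transaction
	checksum uint64 // FNV-64a over txid and root
}

// sum computes the checksum over the fields it protects.
func (m meta) sum() uint64 {
	var buf [16]byte
	binary.LittleEndian.PutUint64(buf[0:8], m.txid)
	binary.LittleEndian.PutUint64(buf[8:16], m.root)
	h := fnv.New64a()
	h.Write(buf[:])
	return h.Sum64()
}

// valid reports whether the stored checksum matches the contents.
func (m meta) valid() bool { return m.checksum == m.sum() }

// newMeta builds a meta record with a correct checksum.
func newMeta(txid, root uint64) meta {
	m := meta{txid: txid, root: root}
	m.checksum = m.sum()
	return m
}

// pickMeta returns the valid meta with the highest txid,
// falling back to the other copy if the newest one is damaged.
func pickMeta(m0, m1 meta) (meta, error) {
	newer, older := m0, m1
	if older.txid > newer.txid {
		newer, older = older, newer
	}
	if newer.valid() {
		return newer, nil
	}
	if older.valid() {
		return older, nil
	}
	return meta{}, errors.New("both meta copies are invalid")
}

func main() {
	older := newMeta(41, 7)
	newer := newMeta(42, 9)
	newer.root = 0 // simulate a torn write of the newer meta

	m, err := pickMeta(older, newer)
	fmt.Println("using txid", m.txid, "root", m.root, "error:", err)
}
```

In this model the fallback costs one extra small record instead of a second full database file, which is the "smaller scale" contrast drawn above.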

yingchunliu-zte commented 2 months ago

Thanks @tjungblu

Commit is not an atomic operation, and when the system loses power, the data flushed to disk may be incomplete. With this change, every write operation is also applied to the redundant backup database, so at commit time there is always one good database available for automatic recovery, which avoids repeated panics.

ahrtr commented 2 months ago

Yes, basically I agree with @tjungblu. Please also refer to https://github.com/ahrtr/etcd-issues/blob/master/docs/cncf_storage_tag_etcd.md#storage-boltdb-feature

On the other hand, it's totally up to applications to add whatever higher-level protection (e.g. master-slave) they want, although it may not be an easy task.

From bbolt's perspective, there are indeed some long-standing data corruption issues. One possible cause is the filesystem, as mentioned in https://github.com/etcd-io/bbolt/issues/778#issuecomment-2329016959. But it's also possible that there are bugs in the freelist management; see https://github.com/etcd-io/bbolt/issues/789. I am open to any thoughts on how to resolve such data corruption issues.

ahrtr commented 2 months ago

> Commit is not an atomic operation

It's atomic. Please refer to the link in my previous comment.

To be clearer, we won't accept this PR but thanks anyway.

Please feel free to raise a topic in discussions if you want.

tjungblu commented 2 months ago

> Commit is not an atomic operation

> Every write operation here will be written to the redundant backup database.

If commit were not atomic, you would now have a two-phase commit problem, without an actual commit protocol, that you need to solve. I hope you see where this is going :)

> Please feel free to raise a topic in discussions if you want.

+1, happy to brainstorm this further along

yingchunliu-zte commented 2 months ago

@tjungblu @ahrtr When I say the commit operation is non-atomic, I mean that some pages may be successfully flushed to disk while others are not. In that case, a master-slave db setup may help.