Closed yingchunliu-zte closed 2 months ago
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: yingchunliu-zte Once this PR has been reviewed and has the lgtm label, please assign ptabor for approval. For more information see the Kubernetes Code Review Process.
Thanks @yingchunliu-zte - that approach already exists (in smaller scale) internally via the meta pages: https://github.com/etcd-io/bbolt/blob/main/db.go#L1123-L1144
You can see in the recent investigation by @ahrtr that simply rolling back the page helps in recovering: https://github.com/etcd-io/bbolt/issues/778#issuecomment-2329016959
There's another scenario where we believe some pages didn't persist properly during power-down events on virtualized filesystems. I don't think that copying an entire bucket or database file is helping here at all.
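The meta-page fallback mentioned above can be illustrated with a minimal sketch. It assumes a simplified `meta` struct with a checksum and a transaction id; the type, its fields, and the `pickMeta` helper are hypothetical names for illustration and are not the code in `db.go`. The idea is to prefer the valid meta page with the highest transaction id and fall back to the other one when the newest fails validation.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/fnv"
)

// meta is a simplified, hypothetical stand-in for a database meta page.
type meta struct {
	txid     uint64 // id of the transaction that wrote this meta page
	root     uint64 // page id of the root of the data tree
	checksum uint64 // hash over txid and root, used to detect torn writes
}

// sum computes an FNV-1a hash over the fields protected by the checksum.
func (m meta) sum() uint64 {
	var buf [16]byte
	binary.LittleEndian.PutUint64(buf[0:8], m.txid)
	binary.LittleEndian.PutUint64(buf[8:16], m.root)
	h := fnv.New64a()
	h.Write(buf[:])
	return h.Sum64()
}

// valid reports whether the stored checksum matches the page contents.
func (m meta) valid() bool { return m.checksum == m.sum() }

// newMeta builds a meta page with a correct checksum.
func newMeta(txid, root uint64) meta {
	m := meta{txid: txid, root: root}
	m.checksum = m.sum()
	return m
}

var errBothInvalid = errors.New("both meta pages are invalid")

// pickMeta returns the valid meta page with the highest txid.
// If the newest page is damaged, it rolls back to the older one.
func pickMeta(m0, m1 meta) (meta, error) {
	newest, oldest := m0, m1
	if oldest.txid > newest.txid {
		newest, oldest = oldest, newest
	}
	if newest.valid() {
		return newest, nil
	}
	if oldest.valid() {
		return oldest, nil
	}
	return meta{}, errBothInvalid
}

func main() {
	older := newMeta(41, 7)
	newer := newMeta(42, 9)
	newer.checksum ^= 1 // simulate a torn write of the newest meta page

	m, err := pickMeta(older, newer)
	fmt.Println(m.txid, err)
}
```

In this sketch a damaged newest meta page costs only the last transaction, which is the "rolling back the page" behaviour described in the linked investigation. It does not help when the data pages referenced by a valid meta page are themselves damaged.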
Thanks @tjungblu
Commit is not an atomic operation: when the system loses power, the database may be only partially flushed to disk. With this change, every write operation is also written to a redundant backup database, so at commit time there is always one good database available for automatic recovery, which avoids repeated panics.
Yes, basically I agree with @tjungblu . Please also refer to https://github.com/ahrtr/etcd-issues/blob/master/docs/cncf_storage_tag_etcd.md#storage-boltdb-feature
On the other hand, it's entirely up to applications to add whatever higher-level protection (e.g. master-slave) they want, although it may not be an easy task.
From the bbolt perspective, there are indeed some long-standing data corruption issues. One possible cause is the filesystem, as mentioned in https://github.com/etcd-io/bbolt/issues/778#issuecomment-2329016959. It's also possible that there are bugs in the freelist management; refer to https://github.com/etcd-io/bbolt/issues/789. I am open to any thoughts on how to resolve such data corruption issues.
> Commit is not an atomic operation
It's atomic. Please refer to the link in my previous comment.
To be clearer, we won't accept this PR but thanks anyway.
Please feel free to raise a topic in discussions if you want.
> Commit is not an atomic operation
> Every write operation here will be written to the redundant backup database.
If commit were not atomic, you would now have a two-phase commit issue, without an actual commit, that you need to solve. I hope you see where this is going :)
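To make the point above concrete, here is a minimal sketch of the crash window that mirroring introduces. The `store` type and the `mirroredPut` and `inSync` helpers are hypothetical and use in-memory maps rather than real database files; the `crashAfterPrimary` flag stands in for a power loss between the two writes.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical in-memory stand-in for a database file.
type store map[string]string

var errCrash = errors.New("simulated power loss")

// mirroredPut writes to the primary and then to the backup.
// crashAfterPrimary simulates power loss between the two writes.
func mirroredPut(primary, backup store, k, v string, crashAfterPrimary bool) error {
	primary[k] = v
	if crashAfterPrimary {
		return errCrash // the backup never sees this write
	}
	backup[k] = v
	return nil
}

// inSync reports whether both stores hold exactly the same key/value pairs.
func inSync(a, b store) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

func main() {
	primary, backup := store{}, store{}

	_ = mirroredPut(primary, backup, "k", "v1", false)
	fmt.Println("after clean write, in sync:", inSync(primary, backup))

	err := mirroredPut(primary, backup, "k", "v2", true)
	fmt.Println("after crash:", err, "in sync:", inSync(primary, backup))
}
```

After the simulated crash the two copies disagree, and nothing in the copies themselves says which one is authoritative. Deciding that requires some atomic commit record, which is the problem the mirroring was meant to avoid.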
> Please feel free to raise a topic in discussions if you want.
+1, happy to brainstorm this further along
@tjungblu @ahrtr What I mean by a non-atomic commit is that some pages were successfully flushed to disk while others were not. In this case, a master-slave db may help.
When the system is powered off and reset, containerd experiences a panic: panic: freepages: failed to get all reachable pages (key[0]=(hex) on leaf page(1229) needs to be < than key of the next element in ancestor(hex). Pages stack: [1974,1229])
The main steps for using master and slave DBs to solve the panic caused by DB conflicts are as follows:
If the system is powered off while writing to the master db and this causes a master db conflict, containerd will use the slave db after the panic. If the system is powered off while writing to the slave db, containerd will create a new slave db.
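The fallback described above could look roughly like the following sketch. It is not the code from this PR: `opener`, `tryOpen`, and `openWithFallback` are hypothetical names, and the opener passed in `main` only pretends that the master db panics on open. The sketch converts a panic during open into an error and then tries the slave db.

```go
package main

import "fmt"

// opener is a hypothetical function that opens and checks a db file.
// It may return an error or panic if the file is corrupted.
type opener func(path string) error

// tryOpen calls open and converts a panic into an ordinary error.
func tryOpen(open opener, path string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("open %s panicked: %v", path, r)
		}
	}()
	return open(path)
}

// openWithFallback tries the master db first and the slave db second.
// It returns the path of the db that opened successfully.
func openWithFallback(open opener, master, slave string) (string, error) {
	errM := tryOpen(open, master)
	if errM == nil {
		return master, nil
	}
	errS := tryOpen(open, slave)
	if errS == nil {
		return slave, nil
	}
	return "", fmt.Errorf("master: %v; slave: %v", errM, errS)
}

func main() {
	// Hypothetical opener: pretend the master db is damaged and panics on open.
	open := func(path string) error {
		if path == "master.db" {
			panic("freepages: failed to get all reachable pages")
		}
		return nil
	}

	used, err := openWithFallback(open, "master.db", "slave.db")
	fmt.Println(used, err)
}
```

Note that the sketch only covers choosing a db at startup. It says nothing about whether the slave db is consistent with the last committed state of the master, which is the concern raised earlier in this thread.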
kind: bug