etcd-io / bbolt

An embedded key/value database for Go.
https://go.etcd.io/bbolt
MIT License

[DO NOT REVIEW] try to reproduce the db corruption issue #769

Closed ahrtr closed 1 month ago

ahrtr commented 5 months ago

Please do not review this PR.

FYI. I am trying to reproduce the db corruption issue using the following script + this PR.

#!/usr/bin/env bash

# Please run this script at the root directory of the bbolt repository
# using command something like below,
#    nohup ./reproduce_corruption.sh > test.log &

set -euo pipefail

go build ./cmd/bbolt/

minwait=100
maxwait=250
for i in {1..10000}
do
    echo
    echo "-----------------------------------"
    echo "Round $i: $(date)"

    rm -f case.log || true

    TEST_CONCURRENT_CASE_DURATION=300s go test -run TestConcurrentGenericReadAndWrite -v > case.log &
    sleep $((minwait + RANDOM % (maxwait-minwait)))

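    # Find every process whose command line mentions bbolt (normally the
    # compiled test binary) and kill it abruptly to simulate a crash.
    pid=$(ps -ef | grep bbolt | grep -v grep | awk '{print $2}')
    echo "Killing ${pid}..."
    # ${pid} is left unquoted on purpose so several PIDs become separate arguments.
    kill -9 ${pid}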
    sleep 10

    echo "Checking db consistency..." 
    ./bbolt page ./bbolt.db 0
    ./bbolt page ./bbolt.db 1
    ./bbolt check ./bbolt.db
    sleep 5
done

echo "All done!"

It has been running for 4 days and 19 hours with no issues so far, and it is still running.
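For reference, the kill delay in the script is `minwait + RANDOM % (maxwait-minwait)`, so each test process is killed somewhere between 100 and 249 seconds after it starts (the upper bound `maxwait` itself is never reached). A minimal sketch of that calculation, where the function name `random_wait` is hypothetical and not part of the script:

```shell
#!/usr/bin/env bash

# Print a pseudo-random integer in the half-open range [min, max).
# Mirrors the expression used in the loop above; `random_wait` is an
# illustrative name only.
random_wait() {
    local min=$1 max=$2
    echo $(( min + RANDOM % (max - min) ))
}
```

With the script's values, `random_wait 100 250` averages about 174.5 seconds. Adding the fixed 15 seconds of `sleep` per round gives roughly 190 seconds per round, or about 22 days for all 10000 rounds, not counting the time spent in the `bbolt` commands themselves.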


-----------------------------------
Round 1: Sun  9 Jun 11:33:54 PDT 2024
Killing 385023...
Checking db consistency...
Page ID:    0
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=9>
Freelist:   <pgid=4>
HWM:        <pgid=1219>
Txn ID:     2812
Checksum:   cc65e26ed4ea7d37

Page ID:    1
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=9>
Freelist:   <pgid=12>
HWM:        <pgid=1219>
Txn ID:     2811
Checksum:   b50666a0f0774fcc

OK

-----------------------------------
Round 2: Sun  9 Jun 11:36:20 PDT 2024
Killing 385117...
Checking db consistency...
Page ID:    0
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=17>
Freelist:   <pgid=7>
HWM:        <pgid=1003>
Txn ID:     2710
Checksum:   de3e350417492bcf

Page ID:    1
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=27>
Freelist:   <pgid=28>
HWM:        <pgid=1014>
Txn ID:     2711
Checksum:   8328e60775bb0806

OK

-----------------------------------
......
-----------------------------------
Round 2193: Fri 14 Jun 06:30:09 PDT 2024

cc @fuweid @tjungblu @ivanvc @Elbehery
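When reading the `bbolt page` output above, it helps to know which of the two meta pages is the current one. As far as I understand bbolt, a commit with transaction ID N writes meta page N % 2, and on open the meta page with the higher valid Txn ID is used. The sketch below checks both points against the dumped text. The helper names are hypothetical, and it does not validate the checksum the way bbolt does:

```shell
#!/usr/bin/env bash

# Extract the "Txn ID" value from `bbolt page` output read on stdin.
meta_txid() {
    awk -F':' '/^Txn ID:/ { gsub(/[ \t]+/, "", $2); print $2 }'
}

# Given the Txn IDs of meta page 0 and meta page 1, print the number of
# the page that holds the most recent transaction.
latest_meta() {
    local txid0=$1 txid1=$2
    if (( txid0 > txid1 )); then echo 0; else echo 1; fi
}

# Succeed if the Txn ID sits in the expected slot.
# Assumption: a commit with Txn ID N is written to meta page N % 2.
meta_slot_ok() {
    local page=$1 txid=$2
    (( txid % 2 == page ))
}
```

Example use against a database file: `txid0=$(./bbolt page ./bbolt.db 0 | meta_txid)`. In Round 1 above, page 0 (Txn ID 2812) is the current meta page; in Round 2 it is page 1 (Txn ID 2711).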

ahrtr commented 5 months ago

It has been running for about 18 days with no issues so far, and it is still running.

-----------------------------------
Round 8093: Thu 27 Jun 06:58:41 PDT 2024

ivanvc commented 5 months ago

I left it running on two machines, too. And so far, neither has presented any issues.

-----------------------------------
Round 5678: Thu Jun 27 06:10:26 PM UTC 2024
-----------------------------------
Round 5692: Thu Jun 27 06:11:30 PM UTC 2024

ahrtr commented 5 months ago

> I left it running on two machines, too. And so far, neither has presented any issues.

thx.

ahrtr commented 4 months ago

No issues after 22 days of continuous running.


-----------------------------------
Round 1: Sun  9 Jun 11:33:54 PDT 2024
Killing 385023...
Checking db consistency...
Page ID:    0
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=9>
Freelist:   <pgid=4>
HWM:        <pgid=1219>
Txn ID:     2812
Checksum:   cc65e26ed4ea7d37

Page ID:    1
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=9>
Freelist:   <pgid=12>
HWM:        <pgid=1219>
Txn ID:     2811
Checksum:   b50666a0f0774fcc

OK

......

-----------------------------------
Round 10000: Mon  1 Jul 11:05:39 PDT 2024
Killing 1242747...
Checking db consistency...
Page ID:    0
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=33>
Freelist:   <pgid=66>
HWM:        <pgid=1521>
Txn ID:     3988
Checksum:   9db4db789a9ef0f7

Page ID:    1
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=65>
Freelist:   <pgid=4>
HWM:        <pgid=1513>
Txn ID:     3987
Checksum:   0a6f95bfd5ea1742

OK
All done!
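A long log like this can be summarized rather than read by eye. The sketch below counts the rounds that started and the rounds that ended with an `OK` line. It assumes the log format shown above, and the function name `summarize_log` is hypothetical:

```shell
#!/usr/bin/env bash

# Read a log on stdin and print "rounds=<n> ok=<m>", where n is the number
# of "Round <i>:" lines and m is the number of lines consisting solely of "OK".
summarize_log() {
    awk '
        /^Round [0-9]+:/ { rounds++ }
        /^OK$/           { ok++ }
        END              { printf "rounds=%d ok=%d\n", rounds, ok }
    '
}
```

Example use: `summarize_log < test.log`. For a completed run the two counts should match; a lower `ok` count means at least one round did not reach a clean check result.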

k8s-ci-robot commented 1 month ago

@ahrtr: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-bbolt-robustness-arm64 | 2735c9f3d40dbe0d3602441018c1bb9806a693f4 | link | true | `/test pull-bbolt-robustness-arm64` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
