erthink / libmdbx

One of the fastest embeddable key-value ACID database without WAL. libmdbx surpasses the legendary LMDB in terms of reliability, features and performance.
https://erthink.github.io/libmdbx/
Other
1.16k stars 111 forks source link

“Page not found” after switch from 0.10.1 to 0.10.3 #234

Closed AskAlexSharov closed 3 years ago

AskAlexSharov commented 3 years ago

After updating mdbx version from: 0.10.1 to 0.10.3 we got couple bug reports about db corruption (on 0.10.1 we had no such issues).

1-st on complex disk setup with fancy btrfs features (and i did blame for FS setup): https://github.com/ledgerwatch/erigon/issues/2730

2-nd on Ubuntu20-0, default ZFS, and single SSD (i will add link later):

 - summary: 2084096603 records, 2070785841 dups, 16672772824 key's bytes, 61997120882 data's bytes, 0 problems
Processing 'AccountHistory'...
 - key-value kind: usual-key => single-value
 - last modification txn#1320
 ! page #2287521565 beyond next-pgno
 ! mdbx_cursor_get() failed, error -30797 MDBX_PAGE_NOTFOUND: Requested page not found
 - problems: different number of entries (1)
 - summary: 37340938 records, 0 dups, 1045546264 key's bytes, 2182978255 data's bytes, 1 problems
 ! mdbx_cursor_get() failed, error -30782 MDBX_BAD_TXN: Transaction is not valid for requested operation, e.g. had errored and be must aborted, has a child, or is invalid
 - problems: different number of entries (1)
Total 5 errors are detected, elapsed 10526.937 seconds.

mdbx_chk with -0,-1,-2 returned error.

can we do something else to grab debug info?

erthink commented 3 years ago

Technically: 2287521565 > 0x7ffffFFFF, so this is wrong reference, i.e. corruption of a page which referenced the 2287521565 from.

erthink commented 3 years ago

The situation described in the https://github.com/ledgerwatch/erigon/issues/2730 is different: there is a correct reference to the page 261584464, but the page 261584464 is clearly damaged (i.e. zeroed content).

AskAlexSharov commented 3 years ago

@erthink i see. 1-st db already dropped. 2-nd not yet. Can guys do provide any useful debug info from 2-nd db?

erthink commented 3 years ago

In general, we need to understand whether this is a consequence of errors in libmdbx, or somewhere else (ZFS, disk error, etc).

There is a non-zero chance of understanding what happened by looking at the mdbx_chk -vvv output and then looking at the contents of the page referring to 2287521565.

erthink commented 3 years ago

If possible, then you should give me the DB file for analysis. But to get understanding that the problem is in libmdbx, then to fix it and check the fix are possible only by reproducible test case.

erthink commented 3 years ago
mdbx_chk v0.10.3-0-gbf603bd (2021-08-27T22:47:12+03:00, T-89059f5e31a893b0739257a1df03e3d9fcc6d74d)
Running for ./chaindata in 'read-only' mode...
   open-MADV_DONTNEED 448023139..448266240
   readahead OFF 0..448023139
 - monopolistic mode
 - current boot-id bdfe0235edeb58e6-4c1fb53364c647a1
 - pagesize 4096 (4096 system), max keysize 1980..2022, max readers 118
 - mapsize 2199023255552 (2.00 Tb)
 - dynamic datafile: 12288 (12.00 Kb) .. 2199023255552 (2.00 Tb), +2147483648 (2.00 Gb), -4096 (4.00 Kb)
 - current datafile: 1836098519040 (1.67 Tb), 448266240 pages
 - meta-0: steady txn#1351, tail
 - meta-1: steady txn#1352, stay
 - meta-2: steady txn#1353, head
 - performs check for meta-pages clashes
 - performs full check recent-txn-id with meta-pages
 - transactions: recent 1353, latter reader 1353, lag 0
Traversal b-tree by txn#1353...
 - found 'AccountChangeSet' area
 - found 'AccountHistory' area
 - found 'BlockBody' area
 - found 'BlockTransaction' area
 - found 'BlockTransactionLookup' area
 - found 'CallFromIndex' area
 - found 'CallToIndex' area
 - found 'CallTraceSet' area
 - found 'CanonicalHeader' area
 - found 'Code' area
 - found 'Config' area
 - found 'DbInfo' area
 - found 'HashedAccount' area
 - found 'HashedCodeHash' area
 - found 'HashedStorage' area
 - found 'Header' area
 - found 'HeaderNumber' area
 - found 'HeadersTotalDifficulty' area
 - found 'IncarnationMap' area
 - found 'LastBlock' area
 - found 'LastHeader' area
 - found 'LogAddressIndex' area
 - found 'LogTopicIndex' area
 - found 'Migration' area
 - found 'PlainCodeHash' area
 - found 'PlainState' area
 - found 'Receipt' area
 - found 'Sequence' area
 - found 'StorageChangeSet' area
 - found 'StorageHistory' area
 - found 'SyncStage' area
 - found 'TransactionLog' area
 - found 'TrieAccount' area
 - found 'TrieStorage' area
 - found 'TxSender' area
 - pages: walked 444845737, left/unused 3177402
     @GC: subtotal 3110, branch 1, large 324, leaf 2
     @MAIN: subtotal 1, leaf 1
     @META: subtotal 3
     AccountChangeSet: subtotal 32814947, branch 7479361, leaf 25335586 (usual 25335586, sub-dupsort 3162129, dupfixed 0, sub-dupfixed 0)
     AccountHistory: subtotal 7691720, branch 122481, leaf 7569239
     BlockBody: subtotal 367642, branch 4540, leaf 363102
     BlockTransaction: subtotal 123324746, branch 441213, large 18000521, leaf 98831039
     BlockTransactionLookup: subtotal 27094861, branch 522943, leaf 26571918
     CallFromIndex: subtotal 6089310, branch 103553, leaf 5985757
     CallToIndex: subtotal 6969003, branch 113869, leaf 6855134
     CallTraceSet: subtotal 31542779, branch 7262577, leaf 24280202 (usual 24280202, sub-dupsort 3286475, dupfixed 0, sub-dupfixed 0)
     CanonicalHeader: subtotal 326424, branch 1783, leaf 324641
     Code: subtotal 819876, branch 686, large 335629, leaf 46745
     Config: subtotal 1, leaf 1
     DbInfo: subtotal 1, leaf 1
     HashedAccount: subtotal 2790058, branch 43551, leaf 2746507
     HashedCodeHash: subtotal 700342, branch 13559, leaf 686783
     HashedStorage: subtotal 10955957, branch 277349, leaf 10678608 (usual 10678608, sub-dupsort 4734444, dupfixed 0, sub-dupfixed 0)
     Header: subtotal 2102777, branch 25997, leaf 2076780
     HeaderNumber: subtotal 313172, branch 3396, leaf 309776
     HeadersTotalDifficulty: subtotal 204861, branch 2533, leaf 202328
     IncarnationMap: subtotal 302319, branch 3206, leaf 299113
     LastBlock: subtotal 1, leaf 1
     LastHeader: subtotal 1, leaf 1
     LogAddressIndex: subtotal 401284, branch 5926, leaf 395358
     LogTopicIndex: subtotal 5868834, branch 122453, leaf 5746381
     Migration: subtotal 1, leaf 1
     PlainCodeHash: subtotal 1082894, branch 14859, leaf 1068035
     PlainState: subtotal 13360987, branch 303528, leaf 13057459 (usual 13057459, sub-dupsort 4734331, dupfixed 0, sub-dupfixed 0)
     Receipt: subtotal 3815041, branch 11794, large 1133260, leaf 2641277
     Sequence: subtotal 1, leaf 1
     StorageChangeSet: subtotal 45495878, branch 2408365, leaf 43087513 (usual 43087513, sub-dupsort 420045930, dupfixed 0, sub-dupfixed 0)
     StorageHistory: subtotal 29219292, branch 914007, leaf 28305285
     SyncStage: subtotal 1, leaf 1
     TransactionLog: subtotal 79270236, branch 317729, large 12565224, leaf 58143715
     TrieAccount: subtotal 484241, branch 3622, leaf 480619
     TrieStorage: subtotal 2245964, branch 37398, leaf 2208566
     TxSender: subtotal 9187171, branch 15877, large 6117844, leaf 1269986
 - usage: total 1822088138752 bytes, payload 1409825516677 (77.4%), unused 412262622075 (22.6%)
     @GC: subtotal 12738560 bytes (0.0%), payload 12724600 (99.9%), unused 13960 (0.1%)
     @MAIN: subtotal 4096 bytes (0.0%), payload 3098 (75.6%), unused 998 (24.4%)
     @META: subtotal 12288 bytes (0.0%), payload 684 (5.6%), unused 11604 (94.4%)
     AccountChangeSet: subtotal 134410022912 bytes (7.4%), payload 84926286650 (63.2%), unused 49483736262 (36.8%)
     AccountHistory: subtotal 31505285120 bytes (1.7%), payload 19122057931 (60.7%), unused 12383227189 (39.3%)
     BlockBody: subtotal 1505861632 bytes (0.1%), payload 1432839115 (95.2%), unused 73022517 (4.8%)
     BlockTransaction: subtotal 505138159616 bytes (27.7%), payload 460115591481 (91.1%), unused 45022568135 (8.9%)
     BlockTransactionLookup: subtotal 110980550656 bytes (6.1%), payload 60080626068 (54.1%), unused 50899924588 (45.9%)
     CallFromIndex: subtotal 24941813760 bytes (1.4%), payload 14448508581 (57.9%), unused 10493305179 (42.1%)
     CallToIndex: subtotal 28545036288 bytes (1.6%), payload 17495928321 (61.3%), unused 11049107967 (38.7%)
     CallTraceSet: subtotal 129199222784 bytes (7.1%), payload 79434968497 (61.5%), unused 49764254287 (38.5%)
     CanonicalHeader: subtotal 1337032704 bytes (0.1%), payload 677927930 (50.7%), unused 659104774 (49.3%)
     Code: subtotal 3358212096 bytes (0.2%), payload 2624322319 (78.1%), unused 733889777 (21.9%)
     Config: subtotal 4096 bytes (0.0%), payload 505 (12.3%), unused 3591 (87.7%)
     DbInfo: subtotal 4096 bytes (0.0%), payload 399 (9.7%), unused 3697 (90.3%)
     HashedAccount: subtotal 11428077568 bytes (0.6%), payload 7974810601 (69.8%), unused 3453266967 (30.2%)
     HashedCodeHash: subtotal 2868600832 bytes (0.2%), payload 1961311192 (68.4%), unused 907289640 (31.6%)
     HashedStorage: subtotal 44875599872 bytes (2.5%), payload 29886840161 (66.6%), unused 14988759711 (33.4%)
     Header: subtotal 8612974592 bytes (0.5%), payload 7903814692 (91.8%), unused 709159900 (8.2%)
     HeaderNumber: subtotal 1282752512 bytes (0.1%), payload 684896000 (53.4%), unused 597856512 (46.6%)
     HeadersTotalDifficulty: subtotal 839110656 bytes (0.0%), payload 818922881 (97.6%), unused 20187775 (2.4%)
     IncarnationMap: subtotal 1238298624 bytes (0.1%), payload 855845538 (69.1%), unused 382453086 (30.9%)
     LastBlock: subtotal 4096 bytes (0.0%), payload 71 (1.7%), unused 4025 (98.3%)
     LastHeader: subtotal 4096 bytes (0.0%), payload 72 (1.8%), unused 4024 (98.2%)
     LogAddressIndex: subtotal 1643659264 bytes (0.1%), payload 1250983781 (76.1%), unused 392675483 (23.9%)
     LogTopicIndex: subtotal 24038744064 bytes (1.3%), payload 13693007104 (57.0%), unused 10345736960 (43.0%)
     Migration: subtotal 4096 bytes (0.0%), payload 478 (11.7%), unused 3618 (88.3%)
     PlainCodeHash: subtotal 4435533824 bytes (0.2%), payload 3034093502 (68.4%), unused 1401440322 (31.6%)
     PlainState: subtotal 54726602752 bytes (3.0%), payload 35837793553 (65.5%), unused 18888809199 (34.5%)
     Receipt: subtotal 15626407936 bytes (0.9%), payload 12040512860 (77.1%), unused 3585895076 (22.9%)
     Sequence: subtotal 4096 bytes (0.0%), payload 54 (1.3%), unused 4042 (98.7%)
     StorageChangeSet: subtotal 186351116288 bytes (10.2%), payload 155081891083 (83.2%), unused 31269225205 (16.8%)
     StorageHistory: subtotal 119682220032 bytes (6.6%), payload 75745079650 (63.3%), unused 43937140382 (36.7%)
     SyncStage: subtotal 4096 bytes (0.0%), payload 524 (12.8%), unused 3572 (87.2%)
     TransactionLog: subtotal 324690886656 bytes (17.8%), payload 285547825769 (87.9%), unused 39143060887 (12.1%)
     TrieAccount: subtotal 1983451136 bytes (0.1%), payload 1820076622 (91.8%), unused 163374514 (8.2%)
     TrieStorage: subtotal 9199468544 bytes (0.5%), payload 8519581404 (92.6%), unused 679887140 (7.4%)
     TxSender: subtotal 37630652416 bytes (2.1%), payload 26796442906 (71.2%), unused 10834209510 (28.8%)
 - summary: average fill 77.4%, 0 problems
Processing '@MAIN'...
 - key-value kind: usual-key => single-value, flags: none (0x00), dbi-id 1
 - page size 4096, entries 44
 - b-tree depth 1, pages: branch 0, leaf 1, overflow 0
 - summary: 44 records, 0 dups, 526 key's bytes, 2112 data's bytes, 0 problems
Processing '@GC'...
 - key-value kind: ordinal-key => single-value, flags: integerkey (0x08), dbi-id 0
 - last modification txn#1353
 - page size 4096, entries 324
 - b-tree depth 2, pages: branch 1, leaf 2, overflow 3107
 - fixed key-size 8
 - summary: 324 records, 0 dups, 2592 key's bytes, 12710904 data's bytes, 0 problems
 - space: 536870912 total pages, backed 448266240 (83.5%), allocated 448023139 (83.5%), remained 88847773 (16.5%), used 444845737 (82.9%), gc 3177402 (0.6%), detained 21605 (0.0%), reclaimable 3155797 (0.6%), available 92003570 (17.1%)
Processing 'AccountChangeSet'...
 - key-value kind: usual-key => multi-value, flags: dupsort (0x04), dbi-id 2
 - last modification txn#1350
 - page size 4096, entries 2084096603
 - b-tree depth 4, pages: branch 7479361, leaf 25335586, overflow 0
 - summary: 2084096603 records, 2070785841 dups, 16672772824 key's bytes, 61997120882 data's bytes, 0 problems
Processing 'AccountHistory'...
 - key-value kind: usual-key => single-value, flags: none (0x00), dbi-id 3
 - last modification txn#1320
 - page size 4096, entries 193746269
 - b-tree depth 5, pages: branch 122481, leaf 7569239, overflow 0
 - summary: 193746269 records, 0 dups, 5424895532 key's bytes, 11317009455 data's bytes, 0 problems
Processing 'BittorrentInfo'...
 - key-value kind: usual-key => single-value, flags: none (0x00), dbi-id 4
 - last modification txn#4
 - page size 4096, entries 0
 - b-tree depth 0, pages: branch 0, leaf 0, overflow 0
 - summary: 0 records, 0 dups, 0 key's bytes, 0 data's bytes, 0 problems
Processing 'BlockBody'...
 - key-value kind: usual-key => single-value, flags: none (0x00), dbi-id 5
 - last modification txn#1348
 - page size 4096, entries 13310787
 - b-tree depth 4, pages: branch 4540, leaf 363102, overflow 0
 - summary: 13310787 records, 0 dups, 532431480 key's bytes, 741746475 data's bytes, 0 problems
Processing 'BlockTransaction'...
 - key-value kind: usual-key => single-value, flags: none (0x00), dbi-id 6
 - last modification txn#1348
 - page size 4096, entries 1858141678
 - b-tree depth 5, pages: branch 441213, leaf 98831039, overflow 24052494
 - interrupted by signal
 - summary: 954650738 records, 0 dups, 7637205904 key's bytes, 186066347913 data's bytes, 0 problems
No error is detected, elapsed 87272.902 seconds
erthink commented 3 years ago

No errors were found when the database was checked again. The probable cause could be RAM glitches or other system failures. So close it as not confirmed.