erthink / libmdbx

One of the fastest embeddable key-value ACID databases without WAL. libmdbx surpasses the legendary LMDB in terms of reliability, features and performance.
https://erthink.github.io/libmdbx/

mdbx_chk: do Main dbi full-check at start #237

Closed AskAlexSharov closed 3 years ago

AskAlexSharov commented 3 years ago

Example of a long-running mdbx_chk run, which I interrupted with Ctrl+C in the middle:

mdbx_chk v0.10.5-0-gedda9515 (2021-10-13T16:35:26+03:00, T-31713895aac05dd55b3ebc8cadd419f96b38de54)
Running for /evo/mainnet/chaindata in 'read-only' mode...
 ~ skipped update meta.geo in recovery mode: from l3-n140401953-u536870912/s1048576-g524288, to l3-n140401953-u536870912/s1-g524288
 - monopolistic mode
 - current boot-id afbd54fddf3ad801-8106c58c535b4926
 - pagesize 4096 (4096 system), max keysize 1980..2022, max readers 118
 - mapsize 2199023255552 (2.00 Tb)
 - dynamic datafile: 12288 (12.00 Kb) .. 2199023255552 (2.00 Tb), +2147483648 (2.00 Gb), -4096 (4.00 Kb)
 - current datafile: 575086399488 (535.59 Gb), 140401953 pages
 - meta-0: steady txn#67409, tail, forced for checking
 - meta-1: steady txn#67411, head
 - meta-2: steady txn#67410, stay
 - skip checking meta-pages since the 0 is selected for verification
 - transactions: recent 67411, selected for verification 67409, lag 2
Skipping b-tree walk...
Processing '@MAIN'...
 - key-value kind: usual-key => single-value
 - last modification txn#67408
 - summary: 44 records, 0 dups, 526 key's bytes, 2112 data's bytes, 0 problems
Processing '@GC'...
 - key-value kind: ordinal-key => single-value
 - last modification txn#67408
 - fixed key-size 8
 - summary: 494 records, 0 dups, 3952 key's bytes, 26776204 data's bytes, 0 problems
 - space: 536870912 total pages, backed 140401953 (26.2%), allocated 140401953 (26.2%), available 403162516 (75.1%)
 - skip check used and gc pages (btree-traversal with monopolistic or read-write mode only)
Processing 'AccountChangeSet'...
 - key-value kind: usual-key => multi-value
 - last modification txn#67408
 - summary: 28621309 records, 28531308 dups, 228970472 key's bytes, 848425652 data's bytes, 0 problems
Processing 'AccountHistory'...
 - key-value kind: usual-key => single-value
 - last modification txn#67408
 - summary: 16000293 records, 0 dups, 448008204 key's bytes, 765509238 data's bytes, 0 problems
Processing 'BittorrentInfo'...
 - key-value kind: usual-key => single-value
 - last modification txn#4
 - summary: 0 records, 0 dups, 0 key's bytes, 0 data's bytes, 0 problems
Processing 'BlockBody'...
 - key-value kind: usual-key => single-value
 - last modification txn#67408
 - summary: 13446756 records, 0 dups, 537870240 key's bytes, 747138407 data's bytes, 0 problems
Processing 'BlockTransaction'...
 - key-value kind: usual-key => single-value
 - last modification txn#67408

^C - interrupted by signal
 - summary: 309979899 records, 0 dups, 2479839192 key's bytes, 52642179159 data's bytes, 0 problems
No error is detected, elapsed 91.456 seconds

mdbx_chk v0.10.5-0-gedda9515 (2021-10-13T16:35:26+03:00, T-31713895aac05dd55b3ebc8cadd419f96b38de54)
Running for /evo/mainnet/chaindata in 'read-only' mode...
 - monopolistic mode
 - current boot-id afbd54fddf3ad801-8106c58c535b4926
 - pagesize 4096 (4096 system), max keysize 1980..2022, max readers 118
 - mapsize 2199023255552 (2.00 Tb)
 - dynamic datafile: 12288 (12.00 Kb) .. 2199023255552 (2.00 Tb), +2147483648 (2.00 Gb), -4294967296 (4.00 Gb)
 - current datafile: 575086399488 (535.59 Gb), 140401953 pages
 - meta-0: steady txn#67409, tail
 - meta-1: steady txn#67411, head, forced for checking
 - meta-2: steady txn#67410, stay
 - skip checking meta-pages since the 1 is selected for verification
 - transactions: recent 67411, selected for verification 67411, lag 0
Skipping b-tree walk...
Processing '@MAIN'...
 - key-value kind: usual-key => single-value
 - last modification txn#67408
 ! corrupted leaf-page #77581853, mod-txnid 67410
 ! invalid page txnid (67410) for parent-page' txnid (67408)
 ! mdbx_cursor_get() failed, error 1 Operation not permitted
 - problems: different number of entries (1)
 - summary: 0 records, 0 dups, 0 key's bytes, 0 data's bytes, 1 problems
 ! abort processing '@GC' due to a previous error
 - space: 536870912 total pages, backed 140401953 (26.2%), allocated 140401953 (26.2%), available 396468959 (73.8%)
Total 4 errors are detected, elapsed 0.007 seconds.

As you can see, the error "! corrupted leaf-page #77581853, mod-txnid 67410" was found at the end. Is it possible to move (or add) this check of the Main dbi to the start of mdbx_chk's work?

The Main dbi is small, so the check will likely be cheap. Such a move would let mdbx_chk "fail fast" in these scenarios (and find a good meta-page faster).

AskAlexSharov commented 3 years ago

Just an idea: maybe this check could even be done when the db is opened (I saw it has some recovery step), because it's fast. It could then automatically choose a meta-page with a good Main dbi ("good" in terms of the "! invalid page txnid (67410) for parent-page' txnid (67408)" check).
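
The "good" criterion mentioned here can be pictured as a parent/child txnid comparison. The following is a minimal Python sketch, not libmdbx code: the Page class and the check_child_txnid function are hypothetical, and the rule itself (a child page must not carry a newer txnid than its parent) is only inferred from the error message in the log above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    pgno: int   # page number
    txnid: int  # id of the transaction that last modified the page


def check_child_txnid(parent: Page, child: Page) -> List[str]:
    """Return problem messages for one parent/child pair (empty if none)."""
    problems = []
    # Assumed rule: a child newer than its parent is reported as invalid.
    if child.txnid > parent.txnid:
        problems.append(
            f"invalid page txnid ({child.txnid}) "
            f"for parent-page txnid ({parent.txnid})"
        )
    return problems


# txnids and the leaf page number come from the log; the parent's
# page number (1) is made up for illustration.
print(check_child_txnid(Page(1, 67408), Page(77581853, 67410)))
```

A check like this only looks at one pair of pages, which is why it is cheap and also why it cannot say anything about the rest of the database.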

erthink commented 3 years ago

No.

Automatic switching is allowed only when the user expects it and there is confidence that it will not cause harm. Thus only a rollback from a weak to a steady meta-page is allowed automatically. No options.

In all other cases, a complete DB check is required, because otherwise it is impossible to ensure the absence of corruption. I.e. the presence of errors during a reduced check means that the database is broken, but their absence does not give any guarantees.
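
The rule stated above can be sketched as follows. This is an illustrative Python model, not the libmdbx implementation: the Meta class and the auto_rollback_target function are hypothetical names, and "head" is assumed to mean the meta-page with the highest txnid. The conditions under which a weak head is actually rolled back are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Meta:
    index: int    # meta-page number: 0, 1 or 2
    txnid: int
    steady: bool  # True for a steady meta-page, False for a weak one


def auto_rollback_target(metas: List[Meta]) -> Optional[Meta]:
    """Return the meta-page an automatic rollback could land on, or None."""
    head = max(metas, key=lambda m: m.txnid)
    if head.steady:
        # Steady-to-steady switching is never automatic under the stated rule.
        return None
    # The only automatic switch: weak head -> most recent steady meta-page.
    steady = [m for m in metas if m.steady]
    return max(steady, key=lambda m: m.txnid) if steady else None


# The three meta-pages reported in the log above are all steady.
from_log = [Meta(0, 67409, True), Meta(1, 67411, True), Meta(2, 67410, True)]
print(auto_rollback_target(from_log))
```

With the meta-pages from the log every candidate is steady, so the sketch returns None: switching away from meta-1 is a manual decision that has to be backed by a complete check.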

AskAlexSharov commented 3 years ago

How about moving this check from end to start of mdbx_chk?

erthink commented 3 years ago

> How about moving this check from end to start of mdbx_chk?

No, since this is unreasonable. In general, a complete check must always walk the page tree first, and only if that succeeds, walk the entries via a cursor.
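
The ordering described here can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how mdbx_chk is written: complete_check and both walker callbacks are hypothetical, and each walker is assumed to return a list of problem messages.

```python
from typing import Callable, List


def complete_check(walk_page_tree: Callable[[], List[str]],
                   walk_entries: Callable[[], List[str]]) -> List[str]:
    """Run the structural walk first; iterate entries only if it was clean."""
    problems = walk_page_tree()
    if problems:
        # Cursor-based iteration is skipped on a tree already known to be bad.
        return problems
    return walk_entries()


calls = []


def tree_walk_fails() -> List[str]:
    calls.append("tree")
    return ["corrupted leaf-page"]


def entries_walk() -> List[str]:
    calls.append("entries")
    return []


result = complete_check(tree_walk_fails, entries_walk)
print(result, calls)
```

In this model the entry walk never runs once the page-tree walk has reported a problem, which matches the "only if successful" condition.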


However, with this database I saw a problem related to #238, i.e. it seems that:

erthink commented 3 years ago

On the whole, this gave a false-positive diagnosis of database corruption - I'll think about what to do.

@AskAlexSharov, any ideas are welcome.

AskAlexSharov commented 3 years ago

For a complete check, yes. But before the complete check and switching the meta-page there is another step: "find which meta-pages are bad for sure, and don't run the complete check for them", because the complete check takes a long time. But I'm fine with "no".
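
The extra step proposed here amounts to a one-sided filter. A minimal Python sketch, with a hypothetical triage function and a hypothetical quick_check callback (neither is part of mdbx_chk): a failed quick check rules a meta-page out, while a passed one proves nothing and still leaves the complete check to be done.

```python
from typing import Callable, Iterable, List


def triage(meta_indexes: Iterable[int],
           quick_check: Callable[[int], bool]) -> List[int]:
    """Drop meta-pages whose quick check fails; the rest stay unverified."""
    return [i for i in meta_indexes if quick_check(i)]


# Made-up outcome for illustration: only meta-1 fails the quick check.
candidates = triage([1, 2, 0], lambda i: i != 1)
print(candidates)
```

The returned list is only a shorter queue for the long complete check, not a list of meta-pages known to be good.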