facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io
Other
2.48k stars 712 forks source link

Implement RocksDB file-level checksums #1280

Closed laurynas-biveinis closed 1 year ago

laurynas-biveinis commented 1 year ago

Add a new boolean system variable rocksdb_file_checksums.

facebook-github-bot commented 1 year ago

@sunshine-Chun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

sunshine-Chun commented 1 year ago

From Yoshi:

file_checksums test failed at several test cases. [ 8%] rocksdb.file_checksums 'drop_table_with_compact_range' w4 [ fail ] Test ended at 2023-03-09 17:47:59

CURRENT_TEST: rocksdb.file_checksums 10+0 records in 10+0 records out 10 bytes copied, 4.0831e-05 s, 245 kB/s mysqltest: At line 154: assert_grep.inc failed.

laurynas-biveinis commented 1 year ago

file_checksums test failed at several test cases. [ 8%] rocksdb.file_checksums 'drop_table_with_compact_range' w4 [ fail ] Test ended at 2023-03-09 17:47:59

Cannot reproduce - can you provide the contents of checksum-fail.err of a failed run? Anything else?

sunshine-Chun commented 1 year ago

Here is more info from the failing job: The result from queries just before the failure was:

**** slave_master_info on default ****
SELECT * FROM mysql.slave_master_info;
Number_of_lines Master_log_name Master_log_pos  Host    User_name   User_password   Port    Connect_retry   Enabled_ssl Ssl_ca  Ssl_capath  Ssl_cert    Ssl_cipher  Ssl_key Ssl_verify_server_cert  Heartbeat   Bind    Ignored_server_ids  Uuid    Retry_count Ssl_crl Ssl_crlpath Enabled_auto_position   Channel_name    Tls_version Public_key_path Get_public_key  Network_namespace   Master_compression_algorithm    Master_zstd_compression_level   Tls_ciphersuites    Source_connection_auto_failover Gtid_only

**** mysql.gtid_executed on default ****
SELECT * FROM mysql.gtid_executed;
source_uuid interval_start  interval_end
rpl_topology= 
rand_seed: '' _rand_state: ''
extra debug info if any: ''
rpl_topology=
connection default;
include/assert_grep.inc failed!
assert_text: 'RocksDB file checksum error is detected'
assert_file: '/mnt/btrfs/trunk-git-mysql-14-1678411991/_build-8.0-ASanDebug/mysql-test/var/6/checksum-fail.err'
assert_select: '.sst file checksum mismatch, expecting'
assert_match: ''
assert_count: '1'
assert_only_after: ''
number of matching lines: 0
sunshine-Chun commented 1 year ago

I can reproduce it locally.

The error in checksum-fail.err comes from:

2023-03-17T06:26:50.906344Z 0 [ERROR] [MY-011071] [Server] Plugin rocksdb reported: 'RocksDB: Error opening instance, Status Code: 2, Status: Corruption: block checksum mismatch: stored = 2643080400, computed = 798698760, type = 4 in ./.rocksdb/000017.sst offset 86 size 984 in file ./.rocksdb/MANIFEST-000019' The full log in checksum-fail.err is sent via email.

laurynas-biveinis commented 1 year ago

I see, thank you. The instance with a corrupted SST file may fail block-level checksum verification right on DB::open. I will think how to avoid this reliably.

facebook-github-bot commented 1 year ago

@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing.

laurynas-biveinis commented 1 year ago

Ready for review. Now the testcase corrupts an SST on a running server with RocksDB background work paused, that belongs to a custom CF. I left the check as strict as possible for now - that is, the block-level checksum error on DB open would fail the test - hoping that it will not re-occur. Let me know if it does and I'll relax the check to a general failure for server start, as discussed.

facebook-github-bot commented 1 year ago

@sunshine-Chun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

hermanlee commented 1 year ago

I think we're seeing the following test error:

CURRENT_TEST: rocksdb.file_checksums
10+0 records in
10+0 records out
10 bytes copied, 3.5501e-05 s, 282 kB/s
mysqltest: At line 154: assert_grep.inc failed.
In included file ./include/assert_grep.inc: 155
included from /mnt/btrfs/trunk-git-mysql-37-1679602716/mysql-test/suite/rocksdb/t/file_checksums.test: 53

The result from queries just before the failure was:
**** slave_master_info on default ****
SELECT * FROM mysql.slave_master_info;
Number_of_lines Master_log_name Master_log_pos  Host    User_name   User_password   Port    Connect_retry   Enabled_ssl Ssl_ca  Ssl_capath  Ssl_cert    Ssl_cipher  Ssl_key Ssl_verify_server_cert  Heartbeat   Bind    Ignored_server_ids  Uuid    Retry_count Ssl_crl Ssl_crlpath Enabled_auto_position   Channel_name    Tls_version Public_key_path Get_public_key  Network_namespace   Master_compression_algorithm    Master_zstd_compression_level   Tls_ciphersuites    Source_connection_auto_failover Gtid_only

**** mysql.gtid_executed on default ****
SELECT * FROM mysql.gtid_executed;
source_uuid interval_start  interval_end
rpl_topology= 
rand_seed: '' _rand_state: ''
extra debug info if any: ''
rpl_topology=
connection default;
include/assert_grep.inc failed!
assert_text: 'RocksDB file checksum error is detected'
assert_file: '/mnt/btrfs/trunk-git-mysql-37-1679602716/_build-8.0-ASanDebug/mysql-test/var/7/checksum-fail.err'
assert_select: '.sst file checksum mismatch, expecting'
assert_match: ''
assert_count: '1'
assert_only_after: ''
number of matching lines: 0
safe_process[110528]: Child process: 110530, exit: 1
sunshine-Chun commented 1 year ago

checksum-fail.err:

2023-03-27T17:49:38.887379Z 0 [ERROR] [MY-011071] [Server] Plugin rocksdb reported: 'RocksDB: Error opening instance, Status Code: 2, Status: Corruption: block checksum mismatch: stored = 3835067806, computed = 3669966302, type = 4 in ./.rocksdb/000017.sst offset 86 size 984 in file ./.rocksdb/MANIFEST-000019'

full log will be sent via email.

facebook-github-bot commented 1 year ago

@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing.

laurynas-biveinis commented 1 year ago

Ready for review again

facebook-github-bot commented 1 year ago

@sunshine-Chun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.