bloomberg / comdb2

Bloomberg's distributed RDBMS

schemalk test triggers segfault #4425

Closed morgando closed 5 months ago

morgando commented 6 months ago

Describe the bug: The database segfaults in bdb_attr_get when it dereferences bdb_state->attr, which holds 0xffffffffffffffff rather than a valid address.

This is the crash backtrace, for reference:

(gdb) bt
#0  bdb_attr_get (bdb_attr=0xffffffffffffffff, attr=attr@entry=268) at /home/mdouglas47/comdb2/bdb/attr.h:671
#1  0x00005612fa20d0a8 in bdb_process_unused_files (bdb_state=0x7fe068e18c10, tran=0x0, bdberr=0x7fe0644fe314,
    powner=0x5612fa53da3d "schemachange", delay=1) at /home/mdouglas47/comdb2/bdb/file.c:8242
#2  0x00005612fa18f811 in sc_del_unused_files_tran (tran=0x0, db=0x7fe068325938) at /home/mdouglas47/comdb2/schemachange/sc_callbacks.c:521
#3  sc_del_unused_files_tran (db=0x7fe068325938, tran=0x0) at /home/mdouglas47/comdb2/schemachange/sc_callbacks.c:509
#4  0x00005612fa1967dc in scdone_abort_cleanup (iq=iq@entry=0x5612fc013b28) at /home/mdouglas47/comdb2/schemachange/sc_logic.c:1626
#5  0x00005612fa125d00 in osql_scdone_abort_callback (iq=0x5612fc013b28) at /home/mdouglas47/comdb2/db/sqloffload.c:750
#6  osql_postabort_handle (iq=0x5612fc013b28) at /home/mdouglas47/comdb2/db/sqloffload.c:772
#7  0x00005612fa146b9d in toblock_main (p_blkstate=0x0, iq=0x5612fc013b28, javasp_trans_handle=0x0)
    at /home/mdouglas47/comdb2/db/toblock.c:6010
#8  toblock_outer (iq=iq@entry=0x5612fc013b28, blkstate=blkstate@entry=0x7fe0644fea10) at /home/mdouglas47/comdb2/db/toblock.c:2341
#9  0x00005612fa148e9b in toblock (iq=0x5612fc013b28) at /home/mdouglas47/comdb2/db/toblock.c:2086
#10 0x00005612fa0c5606 in handle_op_local (run=0x5612fa148b90 <toblock>, init=<optimized out>, iq=0x5612fc013b28)
    at /home/mdouglas47/comdb2/db/sltdbt.c:167
#11 handle_op_local (iq=0x5612fc013b28, init=<optimized out>, run=0x5612fa148b90 <toblock>) at /home/mdouglas47/comdb2/db/sltdbt.c:132
#12 0x00005612fa0c634f in handle_ireq (iq=0x5612fc013b28) at /home/mdouglas47/comdb2/db/sltdbt.c:402
#13 0x00005612fa06cc75 in thd_req (vthd=0x5612fc0ba798) at /home/mdouglas47/comdb2/db/handle_buf.c:583
#14 0x00007fe075f1e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007fe075e41353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) p *bdb_attr
Cannot access memory at address 0xffffffffffffffff
(gdb)

To Reproduce: Run the schemalk test in clustered mode in a loop.

morgando commented 6 months ago

The dbtable passed to sc_del_unused_files has a handle that looks completely uninitialized; every field reads as all 0xff bytes:

(gdb) p *(s->db->handle)
$5 = {pthread_attr_detach = {__size = '\377' <repeats 56 times>, __align = -1}, seqnum_info = 0xffffffffffffffff,
  attr = 0xffffffffffffffff, callback = 0xffffffffffffffff, dbenv = 0xffffffffffffffff, read_write = -1, repinfo = 0xffffffffffffffff,
  numdtafiles = -1 '\377', dbp_data = {{0xffffffffffffffff <repeats 16 times>} <repeats 16 times>}, dbp_ix = {
    0xffffffffffffffff <repeats 50 times>}, tid_key = 4294967295, numthreads = -1, numthreads_lock = {__data = {__lock = -1,
      __count = 4294967295, __owner = -1, __nusers = 4294967295, __kind = -1, __spins = -1, __elision = -1, __list = {
        __prev = 0xffffffffffffffff, __next = 0xffffffffffffffff}}, __size = '\377' <repeats 40 times>, __align = -1},
  name = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>,
  txndir = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>,
  tmpdir = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>,
  dir = 0xffffffffffffffff <error: Cannot access memory at address 0xffffffffffffffff>, lrl = -1, numix = -1, ixlen = {
    -1 <repeats 50 times>}, ixdta = '\377' <repeats 50 times>, ixdtalen = {-1 <repeats 50 times>}, ixcollattr = '\377' <repeats 50 times>,
  ixnulls = '\377' <repeats 50 times>, ixdups = '\377' <repeats 50 times>, ixrecnum = '\377' <repeats 50 times>, keymaxsz = -1,
  checkpoint_thread = 18446744073709551615, watcher_thread = 18446744073709551615, memp_trickle_thread = 18446744073709551615,
  logdelete_thread = 18446744073709551615, lock_detect_thread = 18446744073709551615, coherency_lease_thread = 18446744073709551615,
  master_lease_thread = 18446744073709551615, parent = 0xffffffffffffffff, numchildren = -1, children = {
    0xffffffffffffffff <repeats 3078 times>}, bdb_lock = 0xffffffffffffffff, bdb_lock_desired = -1 '\377', usr_ptr = 0xffffffffffffffff,
  bdb_lock_write_holder = 18446744073709551615, bdb_lock_write_holder_ptr = 0xffffffffffffffff,
  bdb_lock_write_idstr = '\377' <repeats 80 times>, seed = -1, last_genid_epoch = 4294967295, seed_lock = {__data = {__lock = -1,
      __count = 4294967295, __owner = -1, __nusers = 4294967295, __kind = -1, __spins = -1, __elision = -1, __list = {
        __prev = 0xffffffffffffffff, __next = 0xffffffffffffffff}}, __size = '\377' <repeats 40 times>, __align = -1}, bdbtype = -1,
...

The crash occurs when the handle's attr field (0xffffffffffffffff) is dereferenced in bdb_attr_get.
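For readers unfamiliar with this pattern, the values in the dump (-1, 4294967295, 18446744073709551615, 0xffffffffffffffff) are what you get when every byte of a structure is 0xff. The following is a minimal sketch under assumptions: `fake_handle`, `poison_handle`, and `all_bytes_ff` are hypothetical names invented for illustration and are not part of comdb2, and the struct only mimics a few field types from the dump. It also assumes a typical 64-bit two's-complement platform such as the x86_64 host in the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a few fields of the handle shown in the gdb dump.
 * This is NOT the comdb2 definition; only the field types are mimicked. */
struct fake_handle {
    void *attr;
    int read_write;
    signed char numdtafiles;
    unsigned int tid_key;
    unsigned long long checkpoint_thread;
};

/* Fill every byte of the handle (padding included) with 0xff. */
void poison_handle(struct fake_handle *h)
{
    assert(h != NULL);
    memset(h, 0xff, sizeof(*h));
}

/* Return 1 if all n bytes starting at p are 0xff, otherwise 0. */
int all_bytes_ff(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++) {
        if (b[i] != 0xff)
            return 0;
    }
    return 1;
}
```

After `poison_handle(&h)`, signed fields read as -1, the `unsigned int` as 4294967295, the 64-bit unsigned field as 18446744073709551615, and the pointer as an all-ones address, which matches the shape of the dump. A check like `all_bytes_ff` can help when triaging a core file, but it says nothing about why the memory was in that state.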

morgando commented 6 months ago

Errors logged just before the crash:

2024/05/09 18:24:36 [ERROR] >>> SCHEMA CHANGE ERROR: TABLE t, RC 12
2024/05/09 18:24:36 sc_set_running(table=t running=0): from bplog_schemachange:1314 rc=0
2024/05/09 18:24:36 >>> DDL SCHEMA CHANGE RC 240 <<<
backout_schema_changes iq 0x5612fc013b28           clone
2024/05/09 18:24:36 [ERROR] change_schemas_recover: invalid table t
2024/05/09 18:24:36 Old file deletion in progress("schemachange")
2024/05/09 18:24:36 sc_del_unused_files_tran: errors listing old files
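The log suggests that the abort path still attempts old-file deletion after change_schemas_recover has reported the table as invalid. One defensive shape for such a cleanup step is sketched below. This is an illustration only, not the fix applied in comdb2: `sketch_table`, `handle_valid`, `sketch_abort_cleanup`, and `sketch_delete_unused_files` are hypothetical names, and the sketch assumes validity is tracked explicitly, since a garbage pointer cannot be detected reliably by inspecting its value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified table type; NOT the comdb2 dbtable structure. */
struct sketch_table {
    void *handle;     /* storage handle, NULL until opened */
    int handle_valid; /* set to 1 only once the handle is fully initialized */
};

enum { CLEANUP_OK = 0, CLEANUP_SKIPPED = 1 };

/* Counts how often the stubbed deletion step ran, so the guard is observable. */
int sketch_delete_calls = 0;

/* Stub for the file-deletion step; it requires a valid, initialized table. */
void sketch_delete_unused_files(struct sketch_table *t)
{
    assert(t != NULL && t->handle != NULL && t->handle_valid);
    sketch_delete_calls++;
}

/* Abort-time cleanup that refuses to touch a table whose handle was never
 * set up, instead of passing it down to code that dereferences it. */
int sketch_abort_cleanup(struct sketch_table *t)
{
    if (t == NULL || t->handle == NULL || !t->handle_valid)
        return CLEANUP_SKIPPED;
    sketch_delete_unused_files(t);
    return CLEANUP_OK;
}
```

The design choice here is to fail closed: skipping deletion on an invalid table leaves stale files for a later pass, which is cheaper than crashing the node during abort handling.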