eBay / HomeStore

Storage Engine for block and key/value stores.
Apache License 2.0

Panic at commit_ext because rreq has been GCed by the reaper thread. #506

Closed. xiaoxichen closed this issue 4 weeks ago

xiaoxichen commented 1 month ago

This issue was discovered during SM incremental resync testing. In the test, one of the followers is restarted while writes are still on-going. Say replica 1 is stopped at LSN X and restarted at LSN Y.

The restarted follower went through the re-sync process to catch up on the missing LSNs in [X+1, Y]. In the meanwhile, new writes with LSN > Y were pushed to this replica through the data channel and persisted. Note that the m_start_time of these rreqs was set to the time the data channel received them.
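Below is a minimal, hypothetical C++ sketch of the data-channel path described above, just to make the timing explicit: the rreq is created when the data-channel payload arrives, and m_start_time is stamped at that moment, long before the raft log entry shows up. Apart from m_start_time, the type and function names here are illustrative, not HomeStore's actual API.

```cpp
#include <chrono>
#include <cstdint>
#include <memory>

using Clock = std::chrono::steady_clock;

// Illustrative stand-in for a repl_req; only m_start_time mirrors the real field.
struct repl_req_sketch {
    int64_t dsn{-1};
    int64_t lsn{-1};                 // stays -1 until the raft log entry links it
    Clock::time_point m_start_time;  // stamped at data-channel receipt
};

// Called when the data channel delivers the payload (cf. follower_create_req in the log).
std::shared_ptr<repl_req_sketch> on_data_channel_receive(int64_t dsn) {
    auto rreq = std::make_shared<repl_req_sketch>();
    rreq->dsn = dsn;
    // Expiry is later measured from this timestamp, not from when the
    // corresponding raft log entry finally arrives.
    rreq->m_start_time = Clock::now();
    return rreq;
}
```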

Unfortunately the re-sync took a very long time (20 mins, which will be tracked separately). After that, the follower received the log entry for this LSN, linked it with the existing data, and linked the LSN into the state machine. Now the reaper thread comes into the picture: it iterates through all repl_reqs and GCs the expired ones. This request was identified as expired because it was created 20 mins ago, and it got unlinked from the lsn->req map. Later, when we commit, we cannot find the repl_req for that LSN and assert.
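To make the race concrete, here is a compact sketch of the two paths described above, using hypothetical names rather than the actual HomeStore code: the reaper expires any rreq whose age, measured from m_start_time (i.e. data-channel receipt), exceeds a threshold and unlinks it from the lsn->req map, while the commit path later looks the rreq up by LSN and asserts when it is gone.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <memory>

using Clock = std::chrono::steady_clock;

struct repl_req_sketch {
    int64_t lsn{-1};
    Clock::time_point m_start_time;  // set when the data channel received the payload
};

std::map<int64_t, std::shared_ptr<repl_req_sketch>> lsn_to_req;  // lsn -> rreq

// Reaper thread: GC anything older than the expiry threshold.
void gc_expired_requests(std::chrono::seconds expiry) {
    for (auto it = lsn_to_req.begin(); it != lsn_to_req.end();) {
        auto elapsed = Clock::now() - it->second->m_start_time;
        if (elapsed > expiry) {
            it = lsn_to_req.erase(it);  // unlinks the rreq (and frees its blocks)
        } else {
            ++it;
        }
    }
}

// Raft state machine: the commit path expects the rreq to still be linked.
void commit_lsn(int64_t lsn) {
    auto it = lsn_to_req.find(lsn);
    assert(it != lsn_to_req.end() && "Raft channel got null rreq");  // panics here
    // ... apply the linked data, then erase the entry ...
}
```

With a 20-minute gap between data receipt and log arrival, any expiry threshold shorter than that window lets gc_expired_requests() win the race and commit_lsn() hits the assert.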

The full log can be seen in the comments below.

xiaoxichen commented 1 month ago

We receive the data for dsn 138491 from the data channel at 08/15/24 20:11:51:

storage_mgr_log.2:[08/15/24 20:11:51.266] [storage_mgr] [debug] [76] [raft_repl_dev.cpp:390] [rd=rdev1] in follower_create_req: rreq=dsn=138491 term=1 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=4961 count=5 chunk=11997},] state=[BLK_ALLOCATED | ], addr=140038928672528
storage_mgr_log.2:[08/15/24 20:11:51.276] [storage_mgr] [debug] [13] [raft_repl_dev.cpp:330] [rd=rdev1] Data Channel: Data write completed for rreq=[dsn=138491 term=1 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=4961 count=5 chunk=11997},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | ]], time_diff_data_log_us=24, data_write_latency_us=10292, total_data_write_latency_us(rreq creation to write complete)=10268, local_blkid.num_pieces=1

Incremental resync is still taking place at that time:

[08/15/24 20:11:49.399] [storage_mgr] [debug] [13] [raft_repl_dev.cpp:742] [rd=rdev1] Data Channel: Data fetched from remote: rreq=[dsn=94936 term=1 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=4901 count=5 chunk=20099},] state=[BLK_ALLOCATED | DATA_RECEIVED | ]], data_size: 5120, total_size: 327680, local_blkid: [{blk#=4901 count=5 chunk=20099},]
[08/15/24 20:11:49.399] [storage_mgr] [debug] [13] [raft_repl_dev.cpp:742] [rd=rdev1] Data Channel: Data fetched from remote: rreq=[dsn=94937 term=1 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=4926 count=5 chunk=10350},] state=[BLK_ALLOCATED | DATA_RECEIVED | ]], data_size: 5120, total_size: 322560, local_blkid: [{blk#=4926 count=5 chunk=10350},]

[08/15/24 20:33:31.398] [storage_mgr] [debug] [13] [raft_repl_dev.cpp:742] [rd=rdev1] Data Channel: Data fetched from remote: rreq=[dsn=134891 term=1 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=18646 count=5 chunk=10357},] state=[BLK_ALLOCATED | DATA_RECEIVED | ]], data_size: 5120, total_size: 10240, local_blkid: [{blk#=18646 count=5 chunk=10357},]
[08/15/24 20:33:31.398] [storage_mgr] [debug] [13] [raft_repl_dev.cpp:742] [rd=rdev1] Data Channel: Data fetched from remote: rreq=[dsn=134894 term=1 lsn=-1 op=HS_DATA_LINKED local_blkid=[{blk#=18646 count=5 chunk=20071},] state=[BLK_ALLOCATED | DATA_RECEIVED | ]], data_size: 5120, total_size: 5120, local_blkid: [{blk#=18646 count=5 chunk=20071},

The log entry for dsn 138491 is received at 08/15/24 20:33:32:

root@sm-long-running1:/mnt/sm_home/logs/latest# grep "dsn=138491" *
storage_mgr_log.1:[08/15/24 20:33:32.189] [storage_mgr] [debug] [74] [raft_repl_dev.cpp:354] [rd=rdev1] Repl_key=[server=1559880499, term=1, dsn=138491] already received
storage_mgr_log.1:[08/15/24 20:33:32.191] [storage_mgr] [debug] [74] [repl_log_store.cpp:22] [rd=rdev1] Raft Channel: Received append log entry rreq=[dsn=138491 term=1 lsn=138496 op=HS_DATA_LINKED local_blkid=[{blk#=4961 count=5 chunk=11997},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | ]]
storage_mgr_log.1:[08/15/24 20:33:32.191] [storage_mgr] [debug] [74] [raft_state_machine.cpp:181] [rd=rdev1] Raft channel: Precommit rreq=[dsn=138491 term=1 lsn=138496 op=HS_DATA_LINKED local_blkid=[{blk#=4961 count=5 chunk=11997},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | ]]
storage_mgr_log.1:[08/15/24 20:33:32.193] [storage_mgr] [debug] [74] [raft_repl_dev.cpp:405] [rd=rdev1] Raft Channel: Data write completed and blkid mapped: rreq=[dsn=138491 term=1 lsn=138496 op=HS_DATA_LINKED local_blkid=[{blk#=4961 count=5 chunk=11997},] state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | ]]

The rreq is identified as expired and gets GCed:

storage_mgr_log.1:[08/15/24 20:33:32.832] [storage_mgr] [debug] [82] [raft_repl_dev.cpp:1102] [rd=rdev1] rreq=[repl_key=[server=1559880499, term=1, dsn=138491], lsn=138496 state=[BLK_ALLOCATED | DATA_RECEIVED | DATA_WRITTEN | LOG_RECEIVED | LOG_FLUSHED] m_headersize=40 m_keysize=8 is_proposer=false local_blkid=[{blk#=4961 count=5 chunk=11997},] remote_blkid=[{blk#=6911 count=5 chunk=11998},]] is expired, cleaning up; elapsed_time=1301

We panic with:

[08/15/24 20:33:33.225] [storage_mgr_critical] [critical] [80] [rd=rdev1] Raft channel got null rreq
[08/15/24 20:33:33.613] [storage_mgr_critical] [critical] [80] [stacktrace.cpp:122:crash_handler]
 * ****Received fatal SIGNAL : SIGABRT(6)   PID : 8
xiaoxichen commented 1 month ago

Another victim:

[08/15/24 20:17:48.270] [storage_mgr_critical] [critical] [66] commiting blkid blk#=14981 count=5 chunk=20005 is not allocated in non-recovery mode
[08/15/24 20:17:48.669] [storage_mgr_critical] [critical] [66] [stacktrace.cpp:122:crash_handler]
 * ****Received fatal SIGNAL : SIGABRT(6)   PID : 8
[08/15/24 20:17:48.669] [storage_mgr_critical] [critical] [66] Thread num: 140224482878976 entered exit handler

which I believe is due to a similar reason: the blk was freed by the GC thread while committing. Basically, the state machine is racing with the reaper thread.
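One possible direction, sketched below with hypothetical names (this is an assumption on my part, not necessarily the fix the project will adopt): have the reaper skip any rreq that has already been linked to an LSN or reached LOG_RECEIVED, so that only requests still stuck waiting on the raft channel are expired and have their blocks freed.

```cpp
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Illustrative subset of the state bits seen in the logs above.
enum class req_state : uint32_t { BLK_ALLOCATED = 1, DATA_WRITTEN = 2, LOG_RECEIVED = 4 };

struct repl_req_sketch {
    int64_t lsn{-1};                 // >= 0 once the raft log entry has linked it
    uint32_t state{0};               // bitmask of req_state values
    Clock::time_point m_start_time;  // data-channel receipt time
};

bool is_expired(const repl_req_sketch& r, std::chrono::seconds expiry) {
    // Never expire a request that is already linked into the state machine;
    // commit/commit_ext still needs it even if the data arrived long ago.
    if (r.lsn >= 0 || (r.state & static_cast<uint32_t>(req_state::LOG_RECEIVED))) {
        return false;
    }
    return (Clock::now() - r.m_start_time) > expiry;
}
```

An alternative would be to make the reaper and the commit path synchronize on the same lock, but gating expiry on the request's state keeps the reaper from touching anything the state machine still owns.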