Summary
We found 22 crash-consistency bugs in WITCHER's PMDK adaptation layer, which was added while porting the data structures to use a PM allocator. These bugs are all related to the initialization phase of the data structures (i.e., PMDK pool creation and allocation of the root/parent structures of the key-value data structures).
Since these bugs are specific to WITCHER's port, I'm filing these bugs here rather than with the maintainers of the ported applications/data structures.
`NULL` dereferences
Most of these bugs are related to misuse of the PMDK API. Consider the common function `init_nvmm_pool`:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L5-L30
If a crash occurs during `pmemobj_create`:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L15
then calls to `pmemobj_open` in the post-crash execution of the application will return `NULL`. This case is handled within `init_nvmm_pool` by returning `NULL`:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L21-L24
However, none of the benchmarks that use `init_nvmm_pool` check for the case where `init_nvmm_pool` returns `NULL`. Ergo, these benchmarks will immediately segfault when trying to dereference the returned pointer. For example, in WOART:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L785-L789
If `root_obj` is `NULL`, a segfault will occur on access to `root_obj->woart_ptr`:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L788
Non-atomic initialization/lack of `NULL` checks
Many of the benchmarks that use `init_nvmm_pool` do not check for the case where a crash occurs between pool allocation and the initial allocation of the object. Ergo, they will immediately crash upon their first read of the data structure, as the root pointer is not allocated. For example, during WOART initialization:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L780-L797
If there is a crash after `init_nvmm_pool` and before `root_obj->woart_ptr = tree;`, then `root_obj` will not be `NULL` but `root_obj->woart_ptr` will still be `NULL`, and `init_woart` will return `NULL` on restart:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L788
This will cause `NULL` to be passed to the `art_{search,insert}` functions, e.g.:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/WORT/woart/main/main.c#L34
Since the `art_*` functions assume they will not be passed a `NULL` pointer:
https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L293-L295
this will ultimately result in a segfault on recovery.
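The two post-crash states described above (no root object at all, and a root object whose tree pointer was never set) can be illustrated with a small, self-contained C sketch. It is not WITCHER's code and does not call PMDK: `art_tree` and `root_obj` are stand-in types that model only the `woart_ptr` field, and `recover_or_init`, `init_status`, and `alloc_empty_tree` are hypothetical names introduced for illustration. The sketch shows one possible recovery policy, finishing the interrupted initialization on restart; a real port would also need to persist the tree and the pointer, or perform the allocation atomically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types (assumption): only the field named in the report is modeled. */
typedef struct art_tree { size_t size; } art_tree;
typedef struct root_obj { art_tree *woart_ptr; } root_obj;

/* Hypothetical status codes for the states a restart can observe. */
typedef enum {
    INIT_OK,           /* tree pointer was already set; reuse it */
    INIT_COMPLETED,    /* interrupted initialization was finished now */
    INIT_NO_POOL,      /* the pool/root lookup returned NULL */
    INIT_ALLOC_FAILED  /* the tree could not be allocated */
} init_status;

/* Hypothetical allocator hook; a real port would allocate from PM here. */
typedef art_tree *(*tree_alloc_fn)(void);

static art_tree *alloc_empty_tree(void) {
    return calloc(1, sizeof(art_tree));
}

/* Check both pointers before use, and complete initialization if needed. */
static init_status recover_or_init(root_obj *root, tree_alloc_fn alloc,
                                   art_tree **out) {
    assert(out != NULL && alloc != NULL);
    *out = NULL;

    /* Case 1: crash during pool creation, so there is no root object. */
    if (root == NULL)
        return INIT_NO_POOL;

    /* Normal restart: the tree pointer was set before the crash. */
    if (root->woart_ptr != NULL) {
        *out = root->woart_ptr;
        return INIT_OK;
    }

    /* Case 2: crash after pool creation but before the pointer was set. */
    art_tree *tree = alloc();
    if (tree == NULL)
        return INIT_ALLOC_FAILED;

    /* Publish the pointer last, after the tree is fully initialized.
     * On PM, the tree and then this pointer would also need persisting. */
    root->woart_ptr = tree;
    *out = tree;
    return INIT_COMPLETED;
}
```

A caller would treat `INIT_NO_POOL` and `INIT_ALLOC_FAILED` as errors and only pass the returned tree to the `art_*` functions on `INIT_OK` or `INIT_COMPLETED`.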
Specific Bugs by Location/Benchmark
Here's a list of all the bugs by their source code locations. All of these bugs fall into the previously-outlined groups, so the per-bug explanations are left fairly terse.
WOART
Bug 1: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L785-L788
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 2: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L794
Non-atomic initialization of `art_tree` can result in `init_woart` returning `NULL`, which is not handled and leads to segfaults during `art_*` API calls: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L788
WORT
Bug 3: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/wort/wort.c#L103-L107
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 4: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/wort/wort.c#L113
Non-atomic initialization of `art_tree` can result in `init_wort` returning `NULL`.
FAST & FAIR
These bugs are in both the `single` port and the `concurrent` port.
Bug 5: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/single/src/btree.h#L1039-L1043
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 6: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/single/src/btree.h#L1044-L1046
`init_FastFair` can return `NULL`, leading to a subsequent segfault.
Bug 7: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/concurrent/src/btree.h#L1174-L1180
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 8: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/concurrent/src/btree.h#L1182-L1183
`init_FastFair` can return `NULL`, leading to a subsequent segfault.
Level Hashing
Bug 9: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L66-L69
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 10: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L69 https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L121
`level_init` can return `NULL`, leading to a subsequent segfault.
CCEH
Bug 11: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/CCEH/src/CCEH_MSB.cpp#L525-L528
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 12: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/CCEH/src/CCEH_MSB.cpp#L528-L529
Missing `NULL` check on `root_obj->cceh_ptr`, leading to a segfault on `ret->Recovery()`.
P-ART
Bug 13: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-ART/Tree.cpp#L70-L73
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 14: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-ART/Tree.cpp#L72-L74
Missing `NULL` check on `root_obj->p_art_ptr`, leading to a segfault on `tree->recover()`.
P-BwTree
Bug 15: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-BwTree/main/main.cpp#L71-L74
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 16: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-BwTree/main/main.cpp#L74-L75
Missing `NULL` check on `root_obj->p_bwtree_ptr`, leading to a segfault on `tree->re_init()`.
P-CLHT
Bug 17: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-CLHT/src/clht_lb_res.c#L1060-L1063
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 18: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-CLHT/src/clht_lb_res.c#L1061-L1065
Missing `NULL` check on `root_obj->clht_ptr`, leading to a segfault on `clht_lock_initialization(hashtable)`.
P-HOT
Bug 19: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-HOT/main/main.cpp#L133-L136
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 20: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-HOT/main/main.cpp#L134-L138
Missing `NULL` check on `root_obj->p_hot_ptr`, leading to a segfault on `tree->recovery()`.
P-Masstree
Bug 21: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-Masstree/masstree.h#L2358-L2361
Missing `NULL` check on the `init_nvmm_pool` return value.
Bug 22: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-Masstree/masstree.h#L2359-L2364
Missing `NULL` check on `root_obj->p_mt_ptr`, leading to a segfault on `tree->recovery()`.