cosmoss-jigu / witcher

SOSP'21 Witcher Artifact
9 stars 0 forks source link

22 Crash Recovery Bugs Found in PMDK Ports #1

Open iangneal opened 2 years ago

iangneal commented 2 years ago

Summary

We found 22 crash-consistency bugs introduced in WITCHER's PMDK adaptation layer added during the process of porting data structures to us a PM allocator. These bugs are all related to the initialization phase of the data structures (i.e., during PMDK pool creation and allocation of the root/parent structures of the key-value data structures).

Since these bugs are specific to WITCHER's port, I'm filing these bugs here rather than with the maintainers of the ported applications/data structures.

NULL dereferences

Most of these bugs are related to the misuse of the PMDK API. In the common function init_nvmm_pool: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L5-L30

If a crash occurs during pmemobj_create: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L15

Then calls to pmemobj_open in the post-crash execution of the application will return NULL. This case is handled within init_nvmm_pool by returning NULL: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/pmdk.c#L21-L24

However, all of the benchmarks that use init_nvmm_pool do not check for the case when init_nvmm_pool returns NULL. Ergo, these benchmarks will immediately segfault when trying to . For example, in WOART: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L785-L789

If root_obj is NULL, a segfault will occur on access to root_obj->woat_ptr: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L788

Non-atomic initialization/lack of NULL checks

Many of the benchmarks that use init_nvmm_pool do not check for the crash when a crash occurs between pool allocation and the initial allocation of the object. Ergo, they will immediately crash upon their first read of the data structure, as the root pointer is not allocated.

For example, during WOART initialization: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L780-L797

If there is a crash after init_nvmm_pool and root_obj->woart_ptr = tree;, then root_obj will not be NULL but root_obj->woart_ptr will still be NULL, and init_woart will return NULL on restart: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L788

This will cause NULL to be passed to the art_{search,insert} functions, i.e.: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/WORT/woart/main/main.c#L34

Since the art_* functions assume they will not be passed a NULL pointer: https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L293-L295

This will ultimately result in a segfault on recovery.

Specific Bugs by Location/Benchmark

Here's a list of all the bugs by their source code locations. All of these bugs fall into the previously-outlined groups, so the per-bug explanations are left fairly terse.

WOART

Bug 1:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L785-L788

Bug 2:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/woart/woart.c#L794

WORT

Bug 3:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/wort/wort.c#L103-L107

Bug 4:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/WORT/src/wort/wort.c#L113

FAST & FAIR

These bugs are both in the single port and the concurrent port.

Bug 5:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/single/src/btree.h#L1039-L1043

Bug 6:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/single/src/btree.h#L1044-L1046

Bug 7:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/concurrent/src/btree.h#L1174-L1180

Bug 8:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/FAST_FAIR/concurrent/src/btree.h#L1182-L1183

Level Hashing

Bug 9:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L66-L69

Bug 10:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L69 https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/Level_Hashing/persistent_level_hashing/level_hashing.c#L121

CCEH

Bug 11:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/CCEH/src/CCEH_MSB.cpp#L525-L528

Bug 12:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/CCEH/src/CCEH_MSB.cpp#L528-L529

P-ART

Bug 13:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-ART/Tree.cpp#L70-L73

Bug 14:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-ART/Tree.cpp#L72-L74

P-BwTree

Bug 15:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-BwTree/main/main.cpp#L71-L74

Bug 16:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-BwTree/main/main.cpp#L74-L75

P-CLHT

Bug 17:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-CLHT/src/clht_lb_res.c#L1060-L1063

Bug 18:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-CLHT/src/clht_lb_res.c#L1061-L1065

P-HOT

Bug 19:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-HOT/main/main.cpp#L133-L136

Bug 20:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/benchmark/RECIPE/P-HOT/main/main.cpp#L134-L138

P-Masstree

Bug 21:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-Masstree/masstree.h#L2358-L2361

Bug 22:

https://github.com/cosmoss-vt/witcher/blob/ad69038cdcd4ac20f1bde38ebf7e6d9fd6999b36/third_party/RECIPE/P-Masstree/masstree.h#L2359-L2364

iangneal commented 2 years ago

FYI for anyone who stumbles upon this repo: I confirmed with the authors via email that they are aware of these bugs.