caetanosauer / zero

Fork of the Shore-MT storage manager used by the research project Instant Recovery

Eviction assertion failure #20

Closed llersch closed 9 years ago

llersch commented 9 years ago

When running:

src/zapps kits -b tpcc --load

The following happens:

Starting program: /home/lucas/workspace/zapps/build/src/zapps kits -b tpcc --load
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffaf65f5700 (LWP 20986)]
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:303:set_pd: DB set to (1)
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:260:set_sf: New Scaling factor: 1.0
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:243:set_qf: New Queried Factor: 1.0
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:1021:print_cpus: MaxCPU=(0) - ActiveCPU=(0)
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:777:configure_sm: Configuring Shore...
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:826:start_sm: Starting Shore...
[New Thread 0x7ffaf55aa700 (LWP 20987)]
[New Thread 0x7ff85f4d4700 (LWP 20988)]
[New Thread 0x7ff85e916700 (LWP 20989)]
[New Thread 0x7ff85d814700 (LWP 20994)]
[New Thread 0x7ff857fff700 (LWP 20995)]
[New Thread 0x7ff850bfc700 (LWP 20996)]
[New Thread 0x7ff843fff700 (LWP 20997)]
[New Thread 0x7ff8437fe700 (LWP 20998)]
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:848:start_sm: Formatting a new device (db) with a (12288000) kB quota
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:855:start_sm: Formatting device completed...
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:859:start_sm: Mounting (new) device completed...
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:893:start_sm: Is fake I/O delay enabled: (0)
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:901:start_sm: I/O delay latency set: (0)
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:541:start: ShoreEnv initialized
140715851994880: /home/lucas/workspace/zapps/src/kits/shore_env.cpp:547:start: Starting ()
140715851994880: /home/lucas/workspace/zapps/src/kits/tpcc/tpcc_env.cpp:325:info: SF      = (1.0)
140715851994880: /home/lucas/workspace/zapps/src/kits/tpcc/tpcc_env.cpp:326:info: Workers = (4)
[New Thread 0x7ff842ffd700 (LWP 21001)]
[New Thread 0x7ff8427fc700 (LWP 21002)]
[New Thread 0x7ff841ffb700 (LWP 21003)]
[New Thread 0x7ff8417fa700 (LWP 21004)]
[New Thread 0x7ff840ff9700 (LWP 21005)]
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: WAREHOUSE 2 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: DISTRICT 3 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: CUSTOMER 4 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: C_NAME_IDX 5 (latch) (no relaxed) (no unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: HISTORY 6 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: NEW_ORDER 7 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: ORDER 8 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: O_CUST_IDX 9 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: ORDERLINE 10 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: ITEM 11 (latch) (no relaxed) (unique)
CR: /home/lucas/workspace/zapps/src/kits/table_desc.cpp:125:create_physical_index: STOCK 12 (latch) (no relaxed) (unique)
[Thread 0x7ff840ff9700 (LWP 21005) exited]
[New Thread 0x7ff840ff9700 (LWP 21006)]
[New Thread 0x7ff81ffff700 (LWP 21007)]
[New Thread 0x7ff81f7fe700 (LWP 21008)]
[New Thread 0x7ff81effd700 (LWP 21009)]
0
assertion failure: -2<offset && offset<nrecs()
1. error in /home/lucas/workspace/zero/src/sm/btree_page_h.h:1254 Assertion failed
    called from:
    0) /home/lucas/workspace/zero/src/sm/btree_page_h.h:1254

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ff81ffff700 (LWP 21007)]
0x00007ffff67120d5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff67120d5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff671583b in __GI_abort () at abort.c:91
#2  0x00000000007272ef in w_base_t::abort() () at /home/lucas/workspace/zero/src/fc/w_base.cpp:222
#3  0x00000000007270be in w_base_t::assert_failed(char const*, char const*, unsigned int) () at /home/lucas/workspace/zero/src/fc/w_base.cpp:127
#4  0x00000000005de160 in btree_page_h::page_pointer_address(int) () at /home/lucas/workspace/zero/src/sm/btree_page_h.h:1254
#5  0x00000000005dbd94 in fixable_page_h::child_slot_address(int) const () at /home/lucas/workspace/zero/src/sm/fixable_page_h.cpp:400
#6  0x000000000058007f in bf_tree_m::_lookup_buf_imprecise(btree_page_h&, unsigned int, unsigned int&, bool&) const ()
    at /home/lucas/workspace/zero/src/sm/bf_tree.cpp:1684
#7  0x0000000000598667 in bf_tree_m::_evict_traverse_page(EvictionContext&) () at /home/lucas/workspace/zero/src/sm/bf_tree_evict.cpp:601
#8  0x0000000000598334 in bf_tree_m::_evict_traverse_store(EvictionContext&) () at /home/lucas/workspace/zero/src/sm/bf_tree_evict.cpp:565
#9  0x0000000000597f69 in bf_tree_m::_evict_traverse_volume(EvictionContext&) () at /home/lucas/workspace/zero/src/sm/bf_tree_evict.cpp:516
#10 0x0000000000597be2 in bf_tree_m::_evict_blocks(EvictionContext&) () at /home/lucas/workspace/zero/src/sm/bf_tree_evict.cpp:463
#11 0x0000000000597994 in bf_tree_m::evict_blocks(unsigned int&, unsigned int&, evict_urgency_t, unsigned int) ()
    at /home/lucas/workspace/zero/src/sm/bf_tree_evict.cpp:430
#12 0x0000000000595935 in bf_tree_m::_get_replacement_block() () at /home/lucas/workspace/zero/src/sm/bf_tree_evict.cpp:222
#13 0x000000000059581b in bf_tree_m::_grab_free_block(unsigned int&, bool) () at /home/lucas/workspace/zero/src/sm/bf_tree_evict.cpp:192
#14 0x000000000057a94e in bf_tree_m::_fix_nonswizzled(generic_page*, generic_page*&, unsigned short, unsigned int, latch_mode_t, bool, bool, bool)
    () at /home/lucas/workspace/zero/src/sm/bf_tree.cpp:760
#15 0x00000000005dc1ff in bf_tree_m::fix_nonroot(generic_page*&, generic_page*, unsigned short, unsigned int, latch_mode_t, bool, bool, bool) ()
    at /home/lucas/workspace/zero/src/sm/bf_tree_inline.h:126
#16 0x00000000005daa78 in fixable_page_h::fix_nonroot(fixable_page_h const&, unsigned short, unsigned int, latch_mode_t, bool, bool, bool) ()
    at /home/lucas/workspace/zero/src/sm/fixable_page_h.cpp:58
#17 0x000000000059ff93 in btree_impl::_ux_traverse_recurse(btree_page_h&, w_keystr_t const&, btree_impl::traverse_mode_t, latch_mode_t, btree_page_h&, unsigned int&, bool) () at /home/lucas/workspace/zero/src/sm/btree_impl_search.cpp:244
#18 0x000000000059f76a in btree_impl::_ux_traverse(stid_t, w_keystr_t const&, btree_impl::traverse_mode_t, latch_mode_t, btree_page_h&, bool, bool) () at /home/lucas/workspace/zero/src/sm/btree_impl_search.cpp:125
#19 0x00000000006afa0e in btree_impl::_ux_insert_core(stid_t, w_keystr_t const&, cvec_t const&) ()
    at /home/lucas/workspace/zero/src/sm/btree_impl.cpp:52
#20 0x00000000006af8b3 in btree_impl::_ux_insert(stid_t, w_keystr_t const&, cvec_t const&) ()
    at /home/lucas/workspace/zero/src/sm/btree_impl.cpp:35
#21 0x00000000006acf0f in btree_m::insert(stid_t, w_keystr_t const&, cvec_t const&) () at /home/lucas/workspace/zero/src/sm/btree.cpp:85
#22 0x0000000000677ba4 in ss_m::create_assoc(stid_t, w_keystr_t const&, vec_t const&) () at /home/lucas/workspace/zero/src/sm/smindex.cpp:62
#23 0x000000000054cd31 in table_man_t<tpcc::customer_t>::add_tuple(ss_m*, table_row_t*, okvl_mode::element_lock_mode, lpid_t const&) ()
#24 0x000000000053ebd8 in tpcc::ShoreTPCCEnv::xct_populate_one_unit(int, tpcc::populate_one_unit_input_t&) ()
#25 0x000000000052eb34 in tpcc::ShoreTPCCEnv::table_builder_t::work() ()
#26 0x0000000000506814 in thread_t::run() ()
#27 0x0000000000708c8e in sthread_t::_start() () at /home/lucas/workspace/zero/src/sthread/sthread.cpp:739
#28 0x00000000007087f0 in sthread_t::__start(void*) () at /home/lucas/workspace/zero/src/sthread/sthread.cpp:665
#29 0x000000000070e01f in pthread_core_start () at /home/lucas/workspace/zero/src/sthread/sthread_core_pthread.cpp:127
#30 0x00007ffff79bce9a in start_thread (arg=0x7ff81ffff700) at pthread_create.c:308
#31 0x00007ffff67cf38d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#32 0x0000000000000000 in ?? ()

Buffer size is the probable cause; I am trying to run it with --bufsize 200.

caetanosauer commented 9 years ago

This happens because eviction is completely broken! It needs a serious redesign.

This particular bug is due to the "imprecise" hash table lookups and page accesses performed by the eviction algorithm, i.e., accesses without any concurrency control. In this case, we try to traverse to a page by following a pointer in the parent, but the page may have been split in the meantime, so we are still holding the old (now non-existent) slot.

As a workaround, I made the lookup return a NULL pointer when the slot is invalid. The consequence is that other parts of the code have to check the returned pointer.
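
To illustrate the idea, here is a minimal sketch of the pattern. It is not the actual code from the commits. `ToyPage`, `split_keep`, and `try_visit_child` are hypothetical names, and treating slot -1 as one extra pointer is an assumption. Only the bounds condition `-2 < offset && offset < nrecs()` is taken from the failed assertion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a B-tree page (not the real btree_page_h).
struct ToyPage {
    std::uint32_t extra_pointer = 0;      // slot -1 (assumed convention for this sketch)
    std::vector<std::uint32_t> children;  // slots 0 .. nrecs()-1

    int nrecs() const { return static_cast<int>(children.size()); }

    // Simulates a split that moves the upper slots away from this page.
    void split_keep(int keep) {
        assert(0 <= keep && keep <= nrecs());
        children.resize(static_cast<std::size_t>(keep));
    }

    // Returns the address of the child pointer in the given slot, or nullptr
    // if the slot is out of range (for example, because the caller holds a
    // stale slot number). The range check mirrors the failed assertion, but
    // reports the problem to the caller instead of aborting.
    std::uint32_t* child_slot_address(int offset) {
        if (!(-2 < offset && offset < nrecs())) {
            return nullptr;
        }
        if (offset == -1) {
            return &extra_pointer;
        }
        return &children[static_cast<std::size_t>(offset)];
    }
};

// Hypothetical eviction-style caller: it must check the returned pointer
// and skip the child when the slot turned out to be invalid.
bool try_visit_child(ToyPage& parent, int slot, std::uint32_t& child_out) {
    std::uint32_t* addr = parent.child_slot_address(slot);
    if (addr == nullptr) {
        return false;  // stale slot: give up on this child, do not crash
    }
    child_out = *addr;
    return true;
}
```

With a page holding four children, `try_visit_child(page, 3, child)` succeeds; after `page.split_keep(2)` the same call returns false instead of hitting the assertion. Note that this sketch is single-threaded, so it only shows the NULL-check contract and does not remove the underlying race.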

caetanosauer commented 9 years ago

Fixed (actually worked around) in c668bdf and 3282c96.