accre / lstore

LStore - A fault-tolerant, performant distributed data storage framework.
http://www.lstore.org
Apache License 2.0

Lock inversion in hc_recv_thread #139

Open PerilousApricot opened 7 years ago

PerilousApricot commented 7 years ago
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=41691)
  Cycle in lock order graph: M2655 (0x7d90012ce0a8) => M123 (0x7d900000a1f8) => M2655

  Mutex M123 acquired here while holding mutex M2655 in thread T52:
    #0 pthread_mutex_lock /home/meloam/llvm/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:3600 (globus-gridftp-server+0x000000431ab0)
    #1 hc_recv_thread /home/meloam/lstore/src/gop/hconnection.c:483:21 (libgop.so.0+0x0000000197cc)

  Mutex M2655 previously acquired by the same thread here:
    #0 pthread_mutex_lock /home/meloam/llvm/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:3600 (globus-gridftp-server+0x000000431ab0)
    #1 hc_recv_thread /home/meloam/lstore/src/gop/hconnection.c:479:13 (libgop.so.0+0x0000000195c1)

  Mutex M2655 acquired here while holding mutex M123 in thread T280:
    #0 pthread_mutex_lock /home/meloam/llvm/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:3600 (globus-gridftp-server+0x000000431ab0)
    #1 _reap_hportal /home/meloam/lstore/src/gop/hportal.c:178:13 (libgop.so.0+0x00000001ce6d)
    #2 compact_hportals /home/meloam/lstore/src/gop/hportal.c:473:9 (libgop.so.0+0x00000001f31a)
    #3 gop_hp_que_op_submit /home/meloam/lstore/src/gop/hportal.c:913:9 (libgop.so.0+0x0000000229ea)
    #4 _ibp_submit_op /home/meloam/lstore/src/ibp/config.c:413:9 (libibp.so.0+0x00000000719f)
    #5 _opque_start_execution /home/meloam/lstore/src/gop/opque.c:443:13 (libgop.so.0+0x0000000264b3)
    #6 _gop_start_execution /home/meloam/lstore/src/gop/gop.c:217:9 (libgop.so.0+0x00000000efd1)
    #7 gop_waitany /home/meloam/lstore/src/gop/gop.c:368:13 (libgop.so.0+0x000000010885)
    #8 seglun_rw_op /home/meloam/lstore/src/lio/segment/lun.c:1618:23 (liblio.so.0+0x0000001878ce)
    #9 seglun_rw_func /home/meloam/lstore/src/lio/segment/lun.c:1781:14 (liblio.so.0+0x00000018a12b)
    #10 thread_pool_exec_fn /home/meloam/lstore/src/gop/thread_pool_op.c:232:14 (libgop.so.0+0x00000002a7b1)
    #11 gop_waitany /home/meloam/lstore/src/gop/gop.c:381:13 (libgop.so.0+0x000000010f19)
    #12 gop_waitany /home/meloam/lstore/src/gop/gop.c:364:13 (libgop.so.0+0x000000010801)
    #13 segjerase_write_func /home/meloam/lstore/src/lio/segment/jerasure.c:1568:27 (liblio.so.0+0x00000014f581)
    #14 thread_pool_exec_fn /home/meloam/lstore/src/gop/thread_pool_op.c:232:14 (libgop.so.0+0x00000002a7b1)
    #15 thread_pool_func /home/meloam/lstore/vendor/apr-util-accre/misc/apr_thread_pool.c:271:13 (libgop.so.0+0x00000005bd0b)

  Mutex M123 previously acquired by the same thread here:
    #0 pthread_mutex_lock /home/meloam/llvm/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:3600 (globus-gridftp-server+0x000000431ab0)
    #1 gop_waitany /home/meloam/lstore/src/gop/gop.c:354:5 (libgop.so.0+0x0000000102f8)
    #2 seglun_rw_op /home/meloam/lstore/src/lio/segment/lun.c:1618:23 (liblio.so.0+0x0000001878ce)
    #3 seglun_rw_func /home/meloam/lstore/src/lio/segment/lun.c:1781:14 (liblio.so.0+0x00000018a12b)
    #4 thread_pool_exec_fn /home/meloam/lstore/src/gop/thread_pool_op.c:232:14 (libgop.so.0+0x00000002a7b1)
    #5 gop_waitany /home/meloam/lstore/src/gop/gop.c:381:13 (libgop.so.0+0x000000010f19)
    #6 gop_waitany /home/meloam/lstore/src/gop/gop.c:364:13 (libgop.so.0+0x000000010801)
    #7 segjerase_write_func /home/meloam/lstore/src/lio/segment/jerasure.c:1568:27 (liblio.so.0+0x00000014f581)
    #8 thread_pool_exec_fn /home/meloam/lstore/src/gop/thread_pool_op.c:232:14 (libgop.so.0+0x00000002a7b1)
    #9 thread_pool_func /home/meloam/lstore/vendor/apr-util-accre/misc/apr_thread_pool.c:271:13 (libgop.so.0+0x00000005bd0b)

  Thread T52 (tid=41783, finished) created by thread T16 at:
    #0 pthread_create /home/meloam/llvm/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (globus-gridftp-server+0x000000425266)
    #1 create_host_connection /home/meloam/lstore/src/gop/hconnection.c:661:9 (libgop.so.0+0x00000001ba89)
    #2 spawn_new_connection /home/meloam/lstore/src/gop/hportal.c:682:12 (libgop.so.0+0x0000000207e9)
    #3 check_hportal_connections /home/meloam/lstore/src/gop/hportal.c:783:9 (libgop.so.0+0x000000021914)
    #4 gop_hp_submit /home/meloam/lstore/src/gop/hportal.c:892:5 (libgop.so.0+0x00000002282c)
    #5 gop_hp_que_op_submit /home/meloam/lstore/src/gop/hportal.c:935:12 (libgop.so.0+0x000000022d8b)
    #6 _ibp_submit_op /home/meloam/lstore/src/ibp/config.c:413:9 (libibp.so.0+0x00000000719f)
    #7 _opque_start_execution /home/meloam/lstore/src/gop/opque.c:443:13 (libgop.so.0+0x0000000264b3)
    #8 _gop_start_execution /home/meloam/lstore/src/gop/gop.c:217:9 (libgop.so.0+0x00000000efd1)
    #9 gop_waitall /home/meloam/lstore/src/gop/gop.c:427:13 (libgop.so.0+0x000000011ca3)
    #10 slun_row_replace_fix /home/meloam/lstore/src/lio/segment/lun.c:703:15 (liblio.so.0+0x000000178d09)
    #11 _seglun_grow /home/meloam/lstore/src/lio/segment/lun.c:893:21 (liblio.so.0+0x00000017bd0c)
    #12 _slun_truncate /home/meloam/lstore/src/lio/segment/lun.c:1061:15 (liblio.so.0+0x00000017e4d6)
    #13 seglun_rw_func /home/meloam/lstore/src/lio/segment/lun.c:1749:26 (liblio.so.0+0x00000018985d)
    #14 thread_pool_exec_fn /home/meloam/lstore/src/gop/thread_pool_op.c:232:14 (libgop.so.0+0x00000002a7b1)
    #15 gop_waitany /home/meloam/lstore/src/gop/gop.c:381:13 (libgop.so.0+0x000000010f19)
    #16 gop_waitany /home/meloam/lstore/src/gop/gop.c:364:13 (libgop.so.0+0x000000010801)
    #17 segjerase_write_func /home/meloam/lstore/src/lio/segment/jerasure.c:1568:27 (liblio.so.0+0x00000014f581)
    #18 thread_pool_exec_fn /home/meloam/lstore/src/gop/thread_pool_op.c:232:14 (libgop.so.0+0x00000002a7b1)
    #19 thread_pool_func /home/meloam/lstore/vendor/apr-util-accre/misc/apr_thread_pool.c:271:13 (libgop.so.0+0x00000005bd0b)

  Thread T115 (tid=43057, running) created by thread T20 at:
    #0 pthread_create /home/meloam/llvm/llvm-3.9.0.src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:902 (globus-gridftp-server+0x000000425266)
    #1 add_task /home/meloam/lstore/vendor/apr-util-accre/misc/apr_thread_pool.c:575:14 (libgop.so.0+0x00000005c165)
    #2 _opque_start_execution /home/meloam/lstore/src/gop/opque.c:443:13 (libgop.so.0+0x0000000264b3)
    #3 _gop_start_execution /home/meloam/lstore/src/gop/gop.c:217:9 (libgop.so.0+0x00000000efd1)
    #4 gop_waitany /home/meloam/lstore/src/gop/gop.c:368:13 (libgop.so.0+0x000000010885)
    #5 cache_rw_pages /home/meloam/lstore/src/lio/segment/cache.c:498:15 (liblio.so.0+0x000000110b0d)
    #6 cache_flush_range_gop_func /home/meloam/lstore/src/lio/segment/cache.c:2826:35 (liblio.so.0+0x000000130e3c)
    #7 thread_pool_exec_fn /home/meloam/lstore/src/gop/thread_pool_op.c:232:14 (libgop.so.0+0x00000002a7b1)
    #8 thread_pool_func /home/meloam/lstore/vendor/apr-util-accre/misc/apr_thread_pool.c:271:13 (libgop.so.0+0x00000005bd0b)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) /home/meloam/lstore/src/gop/hconnection.c:483:21 in hc_recv_thread
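
For readers new to this report format: the cycle means one thread (T52) took M2655 and then M123, while another (T280) took M123 and then M2655. If both get their first lock at the same moment, neither can get its second. One common remedy is to always acquire the pair in a fixed global order. The sketch below illustrates that idea only; `m2655`, `m123`, `lock_pair`, `thread_t52`, `thread_t280`, and `run_demo` are hypothetical names, not LStore code, and the actual patch may have taken a different approach.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two mutexes named in the report. */
static pthread_mutex_t m2655 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m123 = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

#define ITERATIONS 100000

/* Acquire both mutexes in one global order (lowest address first),
 * regardless of the order in which the caller names them. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b); /* locking the same default mutex twice would self-deadlock */
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release order does not affect deadlock freedom. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Names the locks in the order T52 used: M2655, then M123. */
static void *thread_t52(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&m2655, &m123);
        shared_counter++;
        unlock_pair(&m2655, &m123);
    }
    return NULL;
}

/* Names the locks in the opposite order, as T280 did: M123, then M2655. */
static void *thread_t280(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        lock_pair(&m123, &m2655);
        shared_counter++;
        unlock_pair(&m123, &m2655);
    }
    return NULL;
}

/* Runs both threads to completion; returns the final counter, or -1 on error. */
long run_demo(void)
{
    pthread_t t1, t2;
    shared_counter = 0;
    if (pthread_create(&t1, NULL, thread_t52, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, thread_t280, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return shared_counter;
}
```

Calling `run_demo()` from a `main` built with `-pthread` should finish with the counter at `2 * ITERATIONS`. Replacing `lock_pair` with two direct `pthread_mutex_lock` calls in the order each thread names them recreates the inversion that ThreadSanitizer flags.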

tacketar commented 7 years ago

Fixed with patch 922874de245b0942b959757588323397f9e6cfd6

PerilousApricot commented 7 years ago

Can you help arrange a handful of nodes for me to smash LStore tests on 24/7? Fleshing out these errors takes hitting things at runtime.

It's dark in this basement.

tacketar commented 7 years ago

Will compute nodes work on the internal network? Do you need 10Gb or is 1Gb enough?

PerilousApricot commented 7 years ago

1Gb internal nodes would do.

PerilousApricot commented 7 years ago

s/fleshing out/triggering/

tacketar commented 7 years ago

Let me see what I can come up with. It'll probably be next week.