accre / lstore

LStore - A fault-tolerant, performant distributed data storage framework.
http://www.lstore.org
Apache License 2.0

Somehow gop stuff is being #132

Open PerilousApricot opened 7 years ago

PerilousApricot commented 7 years ago

Some time into a long-running process, it appears gop is being initialized after already having been initialized once:

Thread 1 (Thread 0x7f87c8ff1700 (LWP 26243)):
#0  0x00007f87ea7e1bd0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007f87df87c680 in tbx_pch_reserve (pc=0x7f87ef52c1f0) at /home/meloam/lstore/src/toolbox/pigeon_coop.c:224
#2  0x00007f87dfabae64 in gop_init (gop=0x7f879c0008c0) at /home/meloam/lstore/src/gop/gop.c:690
#3  0x00007f87dfac244b in init_opque (q=0x7f879c0008c0) at /home/meloam/lstore/src/gop/opque.c:241
#4  0x00007f87dfac25db in gop_opque_new () at /home/meloam/lstore/src/gop/opque.c:269
#5  0x00007f87dfd06ead in amp_dirty_thread (th=0x7f87ef640aa8, data=0x7f87ef63f660)
    at /home/meloam/lstore/src/lio/cache/amp.c:249
#6  0x00007f87ea7dfdc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f87ec3a7ced in clone () from /lib64/libc.so.6

That lock somehow deadlocks the process. I haven't had time to hunt down how this is slipping through: the init code runs on module load and shouldn't be hit again, AFAICT.

tacketar commented 7 years ago

If you can give me a core I can definitely track this down and fix it.

tacketar commented 7 years ago

This looks like the re-used gop control structure is still being used by another gop, and it shouldn't be. I'm pretty sure a core would let me track this down and fix it quickly.
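For readers following along, here is a rough sketch of the kind of hazard described above: a pooled control structure is released while another reference to it is still live, then handed out again and re-initialized. This is not the LStore pigeon coop code. Every name here (`pool_t`, `control_t`, `handle_t`, `pool_reserve`, and so on) is hypothetical, and the generation counter is only an assumed debugging aid for spotting stale references, not something the thread says LStore does.

```c
#include <assert.h>
#include <string.h>

#define POOL_SLOTS 2

/* Hypothetical control block; stands in for whatever a pooled op carries. */
typedef struct {
    int in_use;      /* 1 while some owner holds this slot */
    int generation;  /* bumped on every reserve so stale handles can be detected */
    int owner_id;    /* overwritten each time the slot is re-initialized */
} control_t;

typedef struct {
    control_t slots[POOL_SLOTS];
} pool_t;

/* A handle remembers which generation of the slot it was issued for. */
typedef struct {
    control_t *ctl;
    int generation;
} handle_t;

static void pool_init(pool_t *p) { memset(p, 0, sizeof(*p)); }

/* Hand out the first free slot, re-initializing it for the new owner. */
static handle_t pool_reserve(pool_t *p, int owner_id)
{
    handle_t h = { NULL, 0 };
    for (int i = 0; i < POOL_SLOTS; i++) {
        control_t *c = &p->slots[i];
        if (!c->in_use) {
            c->in_use = 1;
            c->generation++;
            c->owner_id = owner_id;  /* old state is clobbered here */
            h.ctl = c;
            h.generation = c->generation;
            return h;
        }
    }
    return h;  /* pool exhausted: ctl stays NULL */
}

static void pool_release(handle_t h)
{
    assert(h.ctl != NULL);
    h.ctl->in_use = 0;
}

/* Returns 1 if the handle still refers to the reservation it was issued for. */
static int handle_is_current(handle_t h)
{
    return h.ctl != NULL && h.ctl->in_use && h.ctl->generation == h.generation;
}

/* Returns 1 when the stale-handle hazard is reproduced and detected. */
static int demo_stale_handle(void)
{
    pool_t pool;
    pool_init(&pool);

    handle_t a = pool_reserve(&pool, 1);
    handle_t a_copy = a;   /* a second reference kept somewhere else */
    pool_release(a);       /* released while a_copy is still live */

    handle_t b = pool_reserve(&pool, 2);  /* the same slot is handed out again */

    int aliased   = (a_copy.ctl == b.ctl);        /* two users, one structure */
    int clobbered = (a_copy.ctl->owner_id == 2);  /* re-init overwrote old state */
    int detected  = !handle_is_current(a_copy) && handle_is_current(b);
    return aliased && clobbered && detected;
}
```

Calling `demo_stale_handle()` from a small `main` exercises the scenario. In a real pool the shared state would include locks, which is where a re-initialized but still-referenced structure could plausibly lead to a hang like the one in the backtrace; that link is a guess from this thread, not a confirmed diagnosis.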

PerilousApricot commented 7 years ago

I put it in /lio/lfs/cms/store/user/meloam/core-oct12

tacketar commented 7 years ago

And where is the source? Is it plain master?

PerilousApricot commented 7 years ago

Lemme just dump over my working tree. I've got a whole mess of uncommitted stuff that I'm trying to work on. It'll be /lio/lfs/cms/store/user/meloam/lstore

tacketar commented 7 years ago

Forgot to ask: what's the binary that generated the core?

PerilousApricot commented 7 years ago

The CentOS 7 OSG 3.3 Spacewalk version of globus-gridftp-server