CESNET / libnetconf

C NETCONF library

Deadlock between two receive threads when Netconf server crashes #200

Closed parkrish closed 8 years ago

parkrish commented 8 years ago

Hi, I have three threads in my Netconf client program. Two threads send and receive Netconf requests. The third thread receives notifications.

When the Netconf server crashes, the notification thread exits as expected (thanks to the fix for issue #193, "Notification thread never exits on netconf server crash").

However, one of the receive threads detects the server failure, calls nc_session_close(), and blocks in ncntf_dispatch_stop():

(gdb) bt
0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1 0x00007fddde4174d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
2 0x00007fddde417336 in __GI___pthread_mutex_lock (mutex=0x12f4798) at ../nptl/pthread_mutex_lock.c:114
3 0x00007fddde8511e8 in ncntf_dispatch_stop () from /usr/lib/libnetconf.so.0
4 0x00007fddde847598 in nc_session_close () from /usr/lib/libnetconf.so.0
5 0x00007fddde84792e in nc_session_send.isra.4.part () from /usr/lib/libnetconf.so.0
6 0x00007fddde84651b in nc_session_send_reply () from /usr/lib/libnetconf.so.0
7 0x00007fddde846fb1 in nc_session_recv_reply () from /usr/lib/libnetconf.so.0
8 0x00007fddde849cc3 in nc_session_send_recv () from /usr/lib/libnetconf.so.0

The other thread also blocks waiting for a lock:

(gdb) bt
0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1 0x00007fddde4174d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
2 0x00007fddde417336 in __GI___pthread_mutex_lock (mutex=0x12f46f8) at ../nptl/pthread_mutex_lock.c:114
3 0x00007fddde847eef in nc_session_send_rpc () from /usr/lib/libnetconf.so.0
4 0x00007fddde849c2b in nc_session_send_recv () from /usr/lib/libnetconf.so.0

Based on the code flow, if one of the other two threads (rather than the notification thread) happens to detect the failure and calls nc_session_close(), all three threads would deadlock: that thread would already hold the lock the others are waiting for, but would then block in ncntf_dispatch_stop().

I guess we may have to set session->ntf_active to 0 (maybe in nc_session_close()) to avoid this issue. Could you please look into this problem and provide a solution?

Regards, Parameswaran

rkrejci commented 8 years ago

Probably a duplicate of #199. Please check whether the problem is solved with the current master (35d8dc7).

parkrish commented 8 years ago

Hi,

Thanks for looking into the problem. I tested with the latest master code. Unfortunately, my problem is not solved yet.

There is still a deadlock between the notification thread and the send/receive thread, caused by the two locks mut_ntf and mut_session.

The notification thread holds mut_ntf and is waiting for the mut_session lock:

0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1 0x00007f8322cfd4d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
2 0x00007f8322cfd336 in __GI___pthread_mutex_lock (mutex=0x269d078) at ../nptl/pthread_mutex_lock.c:114
3 0x00007f832312e53a in nc_session_close (session=0x269cff0, reason=NC_SESSION_TERM_DROPPED) at src/session.c:1225
4 0x00007f832312fcf6 in nc_session_receive (session=0x269cff0, timeout=0, msg=0x7f831c6e0e60) at src/session.c:2131
5 0x00007f832313074d in nc_session_recv_msg (session=0x269cff0, timeout=0, msg=0x7f831c6e0e60) at src/session.c:2363
6 0x00007f8323130eba in nc_session_recv_notif (session=0x269cff0, timeout=0, ntf=0x7f831c6e0ea0) at src/session.c:2542
7 0x00007f832313daf3 in ncntf_dispatch_receive (session=0x269cff0, process_ntf=0x7f8323ce189c ) at src/notifications.c:2681

The send/receive thread holds mut_session and is waiting for mut_ntf:

(gdb) bt
0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1 0x00007f8322cfd4d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
2 0x00007f8322cfd336 in __GI___pthread_mutex_lock (mutex=0x269d118) at ../nptl/pthread_mutex_lock.c:114
3 0x00007f832312e2ba in ncntf_dispatch_stop (session=0x269cff0) at src/session.c:1194
4 0x00007f832312e582 in nc_session_close (session=0x269cff0, reason=NC_SESSION_TERM_DROPPED) at src/session.c:1234
5 0x00007f832312ec57 in nc_session_send (session=0x269cff0, msg=0x1ea8a20) at src/session.c:1524
6 0x00007f8323131f55 in nc_session_send_reply (session=0x269cff0, rpc=0x0, reply=0x25adf80) at src/session.c:2949
7 0x00007f83231306a8 in nc_session_receive (session=0x269cff0, timeout=100, msg=0x7fffd7866f58) at src/session.c:2345
8 0x00007f832313074d in nc_session_recv_msg (session=0x269cff0, timeout=100, msg=0x7fffd7866f58) at src/session.c:2363
9 0x00007f83231308bf in nc_session_recv_reply (session=0x269cff0, timeout=-1, reply=0x7fffd7867028) at src/session.c:2409
10 0x00007f832313228a in nc_session_send_recv (session=0x269cff0, rpc=0x2c10f40, reply=0x7fffd7867028) at src/session.c:3036

Regards, Parameswaran

rkrejci commented 8 years ago

Hi, there is now a separate branch called deadlockfix with a patch. Could you please check whether that patch solves the issue?

parkrish commented 8 years ago

Hi,

I tested the code from the latest deadlockfix branch and the issue is resolved. Thanks for the support. When could we expect a release with this fix?

Regards, Parameswaran

rkrejci commented 8 years ago

OK, I'll wait for a response in #199, and if the fix doesn't break anything there, I'll merge it into master.

parkrish commented 8 years ago

Thanks. Will there be a new release from master soon after the deadlock fix is merged?

Regards, Parameswaran

rkrejci commented 8 years ago

What do you mean by "Release"?

parkrish commented 8 years ago

Thanks. By "Release" I meant a release branch like 0.9.0, 0.10.0, etc.

rkrejci commented 8 years ago

In that sense, the master branch is effectively 1.0.0: we do not add new features (which would change the API), we only fix reported bugs (our focus is now on libyang, libnetconf2 and Netopeer2).

parkrish commented 8 years ago

Thank you for the information.