Closed parkrish closed 8 years ago
Probably a duplicate of #199. Please check whether the problem is solved with the current master (35d8dc7).
Hi,
Thanks for looking into the problem. I tested with the latest master code. Unfortunately, my problem is not solved yet.
There is still a deadlock between the notification thread and the send/receive thread because of the two locks mut_ntf and mut_session.
(gdb) bt
Regards, Parameswaran
Hi, there is now a separate branch called deadlockfix with a patch. Could you please check whether that patch solves the issue?
Hi,
I tested the code from the latest deadlockfix branch and the issue is resolved. Thanks for the support. When could we expect a release with this fix?
Regards, Parameswaran
OK, I'll wait for a response in #199, and if the fix doesn't break it, I'll merge it into master.
Thanks. Will there be a new release from master any time soon after the deadlock fix is merged?
Regards, Parameswaran
What do you mean by "Release"?
Thanks. By "Release" I meant a release branch like 0.9.0, 0.10.0, etc.
By that meaning, the master branch is actually 1.0.0: we do not add new features (changing the API), we just fix the reported bugs (our focus is now on libyang, libnetconf2 and Netopeer2).
Thank you for the information
Hi, I have three threads in my NETCONF client program. Two threads send and receive NETCONF requests. The third is a notification thread that receives notifications.
When the NETCONF server crashes, the notification thread exits as expected (because of the fix for issue #193, "Notification thread never exits on netconf server crash").
However, one of the receive threads detects the server failure, calls nc_session_close, and gets blocked in ncntf_dispatch_stop.
(gdb) bt
#0  __lll_lock_wait ()
#1  0x00007fddde4174d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fddde417336 in __GI___pthread_mutex_lock (mutex=0x12f4798)
#3  0x00007fddde8511e8 in ncntf_dispatch_stop () from /usr/lib/libnetconf.so.0
#4  0x00007fddde847598 in nc_session_close () from /usr/lib/libnetconf.so.0
#5  0x00007fddde84792e in nc_session_send.isra.4.part () from /usr/lib/libnetconf.so.0
#6  0x00007fddde84651b in nc_session_send_reply () from /usr/lib/libnetconf.so.0
#7  0x00007fddde846fb1 in nc_session_recv_reply () from /usr/lib/libnetconf.so.0
#8  0x00007fddde849cc3 in nc_session_send_recv () from /usr/lib/libnetconf.so.0
The other send/receive thread is also blocked, waiting for a lock:
(gdb) bt
#0  __lll_lock_wait ()
#1  0x00007fddde4174d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fddde417336 in __GI___pthread_mutex_lock (mutex=0x12f46f8)
#3  0x00007fddde847eef in nc_session_send_rpc () from /usr/lib/libnetconf.so.0
#4  0x00007fddde849c2b in nc_session_send_recv () from /usr/lib/libnetconf.so.0
Based on the code flow, if one of the other two threads, rather than the notification thread, detects the failure and calls nc_session_close, all three threads end up deadlocked: that thread has already acquired the lock and is then blocked in ncntf_dispatch_stop.
I think we may need to set session->ntf_active to 0 (perhaps in nc_session_close) to avoid this issue. Could you please look into this problem and provide a solution?
Regards, Parameswaran
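The suggestion above (clear ntf_active when closing the session) can be sketched roughly as follows. This is only an illustration of the idea and not libnetconf's implementation: the struct demo_session, the loop_exited field and all demo_* functions are hypothetical, and only the flag name ntf_active comes from the report. The closing thread clears the flag first and only then waits for the dispatcher, so the dispatcher loop always has a way to end.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical session type; only the name ntf_active is from the report. */
struct demo_session {
    atomic_int ntf_active;   /* 1 while the dispatcher should keep running */
    atomic_int loop_exited;  /* set by the dispatcher just before it returns */
    pthread_t dispatcher;
};

/* Stand-in for a notification dispatch loop. */
static void *demo_dispatch(void *arg)
{
    struct demo_session *s = arg;

    while (atomic_load(&s->ntf_active))
        sched_yield();       /* placeholder for waiting on the next notification */

    atomic_store(&s->loop_exited, 1);
    return NULL;
}

/* Starts the dispatcher thread; returns 0 on success. */
int demo_session_open(struct demo_session *s)
{
    assert(s != NULL);
    atomic_init(&s->ntf_active, 1);
    atomic_init(&s->loop_exited, 0);
    return pthread_create(&s->dispatcher, NULL, demo_dispatch, s);
}

/* Clears the flag, then waits for the dispatcher.
 * Returns 1 if the dispatcher loop ended, -1 if the join failed. */
int demo_session_close(struct demo_session *s)
{
    assert(s != NULL);
    atomic_store(&s->ntf_active, 0);   /* tell the dispatcher to stop first */
    if (pthread_join(s->dispatcher, NULL) != 0)
        return -1;
    return atomic_load(&s->loop_exited);
}
```

A caller would run demo_session_open() followed by demo_session_close() on the same struct. Whether the real ncntf_dispatch_stop can be unblocked this way depends on what it waits for, which the backtraces alone do not show.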