CESNET / netopeer2

NETCONF toolset
BSD 3-Clause "New" or "Revised" License
290 stars 186 forks source link

Timeout observed in module change subscription sometimes #1502

Open ankit7gup opened 7 months ago

ankit7gup commented 7 months ago

Hi @michalvasko

We have upgraded to latest released version 2.1.71. While our tests, sometimes we are facing issue while subscribing to module change. We have tried same bringing up test multiple times, but not facing this issue on older version.

sr_module_change_subscribe() API being called with It is SR_SUBSCR_ENABLED flag passed, sometimes returns SR_ERR_TIME_OUT.

However, in netopeer server logs we don't see any error.

Timestamp of Timeout Error: 2023-11-27 07:13:04.674188

Server logs for reference. (P.S. My system didn't allow me to upload a file, thus copy-pasting it directly)

1970-01-01 01:02:04.967314 info netopeer2-server[1493]: Connection 120 created. 1970-01-01 01:02:04.969030 info netopeer2-server[1493]: Session 9 (user "root", CID 120) created. 1970-01-01 01:02:04.999674 info netopeer2-server[1493]: Triggering "ietf-subscribed-notifications" "done" event on enabled data. 1970-01-01 01:02:05.010008 info netopeer2-server[1493]: Triggering "ietf-netconf-server" "done" event on enabled data. 1970-01-01 01:02:05.016158 info netopeer2-server[1493]: message repeated 2 times: [ Triggering "ietf-netconf-server" "done" event on enabled data.] 1970-01-01 01:02:05.016394 info netopeer2-server[1493]: Listening on :::830 for SSH connections. 1970-01-01 01:02:05.019518 info netopeer2-server[1493]: Triggering "ietf-netconf-server" "done" event on enabled data. 1970-01-01 01:02:05.089488 info netopeer2-server[1493]: message repeated 19 times: [ Triggering "ietf-netconf-server" "done" event on enabled data.] 1970-01-01 01:02:05.093579 info netopeer2-server[1493]: Triggering "ietf-keystore" "done" event on enabled data. 1970-01-01 01:02:05.097685 info netopeer2-server[1493]: Triggering "ietf-truststore" "done" event on enabled data. 1970-01-01 01:02:05.119294 info netopeer2-server[1493]: Triggering "ietf-netconf-acm" "done" event on enabled data. 1970-01-01 01:02:05.198248 info netopeer2-server[1493]: message repeated 3 times: [ Triggering "ietf-netconf-acm" "done" event on enabled data.] 2023-11-27 07:13:00.199121 info netopeer2-server[1493]: Accepted a connection on :::830 from fddc:f09a::110:60846. 2023-11-27 07:13:00.199776 info netopeer2-server[1493]: Session 61 (user "root", CID 120) created. 2023-11-27 07:13:00.276570 info netopeer2-server[1493]: Received an SSH message "request-service" of subtype "ssh-userauth". 2023-11-27 07:13:00.277566 info netopeer2-server[1493]: Received an SSH message "request-auth" of subtype "none". 2023-11-27 07:13:00.278133 info netopeer2-server[1493]: Received an SSH message "request-auth" of subtype "password". 2023-11-27 07:13:00.290850 info netopeer2-server[1493]: User "root" authenticated. 2023-11-27 07:13:00.291856 info netopeer2-server[1493]: Received an SSH message "request-channel-open" of subtype "session". 2023-11-27 07:13:00.292412 info netopeer2-server[1493]: Received an SSH message "request-channel" of subtype "subsystem". 2023-11-27 07:13:00.299893 info netopeer2-server[1493]: Session 62 (user "root", CID 120) created. 2023-11-27 07:13:00.301119 info netopeer2-server[1493]: There are no subscribers for "ietf-netconf-notifications" notifications. 2023-11-27 07:13:00.301181 info netopeer2-server[1493]: Generated new event (netconf-session-start). 2023-11-27 07:13:00.400698 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 1 priority 0 for 1 subscribers published. 2023-11-27 07:13:00.401242 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 1 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:00.486318 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 1 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:00.486742 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 1 priority 0 succeeded. 2023-11-27 07:13:00.532304 info netopeer2-server[1493]: Session 1: thread 2 event new RPC. 2023-11-27 07:13:00.610373 info netopeer2-server[1493]: Session 1: Received an SSH message "request-channel-open" of subtype "session". 2023-11-27 07:13:00.610931 info netopeer2-server[1493]: Session 1: Received an SSH message "request-channel" of subtype "subsystem". 2023-11-27 07:13:00.611071 info netopeer2-server[1493]: Session 1: thread 0 event new SSH channel. 2023-11-27 07:13:00.617470 info netopeer2-server[1493]: Session 68 (user "root", CID 120) created. 2023-11-27 07:13:00.617892 info netopeer2-server[1493]: There are no subscribers for "ietf-netconf-notifications" notifications. 2023-11-27 07:13:00.617948 info netopeer2-server[1493]: Generated new event (netconf-session-start). 2023-11-27 07:13:00.721634 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 2 priority 0 for 1 subscribers published. 2023-11-27 07:13:00.722386 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 2 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:00.784175 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 2 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:00.784336 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 2 priority 0 succeeded. 2023-11-27 07:13:00.810004 info netopeer2-server[1493]: Session 2: thread 1 event new RPC. 2023-11-27 07:13:00.877964 info netopeer2-server[1493]: EV LISTEN: "ietf-netconf-server" "done" ID 1 priority 0 processing (remaining 23 subscribers). 2023-11-27 07:13:00.879211 err netopeer2-server[1493]: Unable to use Keyboard-Interactive authentication method without setting the name of the PAM configuration file first. 2023-11-27 07:13:00.879563 info netopeer2-server[1493]: EV LISTEN: "ietf-netconf-server" "done" ID 1 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:00.950924 info netopeer2-server[1493]: Session 1: Received an SSH message "request-channel-open" of subtype "session". 2023-11-27 07:13:00.951773 info netopeer2-server[1493]: Session 1: Received an SSH message "request-channel" of subtype "subsystem". 2023-11-27 07:13:00.951940 info netopeer2-server[1493]: Session 1: thread 2 event new SSH channel. 2023-11-27 07:13:00.958382 info netopeer2-server[1493]: Session 77 (user "root", CID 120) created. 2023-11-27 07:13:00.958775 info netopeer2-server[1493]: There are no subscribers for "ietf-netconf-notifications" notifications. 2023-11-27 07:13:00.958824 info netopeer2-server[1493]: Generated new event (netconf-session-start). 2023-11-27 07:13:01.020629 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 3 priority 0 for 1 subscribers published. 2023-11-27 07:13:01.021597 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 3 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:01.152502 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 3 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:01.152901 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 3 priority 0 succeeded. 2023-11-27 07:13:01.190539 info netopeer2-server[1493]: Session 3: thread 1 event new RPC. 2023-11-27 07:13:01.389680 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 1 priority 5 for 1 subscribers published. 2023-11-27 07:13:01.400023 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 1 priority 5 succeeded. 2023-11-27 07:13:01.400371 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 1 priority 0 for 1 subscribers published. 2023-11-27 07:13:01.400806 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 1 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:01.410240 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 1 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:01.410590 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 1 priority 0 succeeded. 2023-11-27 07:13:01.411522 info netopeer2-server[1493]: Session 1: thread 2 event new RPC. 2023-11-27 07:13:01.413542 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 2 priority 5 for 1 subscribers published. 2023-11-27 07:13:01.430223 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 2 priority 5 succeeded. 2023-11-27 07:13:01.430559 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 2 priority 0 for 1 subscribers published. 2023-11-27 07:13:01.431032 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 2 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:01.878783 info netopeer2-server[1493]: Call Home client "vlan_client_2001:0df3:ff80:b015:0000:0000:0000:0048" endpoint "vlan_endpoint" connecting... 2023-11-27 07:13:01.878860 info netopeer2-server[1493]: Trying to connect via IPv6 to 2001:df3:ff80:b015::48:4334. 2023-11-27 07:13:01.880526 info netopeer2-server[1493]: Successfully connected to 2001:df3:ff80:b015::48:4334 over IPv6. 2023-11-27 07:13:01.886710 info netopeer2-server[1493]: Session 83 (user "root", CID 120) created. 2023-11-27 07:13:01.984658 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 2 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:01.985719 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 2 priority 0 succeeded. 2023-11-27 07:13:01.986592 info netopeer2-server[1493]: Session 2: thread 2 event new RPC. 2023-11-27 07:13:01.991180 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 3 priority 5 for 1 subscribers published. 2023-11-27 07:13:02.002090 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 3 priority 5 succeeded. 2023-11-27 07:13:02.002449 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 3 priority 0 for 1 subscribers published. 2023-11-27 07:13:02.002866 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 3 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:02.045201 info netopeer2-server[1493]: Received an SSH message "request-auth" of subtype "none". 2023-11-27 07:13:02.048856 info netopeer2-server[1493]: Received an SSH message "request-auth" of subtype "password". 2023-11-27 07:13:02.071264 info netopeer2-server[1493]: User "root" authenticated. 2023-11-27 07:13:02.074132 info netopeer2-server[1493]: Received an SSH message "request-channel-open" of subtype "session". 2023-11-27 07:13:02.075832 info netopeer2-server[1493]: Received an SSH message "request-channel" of subtype "subsystem". 2023-11-27 07:13:02.080569 info netopeer2-server[1493]: Call Home client "vlan_client_2001:0df3:ff80:b015:0000:0000:0000:0048" session 4 established. 2023-11-27 07:13:02.111258 info netopeer2-server[1493]: Session 86 (user "root", CID 120) created. 2023-11-27 07:13:02.172105 info netopeer2-server[1493]: EV ORIGIN: "ietf-netconf-notifications" "notif" ID 1 priority 0 for 2 subscribers published. 2023-11-27 07:13:02.173172 info netopeer2-server[1493]: Generated new event (netconf-session-start). 2023-11-27 07:13:02.197147 info netopeer2-server[1493]: EV LISTEN: "ietf-netconf-notifications" "notif" ID 1 processing. 2023-11-27 07:13:02.197363 info netopeer2-server[1493]: EV LISTEN: "ietf-netconf-notifications" "notif" ID 1 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:02.259579 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 4 priority 0 for 1 subscribers published. 2023-11-27 07:13:02.260700 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 4 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:02.454737 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 3 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:02.455097 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 3 priority 0 succeeded. 2023-11-27 07:13:02.456038 info netopeer2-server[1493]: Session 3: thread 2 event new RPC. 2023-11-27 07:13:02.458269 info netopeer2-server[1493]: EV ORIGIN: "/o-ran-supervision:supervision-watchdog-reset" "rpc" ID 1 priority 0 for 1 subscribers published. 2023-11-27 07:13:02.469053 info netopeer2-server[1493]: EV ORIGIN: "/o-ran-supervision:supervision-watchdog-reset" "rpc" ID 1 priority 0 succeeded. 2023-11-27 07:13:02.469986 info netopeer2-server[1493]: Session 1: thread 2 event new RPC. 2023-11-27 07:13:02.481484 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 4 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:02.481690 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 4 priority 0 succeeded. 2023-11-27 07:13:02.482871 info netopeer2-server[1493]: Session 4: thread 1 event new RPC. 2023-11-27 07:13:02.487498 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 4 priority 5 for 1 subscribers published. 2023-11-27 07:13:02.488047 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 5 priority 0 for 1 subscribers published. 2023-11-27 07:13:02.488813 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 5 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:02.497078 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/uuid" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.503299 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 4 priority 5 succeeded. 2023-11-27 07:13:02.504686 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 4 priority 0 for 1 subscribers published. 2023-11-27 07:13:02.505584 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 4 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:02.537146 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/uuid" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.539338 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/is-fru" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.586559 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/is-fru" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.588773 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/mfg-name" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.629205 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/mfg-name" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.631197 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/mfg-date" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.663838 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/mfg-date" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.665831 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/serial-num" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.697687 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/serial-num" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.700072 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/model-name" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.739563 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/model-name" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.748294 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/description" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.795399 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/description" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.798583 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/hardware-rev" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.827839 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/hardware-rev" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.830866 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/firmware-rev" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.860915 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/firmware-rev" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.863599 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/software-rev" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.905088 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/software-rev" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.908597 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/contains-child" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.941561 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/contains-child" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.951344 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/oper-state" "oper get" index 0 ID 1 published. 2023-11-27 07:13:02.986115 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/oper-state" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:02.988628 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/usage-state" "oper get" index 0 ID 1 published. 2023-11-27 07:13:03.035224 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/usage-state" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:03.047688 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/state-last-changed" "oper get" index 0 ID 1 published. 2023-11-27 07:13:04.714683 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/state-last-changed" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:04.716429 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/o-ran-hardware:product-code" "oper get" index 0 ID 1 published. 2023-11-27 07:13:04.751488 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/o-ran-hardware:product-code" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:04.753561 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/o-ran-hardware:power-state" "oper get" index 0 ID 1 published. 2023-11-27 07:13:04.788379 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/state/o-ran-hardware:power-state" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:04.790645 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/o-ran-hardware:dying-gasp-support" "oper get" index 0 ID 1 published. 2023-11-27 07:13:04.824224 info netopeer2-server[1493]: EV ORIGIN: "/ietf-hardware:hardware/component[name='ORU']/o-ran-hardware:dying-gasp-support" "oper get" index 0 ID 1 succeeded. 2023-11-27 07:13:04.827457 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 5 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:04.829058 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 5 priority 0 succeeded. 2023-11-27 07:13:04.845385 info netopeer2-server[1493]: Session 1: thread 2 event new RPC. 2023-11-27 07:13:05.249226 info netopeer2-server[1493]: EV LISTEN: "/notifications:create-subscription" "rpc" ID 4 priority 0 success (remaining 0 subscribers). 2023-11-27 07:13:05.249410 info netopeer2-server[1493]: EV ORIGIN: "/notifications:create-subscription" "rpc" ID 4 priority 0 succeeded. 2023-11-27 07:13:05.249665 warning netopeer2-server[1493]: Connection with CID 121 is dead. 2023-11-27 07:13:05.249711 warning netopeer2-server[1493]: Recovering RPC/action "/notifications:create-subscription" subscription of CID 121. 2023-11-27 07:13:05.265420 info netopeer2-server[1493]: Session 4: thread 1 event new RPC. 2023-11-27 07:13:05.272145 info netopeer2-server[1493]: EV ORIGIN: "/ietf-netconf:get" "rpc" ID 6 priority 0 for 1 subscribers published. 2023-11-27 07:13:05.272863 info netopeer2-server[1493]: EV LISTEN: "/ietf-netconf:get" "rpc" ID 6 priority 0 processing (remaining 1 subscribers). 2023-11-27 07:13:05.294737 warning netopeer2-server[1493]: Recovering module "ietf-hardware" operational get subscription of CID 121.

Regards, Ankit

michalvasko commented 7 months ago

Not sure why no error is printed, there should definitely be one. The locks could have been changed in some way that a use-case that worked before (but maybe not correctly) no longer does, is sr_module_change_subscribe() called in another callback?

ankit7gup commented 7 months ago

No, it is not being called inside another callback, but there are multiple subscription calls which were being done in a loop.

michalvasko commented 7 months ago

Okay, so either you can provide more information/code snippets or you could write an example that reproduces it, I cannot help you otherwise.

ankit7gup commented 7 months ago

ok, we will try to capture more information and share with you.

ankit7gup commented 7 months ago

Hi @michalvasko,

We have captured the API call flow now. One mistake I made while reporting the issue, Timeout is not happening in sr_module_change_subscribe(), but it happens in one of the sr_oper_get_subscribe() API execution with optional flag as 0. Sharing the netopeer2-server logs as well as the sysrepo API call sequence. You can see SR_ERR_TIME_OUT being printed, which is a logged in our application.

share-netopeer.txt share-callflow.txt

Thanks!

Regards, Ankit

michalvasko commented 7 months ago

Okay, can you tell me more about the exact scenario? Is the application calling the sr_oper_get_subscribe() multi-threaded or are there simply more subscriptions being done simultaneously in several threads/applications? There could be a some function-call-path causing a dead-lock that we have not found yet.

ankit7gup commented 7 months ago

Okay, can you tell me more about the exact scenario? Is the application calling the sr_oper_get_subscribe() multi-threaded or are there simply more subscriptions being done simultaneously in several threads/applications? There could be a some function-call-path causing a dead-lock that we have not found yet.

It is one application subscribing in a single thread calling the API in a loop of all xpath and modules to be subscribed. And this behavior is not happening always but sometimes 2/10 times observed recently when we added more logs. Earlier it was happening 6/10 times. Hope this info helps.

michalvasko commented 7 months ago

Okay, then it could be another thing, are the callbacks called immediately after being subscribed? Meaning are the operational data the subscriptions provide immediately retrieved?

ankit7gup commented 7 months ago

Okay, then it could be another thing, are the callbacks called immediately after being subscribed? Meaning are the operational data the subscriptions provide immediately retrieved?

Hi, no. We are only subscribing till the time error happens. Not immediately. Instead, there is a buffer time which will be used for this purpose.

michalvasko commented 7 months ago

Look, it makes no sense you getting a timeout if there is no contention. There is another thread or process interfering with the subscriptions, I have suggested what it could be but it is up to you to determine it exactly.

ankit7gup commented 7 months ago

I have printed all the APIs as well, and if you see you won't find any other API interfering with sysrepo. Even thread IDs I have tried printing, all are using the same Thread ID. Also, exact same thing is working every time in older versions, in-fact this code has never been changed. It is not possible to have an API called without it being printed in our application. Also, there is no other application running. Is there anyway, may be enabling additional logging to the netopeer2/sysrepo, if there would have been any other API being called from any application.

I will also try one more thing, to add additional logs inside sysrepo API to check what could be the problem.

michalvasko commented 7 months ago

I will also try one more thing, to add additional logs inside sysrepo API to check what could be the problem.

Try that or something else because I have really nothing to go on. But most weird is that no error is printed, it should always be, so you can try to find the exact line the time out is returned.

ankit7gup commented 7 months ago

I will also try one more thing, to add additional logs inside sysrepo API to check what could be the problem.

Try that or something else because I have really nothing to go on. But most weird is that no error is printed, it should always be, so you can try to find the exact line the time out is returned.

Hi @michalvasko

We have added logs inside sysrepo and also enabled highest level of logging in our application. Please find attached the log files from the time of issue.

logs.txt syslog-netopeer.txt

2023-12-04 06:52:31.567027 err sysrepo_log_dbg[1788]: [ERR] Locking a mutex failed (sr_shmext_oper_get_sub_add: Connection timed out).

ankit7gup commented 7 months ago

Hi @michalvasko Sorry to bother you again, we are kind of stuck with this version upgrade due to the mentioned reason. Did you got any chance to have a look at the logs. Any inputs from you are appreciated.

Thanks again!

Regards, Ankit

michalvasko commented 7 months ago

I am wondering about these errors:

2023-12-04 06:52:35.545443  err netopeer2-server[1492]: 

There is no actual message and that should definitely be fixed but I would need to know where it is generated first. Other than that, there is lots of concurrency when the subscription timeout occurs (you claimed otherwise before) and that is probably the cause. So I would really have to have access to much more information to be able to do anything with this, ideally able to reproduce it (it is not really possible for you to collect all the required information).

I can only suggest a workaround. If you retry the subscription after the timeout, I am pretty sure it will succeed.

ankit7gup commented 7 months ago

Hi @michalvasko Earlier when I mentioned only subscription is happening, it was from the application's point of view. But yes of course they are Netconf get calls happening while operational data subscription is happening, as the client IP is already updated, and client gets connected. And we added a retry mechanism up to 3 times in case we observe TIMEOUT error, however it still fails.

As per our analysis, every time we get the timeout while subscribing for operational data, there is a common function being called from sr_oper_get_subscribe() and get() operation.

sr_shmsub_multi_listen_write_event 2023-12-06 12:34:07.930138 info sysrepo_log_mav[1786]: [INF] EV LISTEN: "/o-ran-software-management:software-inventory/software-slot[name='SLOT_2']" "oper get" ID 5 processing success.
  SR_LOG_INF("EV LISTEN: \"%s\" \"%s\" ID %" PRIu32 " priority %" PRIu32 " %s (remaining %" PRIu32 " subscribers).",
            event_desc, sr_ev2str(event), (uint32_t)ATOMIC_LOAD_RELAXED(multi_sub_shm->request_id),
            (uint32_t)ATOMIC_LOAD_RELAXED(multi_sub_shm->priority), result_str, multi_sub_shm->subscriber_count);
    return NULL;
}
michalvasko commented 7 months ago

And we added a retry mechanism up to 3 times in case we observe TIMEOUT error, however it still fails.

I see, then the problem seems quite serious. All the more reason for me to reproduce it. If it can be reliably reproduced on your machine and you would be fine with giving me temporary access, I could send you my public SSH key, connect to the machine, and debug remotely. Or you could try to prepare a separate app(s) with a script, possibly, which would reproduce this. Again, I do not think I can help you otherwise.

ankit7gup commented 7 months ago

And we added a retry mechanism up to 3 times in case we observe TIMEOUT error, however it still fails.

I see, then the problem seems quite serious. All the more reason for me to reproduce it. If it can be reliably reproduced on your machine and you would be fine with giving me temporary access, I could send you my public SSH key, connect to the machine, and debug remotely. Or you could try to prepare a separate app(s) with a script, possibly, which would reproduce this. Again, I do not think I can help you otherwise.

Unfortunately, access cannot be provided to the machine. However, we are trying to do some experiments, to get the exact scenario reproduced (every time). In case we can get it happen, if you want some info, or any patch to be applied to collect more data, we can do that immediately and try to collectively get it resolved, if that works for you.

michalvasko commented 7 months ago

... try to collectively get it resolved, if that works for you.

This is what I am trying to avoid because it is the most inefficient way of solving issues without a guaranteed result. I have already mentioned some suspicious things that I would focus on, such as no error being printed for the error in the logs you provided earlier.

ankit7gup commented 7 months ago

... try to collectively get it resolved, if that works for you.

This is what I am trying to avoid because it is the most inefficient way of solving issues without a guaranteed result. I have already mentioned some suspicious things that I would focus on, such as no error being printed for the error in the logs you provided earlier.

ok, what do you suggest then we can do now?

michalvasko commented 7 months ago

What you said:

However, we are trying to do some experiments, to get the exact scenario reproduced (every time).

ankit7gup commented 7 months ago

Hi @michalvasko After quite some tests, Though it is a very dirty way but I have prepared some example which once ran will reproduce similar problem of timeout for operational data subscription. In sometime it gives similar errors.

However orignal error reason was different as below.

Original issue: 2023-12-04 06:52:31.567027 err sysrepo_log_dbg[1788]: [ERR] Locking a mutex failed (sr_shmext_oper_get_sub_add: Connection timed out).

Currently reproduced issue: image

client.c.txt oper.c.txt

ankit7gup commented 6 months ago

Hi @michalvasko,

One potential place, I have observed which can be problematic, would like to get your opinion. Can misuse/missed sr_release_ctxt for sr_acquire_context() causes the below error?

2023-12-04 06:52:31.567027 err sysrepo_log_dbg[1788]: [ERR] Locking a mutex failed (sr_shmext_oper_get_sub_add: Connection timed out).

One more observation, sr_acquire_context() if done on same connection as following way, is this behavior expected, however we have seen this in both success as well as failure logs.

Notification callback Thread1->Acquire context. Operational data callback Thread2->Acquire context. Operational data Thread2->Release context. Notification callback Thread1->Release context.

Thanks!

Regards, Ankit

michalvasko commented 6 months ago

Thanks for the effort and the example apps, I really could reproduce some timeouts. However, what you are doing in the operational callback is wrong and causing the problems. With fixed callback I could see no more errors:

static int
dp_get_items_cb(sr_session_ctx_t *session, uint32_t sub_id, const char *module_name, const char *xpath,
        const char *request_xpath, uint32_t request_id, struct lyd_node **parent, void *private_data)
{

    (void)session;
    (void)sub_id;
    (void)request_xpath;
    (void)request_id;
    (void)private_data;

    printf("\n\n ========== DATA FOR \"%s\" \"%s\" REQUESTED =======================\n\n", module_name, xpath);
    int rc = 0;

    rc = lyd_new_path(NULL, sr_session_acquire_context(session), "/examples:stats/counter", "1000", 0, parent);
    sr_session_release_context(session);
    printf("rc %d\n", rc);
    rc = lyd_new_path(*parent, NULL, "/examples:stats/counter2", "2000", 0, NULL);
    printf("rc %d\n", rc);
    return SR_ERR_OK;
}

You are supposed to provide the data by appending them to parent, not set them using sr_apply_changes().

ankit7gup commented 6 months ago

Thanks, this was probably just an example I was trying to simulate the timeouts. We have anyways captured new set of logs from original application rerun. Will let one of my colleagues to share that with you, in case that may provide some more information about the original timeout issue.

Thanks again for continuous support!

Regards, Ankit

amitdalodiya commented 6 months ago

Hi @michalvasko

3 errors we got during app run . 2023-12-08 10:39:56.418457 err sysrepo[1810]: [ERR] Locking a mutex failed (sr_shmext_oper_get_sub_add: Connection timed out). this one is most critical because system not able recover from this error.

2023-12-08 10:39:53.514253 err sysrepo[1810]: [ERR] Editing list keys is not supported, edit list instances instead. 2023-12-08 10:39:53.514515 err sysrepo[1810]: Failed to set: /o-ran-transceiver:port-transceivers/port-transceiver-data[interface-name='fheth0'][port-number='0']/interface-name : fheth0 2023-12-08 10:39:56.418457 err sysrepo[1810]: [ERR] Locking a mutex failed (sr_shmext_oper_get_sub_add : Connection timed out).

Thanks, Amit app_sysrepo_calls.txt

netopeer-server.txt

amitdalodiya commented 6 months ago

hi @michalvasko Do you get chance to look into logs? any suggestion to tryout.

Thanks, Amit

michalvasko commented 6 months ago

I have but there is nothing interesting in them so I have no new advice to give.