CESNET / netopeer2

NETCONF toolset
BSD 3-Clause "New" or "Revised" License
296 stars 187 forks source link

Sysrepo exception post CallHome session established and subscribing to notifications #1412

Closed Krisscut closed 11 months ago

Krisscut commented 1 year ago

Hi !

Description

We are encountering a sporadic issue in some internal tests where netopeer2-server crashed and we can see an error trace about subscriber_count (see traces below) In the attached traces, we are in a scenario where we have configured 2 callhome clients, and we have a third session which is established with a "connect" on netopeer2's listening port. Then at some point, we perform a restart of the stack (see "oranrestart") which stop the various services, clean the shm files from sysrepo, before restarting the services.

It seems the error occurs when the second callhome client session is established and we subscribe to notification on this one.

What could cause such an error ? Maybe an incorrect cleanup from a prior restart ? Given the code, it would mean we have more subscribers that processed the event than we have registered subscribers, right ? How is it possible ? Do we have specific traces that we can check for that specific issue ?

Versions

SYSREPO = sysrepo-2.2.36 NETCONF2 = libnetconf2-2.1.28 LIBYANG = libyang-2.1.30 LIBYANGCPP = libyang-cpp-1.1.0 SYSREPOCPP = sysrepo-cpp-v1.0.0-23 PYSYSREPO = sysrepo-python-1.4.0 PYYANG = libyang-python-2.6.0

Traces

Crashtrace: image

(Snippet from the full trace )

Line 38486: [INF]: 22:32:04: SR: Session 71 (user "root", CID 24) created. Line 38491: [INF]: 22:32:04: SR: Session 72 (user "root", CID 24) created. Line 38502: [INF]: 22:32:04: SR: Session 73 (user "root", CID 24) created. Line 38543: [INF]: 22:32:04: SR: Session 76 (user "root", CID 24) created. Line 38929: [INF]: 22:32:04: SR: Session 78 (user "root", CID 24) created. Line 38979: [INF]: 22:32:04: SR: Session 81 (user "root", CID 24) created. Line 39281: [INF]: 22:32:04: SR: Session 83 (user "root", CID 24) created. Line 39349: [INF]: 22:32:04: SR: Session 86 (user "root", CID 24) created. Line 39416: [INF]: 22:32:04: SR: Session 89 (user "root", CID 24) created. Line 39451: [INF]: LN: Call Home client "NC-Client-1" endpoint "default-ssh" connecting... Line 39453: [INF]: LN: Successfully connected to 172.1.1.5:4334 over IPv4. Line 39454: [INF]: SR: Session 92 (user "root", CID 23) created. Line 39457: [INF]: LN: Call Home client "NC-Client-2" endpoint "default-ssh" connecting... Line 39478: [INF]: LN: Call Home client "NC-Client-1" session 1 established. Line 39479: [INF]: SR: Session 93 (user "root", CID 23) created. Line 39486: [INF]: 22:32:05: OR: /orusim/orusw/ietf-netconf-notifications.c:107 username netconf connected Line 39487: [INF]: 22:32:05: OR: /orusim/orusw/ietf-netconf-notifications.c:116 session-id 1 connected Line 39488: [INF]: 22:32:05: OR: /orusim/orusw/ietf-netconf-notifications.c:120 NETCONF Client 172.1.1.5 connected Line 39498: [INF]: 22:32:05: SR: Session 95 (user "root", CID 24) created. Line 40220: [INF]: 22:32:06: SR: Session 145 (user "root", CID 24) created. Line 40228: [INF]: 22:32:07: SR: Session 146 (user "root", CID 24) created. Line 40234: [INF]: LN: Accepted a connection on 0.0.0.0:830 from 172.1.1.5:33520. Line 40235: [INF]: SR: Session 147 (user "root", CID 23) created. Line 40236: [2023/05/16 22:32:07.991969, 1] ssh_bind_config_parse_line: Unknown option: HostbasedAcceptedKeyTypes, line: 9 Line 40257: [INF]: SR: Session 148 (user "root", CID 23) created. Line 40265: [INF]: 22:32:08: OR: /orusim/orusw/ietf-netconf-notifications.c:107 username netconf connected Line 40266: [INF]: 22:32:08: OR: /orusim/orusw/ietf-netconf-notifications.c:116 session-id 2 connected Line 40267: [INF]: 22:32:08: OR: /orusim/orusw/ietf-netconf-notifications.c:120 NETCONF Client 172.1.1.5 connected Line 41013: [INF]: 22:32:08: SR: Session 200 (user "root", CID 24) created. Line 41018: [INF]: LN: Successfully connected to 172.1.1.8:443 over IPv4. Line 41019: [INF]: SR: Session 201 (user "root", CID 23) created. Line 41040: [INF]: LN: Call Home client "NC-Client-2" session 3 established. Line 41041: [INF]: SR: Session 202 (user "root", CID 23) created. Line 41055: [INF]: 22:32:09: OR: /orusim/orusw/ietf-netconf-notifications.c:107 username xranuser connected Line 41056: [INF]: 22:32:09: OR: /orusim/orusw/ietf-netconf-notifications.c:116 session-id 3 connected Line 41057: [INF]: 22:32:09: OR: /orusim/orusw/ietf-netconf-notifications.c:120 NETCONF Client 172.1.1.8 connected Line 41888: [INF]: 22:32:11: SR: Session 256 (user "root", CID 24) created. Line 41921: [INF]: 22:32:11: SR: Session 259 (user "root", CID 24) created. Line 41936: [INF]: 22:32:11: SR: Session 261 (user "root", CID 24) created. Line 41961: 2023-05-16T22:32:12,258809489+00:00 oranrestart Restart cause is AdminModuleReset Line 41964: 2023-05-16T22:32:12,511464566+00:00 oranrestart Stopping CarrierStateManager then netopeer2-server Line 41980: [WRN]: SR: Connection with CID 24 is dead. Line 41981: [WRN]: SR: Recovering module "ietf-netconf-notifications"notification subscription of CID 24. Line 41984: [WRN]: SR: Recovering module "ietf-netconf-notifications"notification subscription of CID 24. Line 41999: [INF]: LN: Call Home client "NC-Client-1" session terminated. Line 42000: [ERR]: LN: Call Home client "NC-Client-1" was not found. Line 42001: [INF]: LN: Call Home client "NC-Client-1" thread exit. Line 42016: [INF]: LN: Call Home client "NC-Client-2" session terminated. Line 42017: [ERR]: LN: Call Home client "NC-Client-2" was not found. Line 42020: [INF]: LN: Call Home client "NC-Client-2" thread exit. Line 42039: 2023-05-16T22:32:13,232054613+00:00 oranrestart Cleaning up data... Line 42043: Removing SHM... Line 42054: 2023-05-16T22:32:13,282978537+00:00 oranrestart Sleeping for 5 seconds... Line 42057: 2023-05-16T22:32:18,286621707+00:00 oranrestart Start start-up sequence Line 42177: [INF] Connection 2 created. Line 42189: [INF] Connection 3 created. Line 42201: [INF] Connection 4 created. Line 42213: [INF] Connection 5 created. Line 42226: [INF] Connection 6 created. Line 42240: [INF] Connection 7 created. Line 42256: [INF] Connection 8 created. Line 42270: [INF] Connection 9 created. Line 42284: [INF] Connection 10 created. Line 42302: [INF] Connection 11 created. Line 42314: [INF] Connection 12 created. Line 42326: [INF] Connection 13 created. Line 42341: [INF] Connection 14 created. Line 42353: [INF] Connection 15 created. Line 42371: [INF] Connection 16 created. Line 42383: [INF] Connection 17 created. Line 42395: [INF] Connection 18 created. Line 42411: [INF] Connection 19 created. Line 42424: [INF] Connection 20 created. Line 42663: [INF]: SR: Connection 23 created. Line 42666: [INF]: SR: Session 2 (user "root", CID 23) created. Line 42762: [INF]: 22:32:30: SR: Connection 24 created. Line 42765: [INF]: 22:32:30: SR: Session 33 (user "root", CID 24) created. Line 42924: [INF]: 22:32:30: SR: Session 34 (user "root", CID 24) created. Line 43183: [INF] Session 35 (user "root", CID 24) created. Line 43184: [INF]: 22:32:30: SR: Session 35 (user "root", CID 24) created. Line 43307: [INF] Session 36 (user "root", CID 24) created. Line 43308: [INF]: 22:32:30: SR: Session 36 (user "root", CID 24) created. Line 44028: [INF]: 22:32:30: SR: Session 54 (user "root", CID 24) created. Line 44393: [INF]: 22:32:30: SR: Session 58 (user "root", CID 24) created. Line 44394: [INF]: 22:32:30: SR: Session 59 (user "root", CID 24) created. Line 44409: [INF]: 22:32:30: SR: Session 60 (user "root", CID 24) created. Line 44474: [INF]: 22:32:34: SR: Session 62 (user "root", CID 24) created. Line 44498: [INF]: 22:32:34: SR: Session 64 (user "root", CID 24) created. Line 44520: [INF]: 22:32:38: SR: Session 66 (user "root", CID 24) created. Line 44706: [INF]: 22:32:55: SR: Session 68 (user "root", CID 24) created. Line 44711: [INF]: 22:32:55: SR: Session 69 (user "root", CID 24) created. Line 44721: [INF]: 22:32:55: SR: Session 70 (user "root", CID 24) created. Line 44750: [INF]: 22:32:55: SR: Session 73 (user "root", CID 24) created. Line 45126: [INF]: 22:32:55: SR: Session 75 (user "root", CID 24) created. Line 45178: [INF]: 22:32:55: SR: Session 78 (user "root", CID 24) created. Line 45501: [INF]: 22:32:55: SR: Session 80 (user "root", CID 24) created. Line 45565: [INF]: 22:32:55: SR: Session 83 (user "root", CID 24) created. Line 45634: [INF]: 22:32:55: SR: Session 86 (user "root", CID 24) created. Line 45675: [INF]: LN: Call Home client "NC-Client-1" endpoint "default-ssh" connecting... Line 45677: [INF]: LN: Successfully connected to 172.1.1.5:4334 over IPv4. Line 45678: [INF]: SR: Session 89 (user "root", CID 23) created. Line 45681: [INF]: LN: Call Home client "NC-Client-2" endpoint "default-ssh" connecting... Line 45704: [INF]: LN: Call Home client "NC-Client-1" session 1 established. Line 45707: [INF]: SR: Session 90 (user "root", CID 23) created. Line 45716: [INF]: 22:32:56: OR: /orusim/orusw/ietf-netconf-notifications.c:107 username netconf connected Line 45717: [INF]: 22:32:56: OR: /orusim/orusw/ietf-netconf-notifications.c:116 session-id 1 connected Line 45718: [INF]: 22:32:56: OR: /orusim/orusw/ietf-netconf-notifications.c:120 NETCONF Client 172.1.1.5 connected Line 45728: [INF]: 22:32:56: SR: Session 92 (user "root", CID 24) created. Line 46506: [INF]: 22:32:57: SR: Session 142 (user "root", CID 24) created. Line 46514: [INF]: LN: Accepted a connection on 0.0.0.0:830 from 172.1.1.5:33678. Line 46517: [INF]: SR: Session 143 (user "root", CID 23) created. Line 46520: [2023/05/16 22:32:58.129592, 1] ssh_bind_config_parse_line: Unknown option: HostbasedAcceptedKeyTypes, line: 9 Line 46535: [INF]: SR: Session 144 (user "root", CID 23) created. Line 46542: [INF]: 22:32:58: OR: /orusim/orusw/ietf-netconf-notifications.c:107 username netconf connected Line 46543: [INF]: 22:32:58: OR: /orusim/orusw/ietf-netconf-notifications.c:116 session-id 2 connected Line 46544: [INF]: 22:32:58: OR: /orusim/orusw/ietf-netconf-notifications.c:120 NETCONF Client 172.1.1.5 connected Line 46579: [INF]: 22:32:58: SR: Session 148 (user "root", CID 24) created. Line 47309: [INF]: 22:32:59: SR: Session 197 (user "root", CID 24) created. Line 47316: [INF]: LN: Successfully connected to 172.1.1.8:443 over IPv4. Line 47317: [INF]: SR: Session 198 (user "root", CID 23) created. Line 47338: [INF]: LN: Call Home client "NC-Client-2" session 3 established. Line 47341: [INF]: SR: Session 199 (user "root", CID 23) created. Line 47354: [INF]: 22:33:00: OR: /orusim/orusw/ietf-netconf-notifications.c:107 username xranuser connected Line 47355: [INF]: 22:33:00: OR: /orusim/orusw/ietf-netconf-notifications.c:116 session-id 3 connected Line 47356: [INF]: 22:33:00: OR: /orusim/orusw/ietf-netconf-notifications.c:120 NETCONF Client 172.1.1.8 connected Line 48110: [INF]: 22:33:00: SR: Session 252 (user "root", CID 24) created. Line 48113: [INF]: 22:33:00: SR: Session 253 (user "root", CID 24) created. Line 48124: netopeer2-server: /opt/dev/sysrepo/src/shm_sub.c:2766: sr_shmsub_multi_listen_write_event: Assertion `multi_sub_shm->subscriber_count >= valid_subscr_count' failed. Line 48130: [WRN] Connection with CID 23 is dead. Line 48131: [WRN]: 22:33:10: SR: Connection with CID 23 is dead. Line 48132: [WRN] Recovered an event of CID 23 with ID 1. Line 48133: [WRN]: 22:33:10: SR: Recovered an event of CID 23 with ID 1. Line 48142: [INF]: 22:34:00: SR: Session 256 (user "root", CID 24) created. Line 48143: [WRN] Recovering module "o-ran-supervision"notification subscription of CID 23. Line 48144: [WRN]: 22:34:00: SR: Recovering module "o-ran-supervision"notification subscription of CID 23. Line 48284: [INF]: 22:35:30: SR: Session 257 (user "root", CID 24) created. Line 48287: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48290: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48293: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48296: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48299: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48302: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48305: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48308: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48311: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48314: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48317: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48318: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48321: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48324: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48327: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48328: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48331: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48334: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48337: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48338: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48341: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48344: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48347: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48350: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48353: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48356: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48359: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48360: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48363: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48366: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48369: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48370: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48373: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48376: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48379: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48382: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48385: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48386: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48389: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48392: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48395: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48396: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48399: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48402: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48405: [WRN] Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48408: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-server" running change subscription of CID 23. Line 48416: [WRN] Recovering module "ietf-netconf-notifications"notification subscription of CID 23. Line 48419: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-notifications"notification subscription of CID 23. Line 48422: [WRN] Recovering module "ietf-netconf-notifications"notification subscription of CID 23. Line 48423: [WRN]: 22:35:30: SR: Recovering module "ietf-netconf-notifications"notification subscription of CID 23. Line 48482: [INF]: 22:35:30: SR: Session 259 (user "root", CID 24) created. Line 48546: [INF]: 22:35:30: SR: Session 261 (user "root", CID 24) created. Line 48701: [INF]: 22:38:31: SR: Session 263 (user "root", CID 24) created.

Thanks for the help !

michalvasko commented 1 year ago

I would suggest you try the current devel branch of sysrepo, but you will also have to update libyang and netopeer2. There were fixes of some data-races in these notifications so it may have been fixed.