I would suggest trying the current devel branch of sysrepo, but you will also have to update libyang and netopeer2 accordingly. Some data races in the notification handling have been fixed there, so this issue may already be resolved.
Hi!
Description
We are encountering a sporadic issue in some internal tests where netopeer2-server crashes, and we can see an error trace about subscriber_count (see traces below).

In the attached traces, we have configured 2 Call Home clients, and a third session is established with a "connect" on netopeer2's listening port. At some point, we restart the stack (see "oranrestart"), which stops the various services and cleans up the sysrepo shm files before restarting the services.

The error seems to occur when the second Call Home client session is established and we subscribe to notifications on it.

What could cause such an error? Could it be an incorrect cleanup from a prior restart? Given the code, it would mean that more subscribers have processed the event than there are registered subscribers, right? How is that possible? Are there specific traces we can check for this particular issue?
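To make the question concrete, the toy model below replays one interleaving that would produce exactly this symptom. It is a hypothetical sketch, not sysrepo's code: `event_state`, `publish_event`, `subscriber_processed`, `event_state_consistent` and `replay_late_subscriber` are made-up names, and it assumes that the publisher snapshots the subscriber count when it publishes an event, while a subscription added right afterwards can still pick up that pending event.

```c
#include <stdbool.h>

/* Hypothetical model for illustration only: NOT sysrepo's real
 * shared-memory layout or API. */
typedef struct {
    unsigned subscriber_count; /* subscribers counted when the event was published */
    unsigned processed_count;  /* subscribers that reported processing the event */
} event_state;

/* The publisher snapshots the number of registered subscribers. */
static void publish_event(event_state *ev, unsigned registered)
{
    ev->subscriber_count = registered;
    ev->processed_count = 0;
}

/* Each subscriber that picks up the event increments the shared counter. */
static void subscriber_processed(event_state *ev)
{
    ev->processed_count++;
}

/* Invariant of the kind the crash trace complains about:
 * processed subscribers must never exceed counted subscribers. */
static bool event_state_consistent(const event_state *ev)
{
    return ev->processed_count <= ev->subscriber_count;
}

/* Deterministically replays one problematic interleaving: a new subscriber
 * registers after the publisher's snapshot but still processes the event. */
static bool replay_late_subscriber(void)
{
    event_state ev;
    unsigned registered = 2;          /* e.g. two existing subscriptions */

    publish_event(&ev, registered);   /* snapshot: 2 subscribers */
    registered++;                     /* a third subscription lands now */
    for (unsigned i = 0; i < registered; i++) {
        subscriber_processed(&ev);    /* all 3 see the pending event */
    }
    return event_state_consistent(&ev); /* 3 > 2: invariant violated */
}
```

Under that assumption, a notification subscription racing with event processing (for instance when the second Call Home session subscribes) would be enough to trip the check, without any leftover state from a previous restart.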
Versions
SYSREPO = sysrepo-2.2.36
NETCONF2 = libnetconf2-2.1.28
LIBYANG = libyang-2.1.30
LIBYANGCPP = libyang-cpp-1.1.0
SYSREPOCPP = sysrepo-cpp-v1.0.0-23
PYSYSREPO = sysrepo-python-1.4.0
PYYANG = libyang-python-2.6.0
Traces
Crash trace (snippet from the full trace): ![image](https://github.com/CESNET/netopeer2/assets/8125922/2f186c08-1319-40bf-817d-4897a325f40a)
Thanks for the help!