Closed meghna-doshi closed 2 years ago
Hi,
I think we found the issue for replay, looks replay not enabled when we checked with sysrepoctl -l.
Is the following sample fine to enable replay for a yang:
sysrepoctl -c _3gpp-nr-nrm-nrcelldu -r 1
Yep, just check sysrepoctl -l
output afterwards to verify it. By default, replay for all the modules is disabled to not slow things down.
Hi Michal, There is still some issue. We see that "R" option is coming for the yangs which have notification (while checking o/p of sysrepoctl -l)
Following sequence of events:
Connect netconf client (this is not netopeer2-cli, we have some other client )
subscribe as follows:
<?xml version="1.0" encoding="UTF-8"?><rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1647897388768">
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<stream>NETCONF</stream>
<filter>
<nrcell-paraminuse-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
<nrcell-state-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
<ru-delete-notification xmlns="urn:junila:o-ran:ru:1.0"/>
<ru-notification-info xmlns="urn:junila:o-ran:ru:1.0"/>
<ru-sync-notify xmlns="urn:o-ran:agg-sync:1.0"/>
<tx-array-carriers-state-change-info xmlns="urn:o-ran:agg-uplane-conf:1.0"/>
</filter>
<startTime>2022-03-21T20:31:48.954Z</startTime>
</create-subscription>
</rpc>
Notifications are replayed.
Disconnect client
Repeat (1) and (2) this time no notification replayed and following error seen in logs:
2022-03-21T22:15:13.640045885Z netopeer2-server|ERROR "SR" LYB notif: Unexpected notification file EOF.
2022-03-21T22:15:13.640059047Z netopeer2-server|ERROR "SR" Callback "replay_next" of plugin "LYB notif" for module "junila_3gpp-nr-nrm-nrcelldu" failed.
Attaching logs in case that helps. In the logs you will see only twice client connect. First time, notification replay worked but second time it failed and the error logs were seen.
This is seen very consistently on our setups. With 1.0 we never had this issue. It is only coming after upgrading to 2.0. ducm-netconf.zip
We are also observing one more issue, if we give some later (now + x) time in subscribe (--begin 300) then also we are getting all notifications. Is this expected behaviour?
I could not reproduce with the current devel
branches of the projects and netopeer2-cli so I cannot help you. My suggestion would be to update the projects, there could have been a fix.
Hi Michal, Ok we are going to rebase to latest now and try again. While compiling, after rebase to latest, we see following is newly added in netopeer2/CMakeLists.txt
if (NOT SYSREPOCTL_EXECUTABLE)
find_program(SYSREPOCTL_EXECUTABLE sysrepoctl)
endif()
if (NOT SYSREPOCTL_EXECUTABLE)
message(FATAL_ERROR "Unable to find sysrepoctl, set SYSREPOCTL_EXECUTABLE manually.")
endif()
# find sysrepocfg to be used for installation and tests
if (NOT SYSREPOCFG_EXECUTABLE)
find_program(SYSREPOCFG_EXECUTABLE sysrepocfg)
endif()
if (NOT SYSREPOCFG_EXECUTABLE)
message(FATAL_ERROR "Unable to find sysrepocfg, set SYSREPOCFG_EXECUTABLE manually.")
endif()
We have sysrepo installed in non standard paths (by using following options in sysrepo:
-DCMAKE_INCLUDE_PATH="${NETCONF_BASE_BIN_DIR}/${NETCONF_INSTALL_PREFIX}/include/" \
-DCMAKE_LIBRARY_PATH="${NETCONF_BASE_BIN_DIR}/${NETCONF_INSTALL_PREFIX}/lib64/" ..
where NETCONF_BASE_BIN_DIR is some path like /opt/node/..... and NETCONF_INSTALL_PREFIX=/usr/local
So how to give path in cmake so that find_program can lookup in non standard path also. I found one option to use option "PATH_SUFFIXES" but then we have to repeat it for both find_program statement above. Is there some common file to put the path, so that find_program would look in those paths also?
Also by any chance did you have a look at the logs provided in previous comment. Looks this issue is consistently there even though we keep upgrading netopeer2. Need to check if it is really fixed in the latest devel. Will update.
You should be able to specify CMAKE_PROGRAM_PATH
, for example (ref). But you can also directly define SYSREPOCTL_EXECUTABLE
and SYSREPOCFG_EXECUTABLE
with the full paths and they will be used directly.
Also by any chance did you have a look at the logs provided in previous comment.
I did but the output is quite useless on its own. I would need to reproduce it to have any chance of fixing it.
Hi Michal. I did upgrade the netopeer2 and sysrepo modules but still see same issue again. So appears like there is some issue for sure which is still not fixed. Will try to see if i can reproduce and share the steps. Please check the logs once and see if that gives any hint. Logs attached.
Snippet:
2022-03-23T00:36:27.498033822Z netopeer2-server|INFO "SR" Published event "rpc" "/notifications:create-subscription" with ID 2 priority 0 for 1 subscribers.
2022-03-23T00:36:27.498179647Z netopeer2-server|INFO "SR" Processing "/notifications:create-subscription" "rpc" event with ID 2 priority 0 (remaining 1 subscribers).
2022-03-23T00:36:27.499411763Z netopeer2-server|WARNING "SR" Module "ietf-yang-library" does not support notification replay.
2022-03-23T00:36:27.500311944Z netopeer2-server|WARNING "SR" Module "ietf-netconf-notifications" does not support notification replay.
2022-03-23T00:36:27.501265206Z netopeer2-server|WARNING "SR" Module "nc-notifications" does not support notification replay.
2022-03-23T00:36:27.502223793Z netopeer2-server|WARNING "SR" Module "ietf-keystore" does not support notification replay.
2022-03-23T00:36:27.503188449Z netopeer2-server|WARNING "SR" Module "ietf-truststore" does not support notification replay.
2022-03-23T00:36:27.504180106Z netopeer2-server|WARNING "SR" Module "ietf-network-instance" does not support notification replay.
2022-03-23T00:36:27.505195372Z netopeer2-server|WARNING "SR" Module "ietf-subscribed-notifications" does not support notification replay.
2022-03-23T00:36:27.506228720Z netopeer2-server|WARNING "SR" Module "ietf-yang-push" does not support notification replay.
2022-03-23T00:36:27.507366293Z netopeer2-server|ERROR "SR" LYB notif: Unexpected notification file EOF.
2022-03-23T00:36:27.507381150Z netopeer2-server|ERROR "SR" Callback "replay_next" of plugin "LYB notif" for module "junila_3gpp-nr-nrm-nrcelldu" failed.
2022-03-23T00:36:27.543042579Z netopeer2-server|INFO "SR" Successful processing of "rpc" event with ID 2 priority 0 (remaining 0 subscribers).
sysrepoctl --version sysrepoctl - sysrepo YANG schema manipulation tool, compiled with libsysrepo v2.1.44 (SO v7.3.7)
netopeer2-server -V netopeer2-server 2.1.16
Hi Michal, One more observation we have.
I am getting tired of examining obscure issues only to learn they are caused by some container software. In all likelihood the notification file is getting removed or something like that but you are on your own with fixing that.
Hi Michal, My bad, i dont think anything to do with container. But there is one difference which we see from testing with netopeer2-client and with our proprietary netconf client (with which issue is seen)
<?xml version="1.0" encoding="UTF-8"?><notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<eventTime>2022-03-25T03:58:13.351406264+00:00</eventTime>
<nrcell-state-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu">
<nrcell-info>
<cell-id>cellindex_1</cell-id>
<cellState>INACTIVE</cellState>
<operationalState>ENABLED</operationalState>
</nrcell-info>
</nrcell-state-info>
</notification>
<?xml version="1.0" encoding="UTF-8"?><rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="1648177503290">
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<stream>NETCONF</stream>
<filter>
<nrcell-paraminuse-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
<nrcell-state-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
<tx-array-carriers-state-change-info xmlns="urn:o-ran:agg-uplane-conf:1.0"/>
</filter>
<startTime>2022-03-25T03:58:13.352Z</startTime>
</create-subscription>
</rpc>
Hi Michal, I think my above observation is correct. I am now able to reproduce issue, though now it just goes for segmentation fault!! I tried using user-rpc command at netopeer2-cli and found segmentation fault in netopeer2-server.
Try the following:
user-rpc --content rpc.xml
where rpc.xml is having content as follows:
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<stream>NETCONF</stream>
<filter>
<nrcell-paraminuse-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
<nrcell-state-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
<tx-array-carriers-state-change-info xmlns="urn:o-ran:agg-uplane-conf:1.0"/>
</filter>
<startTime>2022-03-25T09:45:31.352Z</startTime>
</create-subscription>
In 1.0, both sysrepo and our proprietary netconf client were using time in UTC format, so this issue was never seen. Can you please try this and then suggest some fix for the same.
Hi Michal, JFI, below is netopeer2-server bt when crashed with above user-rpc trial with UTC time: I think by now you would have reproduced it, just pasting the bt in case it is useful.
(gdb) bt
#0 0x00007f2256997da6 in __strcmp_sse42 () from /usr/lib64/libc.so.6
#1 0x0000000000409b91 in np2srv_err_reply_sr (err_info=0x7f223c000cc0)
at /home/md1085/git/mdoshi_onecell_integration_6.5/ran_3rdparty/open_source/netconf/netopeer2/src/main.c:286
#2 0x0000000000409f58 in np2srv_rpc_cb (rpc=0x7f223c0058d0, ncs=0x7f223c000ff0)
at /home/md1085/git/mdoshi_onecell_integration_6.5/ran_3rdparty/open_source/netconf/netopeer2/src/main.c:377
#3 0x00007f22579476ef in nc_ps_poll ()
from /home/md1085/git/mdoshi_onecell_integration_6.5/ran_3rdparty/open_source/netconf/bin/du/usr/local/lib64/libnetconf2.so.3.1.3
#4 0x000000000040c5de in worker_thread (arg=0x1dd7cf0) at /home/md1085/git/mdoshi_onecell_integration_6.5/ran_3rdparty/open_source/netconf/netopeer2/src/main.c:1029
#5 0x00007f22571fdea5 in start_thread () from /usr/lib64/libpthread.so.0
#6 0x00007f22569578dd in clone () from /usr/lib64/libc.so.6
(gdb)
Hi Michal, Were you able to reproduce as i mentioned above? Any tentative time when we can expect this fix?
I have tested the RPC you pasted but it worked fine for me. The problem will likely be caused by those invalid stored notifications, which are attempted to be parsed and which I do not have. I would need a way to reproduce storing those notifications, starting with no stored notifications.
Hi Michal,
Not sure why you are not able to see the crash. Here are the steps which i did:
install the yangs (the yang folder attached) (where $SYSREPOCTL = sysrepoctl)
$SYSREPOCTL -i $MODDIR/junila_3gpp-nr-nrm-nrcelldu.yang -I $TMPFILE -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/_3gpp-common-managed-element.yang -s $MODDIR -e MeasurementsUnderManagedElement -v4
$SYSREPOCTL -i $MODDIR/_3gpp-common-subnetwork.yang -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/_3gpp-nr-nrm-nrcelldu.yang -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/_3gpp-nr-nrm-nrsectorcarrier.yang -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/_3gpp-nr-nrm-bwp.yang -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/_3gpp-nr-nrm-ep.yang -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/_3gpp-nr-nrm-commonbeamformingfunction.yang -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/o-ran-heartbeat-management.yang -s $MODDIR -v4
$SYSREPOCTL -i $MODDIR/ietf-hardware.yang -s $MODDIR -e hardware-state -v4
$SYSREPOCTL -i $MODDIR/_3gpp-common-ep-rp.yang -s $MODDIR -v4
The contents of $TMPFILE above, is such that following entry shows up in startup datastore:
sysrepocfg -X -x "/_3gpp-common-managed-element:ManagedElement/*"
<ManagedElement xmlns="urn:3gpp:sa5:_3gpp-common-managed-element" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0" nc:operation="create">
<id>1</id>
<attributes>
<dnPrefix>ProductClass=CS_ONECELL_DU,OUI=0005B9,VnfId=cs-gnb-1-du-1</dnPrefix>
<userLabel>DU_ME</userLabel>
<userDefinedState>INIT</userDefinedState>
</attributes>
</ManagedElement>
enable notification replay:
$SYSREPOCTL -c junila_3gpp-nr-nrm-nrcelldu -r 1
Run netopeer2-server:
netopeer2-server -d -v2
Run sample application (this step is must, if we dont run the sample app, then the issue is not seen):
application_changes_example _3gpp-common-managed-element
Now run netopeer2-cli, even without generating any notification, just do the following on netopeer2-cli:
<create-subscription xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
<stream>NETCONF</stream>
<filter>
<nrcell-paraminuse-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
<nrcell-state-info xmlns="urn:junila:junila_3gpp-nr-nrm-nrcelldu"/>
</filter>
<startTime>2022-03-30T05:00:00.000Z</startTime>
</create-subscription>
where date command on the server returned following:
[root@localhost bin]# date
Wed Mar 30 05:01:35 IST 2022
netopeer2-server will crash.
Also if there is any other proper way, then please try to subscribe via client using startTime in UTC format. Surely you will find lots of issues.
Nope, I tried all these steps and have successfully subscribed to the notifications. Please fully uninstall all the libraries (make uninstall
, for sysrepo make uninstall_with_repo
), make sure there are no old versions on your system (run # ldconfig
and then # ldconfig -p | grep libyang
and so on for all the libraries), install it all again and try to reproduce the problem while writing down the steps. Otherwise I cannot help you.
Nope, I tried all these steps and have successfully subscribed to the notifications. Please fully uninstall all the libraries (
make uninstall
, for sysrepomake uninstall_with_repo
), make sure there are no old versions on your system (run# ldconfig
and then# ldconfig -p | grep libyang
and so on for all the libraries), install it all again and try to reproduce the problem while writing down the steps. Otherwise I cannot help you.
Did you try subscribing with starttime in UTC format (ex
So can you please try some way that the subscribe command on netopeer2-client is sending startTime in UTC format.
I did try it and have seen no issue whatsoever so like I keep repeating, I need an exact non-working use-case.
I did try it and have seen no issue whatsoever so like I keep repeating, I need an exact non-working use-case.
So just to understand clearly, you made changes in netopeer2-client to send the time given in "subscribe" call in UTC format? I think following is pre condition to reproduce, even if you are sending time in UTC:
ensure that the server on which netconf-server is running is having timezone 00:00 (ex eventTime reported in notifications is like:
<eventTime>2022-03-31T19:37:52.283357641+00:00</eventTime>
ensure that the client subscribes using UTC and since timezone above is 00, UTC time would also be near to localTime ex:
<startTime>2022-03-31T19:37:52.284Z</startTime>
Steps:
The reason i am not able to give you proper steps is that our netconf client is not same as the netopeer2-client. So if you can tell me what code to change to make the netopeer2-client sent subscription sttartTime in UTC format, may be then i can share exact steps.
Else try the steps what i sent above.
Note: From the logs,
2022-03-31T19:40:52.125026777Z netopeer2-server|ERROR "SR" LYB notif: Unexpected notification file EOF.
2022-03-31T19:40:52.125040664Z netopeer2-server|ERROR "SR" Callback "replay_next" of plugin "LYB notif" for module "junila_3gpp-nr-nrm-nrcelldu" failed.
It appears like some file operation is happening when subscription is coming from the netconf client. So may be there is something being done with the startTime attribute sent by the client. So may be you can check code around that area please.
This is so consistently happening on our setup that we are totally blocked due to this issue.
Hi Michal, Also from the code appears like following piece of code is returning error:
srpntf_lyb_replay_next (...)
...
do {
if ((rc = srpntf_read_ts(st->fd, notif_ts))) {
goto cleanup;
}
if (!notif_ts->tv_sec) {
SRPLG_LOG_ERR(srpntf_name, "Unexpected notification file EOF.");
rc = SR_ERR_INTERNAL;
goto cleanup;
}
If you could tell what is not proper, then i can have a look. Not sure why notif_ts->tv_sec is coming empty. Not able to figure out clearly how the value notif_ts->tv_sec is being set by reading the notif files.
Start netopeer2-server with -d -c MSG
and then attach the whole output (will probably be a lot) here up until the error. Then it will not matter what client you are using.
Hi Michal, Please find attached logs taken with netopeer2-server running with -d -c MSG option. Using that option, i was not getting sysrepo error logs so for now i added the following highlighted line in netopeer2 main.c:
else if (!strcmp(ptr, "MSG")) { / NETCONF messages - only lnc2 debug verbosity / np2_verbose_level = NC_VERB_DEBUG; np2_sr_verbose_level = SR_LL_DBG;
Following sequence was captured in the logs:
You have not done what I asked, the messages are from syslog and are a mess. Moreover, setting sysrepo log level to DBG caused around 3/4 of the log to be redundant. So, update the server (to devel
branch, to get INF sysrepo messages) and please start the server with # netopeer2-server -d -c MSG &> out.txt
. Then attach out.txt
.
Hi Michal, I cannot update the netopeer2-server version as of now as otherwise i will have to do a lot of testing for the project as per our standard procedure.
For now i will continue with netopeer2-server 2.1.10, let me know if that is fine.
So now my query is did the above logs didnt help? I mean even though there are redundant logs, didnt you get the desired logs also to debug the issue? As i am not sure, that if i reduce sysrepo level to INFO, how will that help (except of course reducing some unwanted logs).
Hi Michal,
Attached logs taken by taking the pointed fix in netopeer2-server main.c of adding following (https://github.com/CESNET/netopeer2/commit/7534fc66b2346f8c1a1f38f1b6a6b90b7ca1c594)
np2_sr_verbose_level = SR_LL_INF;
Please check.
This log was much better, all the information is there. However, I could still find no reason for the error, looking at the code. Please, try the attached patch for sysrepo and provide the log (just with -v2
will be enough) until the error is reproduced.
Hi Michal, Please find attached logs with the above sysrepo patch. cm-sysrepopatch.zip
Right, please try the attached patch (and you can restore the changes form the previous patch), it could fix the problem.
Hi Michal, This patch seems to be working fine. Thanks for the quick patch. Will test a little more in detail and update. Have you delivered this to the devel branch also?
Um, I see that I have although it was not intentional and I wanted you to test it first. No matter then, it is there.
Hi,
Notification replay feature is not working. Even notification is not sent to client if we subscribe with "begin time", this used to work fine in 1.x releases.
sysrepoctl - sysrepo YANG schema manipulation tool, compiled with libsysrepo v2.1.15 (SO v7.1.3) netopeer2-server 2.1.10
Snippet of netopeer2-client:
Normal subscribe done without replay
Notification triggered from application, it reached netopeer2-client:
disconnected the session
>disconnect
Reconnect
> connect --ssh --host 10.211.111.113 --port 830 --login root
subscribe again, this time with replay. No notification sent (expected was to see the notification sent in (2) ), In fact any notification which we trigger from application is not seen on the client if we subscribe with "--begin" option.
Also earlier in 1.x versions, we used to find notification related files getting created in /etc/sysrepo/data/notif/ folder but now in 2.x versions, we dont see that folder at all. Is there change in the notif folder path?