STORDIS / monsoon

SONiC monitoring system supplies SONiC telemetry data.
Apache License 2.0

Support NTP information in sonic-exporter as timex syscall is not working on Sonic OS. #24

Open Cellebyte opened 2 years ago

Cellebyte commented 2 years ago
# HELP node_timex_sync_status Is clock synchronized to a reliable server (1 = yes, 0 = no).
# TYPE node_timex_sync_status gauge
node_timex_sync_status 0

Whereas show ntp reports a valid, working synchronization:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+10.0.0.13    .DCFa.           1 u   64   64  377   14.485   -3.121   5.989
+10.0.0.29    .DCFa.           1 u   61   64  377   15.584    6.268   5.309
*10.0.0.45    .MBGh.           1 u   53   64  377    5.021    2.428   4.931
-10.0.0.61    .DCFa.           1 u   52   64  377   14.356    1.643   4.899
+10.0.0.77    .DCFa.           1 u   58   64  377   17.212   -2.078   5.106
synchronised to NTP server (10.0.0.45) at stratum 2 
   time correct to within 12 ms
   polling server every 64 s

Conclusion: since the timex metric cannot be trusted on SONiC, monsoon needs to export the switch's NTP information itself.
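As a rough illustration of what such an export would have to derive, here is a minimal Python sketch that parses ntpq-style peer output like the table above and decides the sync state from the tally code in the first column ('*' marks the peer the system is currently synchronised to). The function names and the dictionary layout are hypothetical and are not taken from the monsoon code base.

```python
# Sketch only: parse_ntpq_peers and is_synchronised are hypothetical helpers,
# not functions from sonic-exporter.
SAMPLE = "\n".join([
    "     remote           refid      st t when poll reach   delay   offset  jitter",
    "==============================================================================",
    "+10.0.0.13    .DCFa.           1 u   64   64  377   14.485   -3.121   5.989",
    "*10.0.0.45    .MBGh.           1 u   53   64  377    5.021    2.428   4.931",
    "synchronised to NTP server (10.0.0.45) at stratum 2",
])


def parse_ntpq_peers(output):
    """Return one dict per peer row; header and status lines are skipped."""
    peers = []
    for line in output.splitlines():
        if not line:
            continue
        # The first character is the tally code (' ', '+', '-', '*', ...).
        tally, fields = line[0], line[1:].split()
        # A peer row has exactly ten columns and a numeric stratum.
        if len(fields) != 10 or not fields[2].isdigit():
            continue
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "refid": fields[1],
            "st": int(fields[2]),
            "delay": float(fields[7]),
            "offset": float(fields[8]),
            "jitter": float(fields[9]),
        })
    return peers


def is_synchronised(peers):
    """True if any peer carries the '*' (system peer) tally code."""
    return any(p["tally"] == "*" for p in peers)
```

Calling is_synchronised(parse_ntpq_peers(SAMPLE)) yields True for the sample, which matches what show ntp reports even while node_timex_sync_status stays at 0.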

kamalkrbh commented 2 years ago

@Cellebyte, I just pushed changes for NTP metrics. You can check them on the live demo; just select 10.10.131.111:9101 as the target at the top.

Please note that when starting sonic-exporter, the following volumes need to be mounted: "-v /usr/bin/ntpq:/usr/bin/ntpq -v /usr/lib/x86_64-linux-gnu/:/usr/lib/x86_64-linux-gnu/". The readme.md has been updated accordingly.

If OK please close the issue.

Cellebyte commented 2 years ago

I still need to roll it out to verify it is working.

Cellebyte commented 1 year ago

@kamalkrbh I tried it. We are running NTP inside the mgmt VRF, and in that setup the exporter does not report the correct values.

jmessenger51 commented 1 year ago

We have a similar issue: both the NTP metrics in Monsoon and node-exporter say the switch is out of sync, even though the CLI shows that NTP is in sync. I have a STORDIS support ticket for this (507).

kamalkrbh commented 1 year ago

@jmessenger51 Please check my commit https://github.com/STORDIS/monsoon/commit/1aad229e0122e245ffcff5ff231d20436ef3ab2e

The vrf field is actually not being added to the key "NTP|global" in the Redis DB, so I have changed the exporter to check the field "mgmtVrfEnabled" under the key "MGMT_VRF_CONFIG|vrf_global" instead. That field is always set to true or false whenever the mgmt VRF is added or deleted. Also have a look at the readme file in the commit: the docker start command has been updated to mount /usr/bin/cgexec as a volume, which is required for the mgmt VRF case.
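To make the logic described here concrete, the following is a minimal Python sketch of the decision, not the code from the commit. It assumes the hash stored under "MGMT_VRF_CONFIG|vrf_global" has already been read into a dict (for example via HGETALL), and it assumes that the mgmt VRF case wraps ntpq in "cgexec -g l3mdev:mgmt"; both the helper names and that exact cgexec invocation are assumptions.

```python
# Sketch only: helper names are hypothetical, and the cgexec arguments are an
# assumption about how ntpq is run inside the management VRF.
def mgmt_vrf_enabled(vrf_global):
    """Interpret the mgmtVrfEnabled field of MGMT_VRF_CONFIG|vrf_global."""
    return str(vrf_global.get("mgmtVrfEnabled", "false")).lower() == "true"


def ntpq_command(vrf_global):
    """Build the argument list used to query NTP peers."""
    base = ["ntpq", "-p", "-n"]
    if mgmt_vrf_enabled(vrf_global):
        # Run ntpq in the management VRF's cgroup (needs /usr/bin/cgexec
        # mounted into the container, as noted in the readme).
        return ["cgexec", "-g", "l3mdev:mgmt"] + base
    return base
```

With {"mgmtVrfEnabled": "true"} the command is prefixed with cgexec; with an empty or missing hash it falls back to plain ntpq -p -n. The resulting list could be passed to subprocess.run.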

Please can you check with the latest changes.

jmessenger51 commented 1 year ago

I pulled the nightly build and used the updated docker start command, however, I'm not seeing any changes to how NTP is shown. Is the nightly docker image the correct image to test?

kamalkrbh commented 1 year ago

@jmessenger51, a few things to verify. On the switch, execute the following command and check its output:

admin@sonic:~$ ntpq -p -n
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.10.131.254   .INIT.          16 u    - 1024    0    0.000    0.000   0.000

Also check that the DB has the NTP server information:

admin@sonic:~$ redis-cli 
127.0.0.1:6379> SELECT 4
OK
127.0.0.1:6379[4]> HGETALL NTP|global
1) "src_intf@"
2) "eth0"
3) "auth_enabled"
4) "false"
127.0.0.1:6379[4]>

Now check whether sonic-exporter is returning those metrics: open the URL http://:9101/metrics in a browser and search for the string 'ntp'. You should see metrics with values like these:

# HELP sonic_ntp_peers NTP peers
# TYPE sonic_ntp_peers gauge
sonic_ntp_peers{poll="1024",reach="0",refid=".INIT.",remote="10.10.131.254",st="16",state=" ",t="u"} 1.0
# HELP sonic_ntp_sync_status SONiC NTP Sync Status (0/1 0==Not in Sync 1==Sync)
# TYPE sonic_ntp_sync_status gauge
sonic_ntp_sync_status 0.0
# HELP sonic_ntp_when Time (in seconds) since an NTP packet update was received
# TYPE sonic_ntp_when gauge
sonic_ntp_when{refid=".INIT.",remote="10.10.131.254"} 0.0
# HELP sonic_ntp_rtd Round-trip delay (in milliseconds) to the NTP server.
# TYPE sonic_ntp_rtd gauge
sonic_ntp_rtd{refid=".INIT.",remote="10.10.131.254"} 0.0
# HELP sonic_ntp_offset Time difference (in milliseconds) between the switch and the NTP server or another NTP peer.
# TYPE sonic_ntp_offset gauge
sonic_ntp_offset{refid=".INIT.",remote="10.10.131.254"} 0.0
# HELP sonic_ntp_jitter Mean deviation in times between the switch and the NTP server
# TYPE sonic_ntp_jitter gauge
sonic_ntp_jitter{refid=".INIT.",remote="10.10.131.254"} 0.0
# HELP sonic_ntp_global NTP Global
# TYPE sonic_ntp_global gauge
sonic_ntp_global{auth_enabled="false",src_intf="eth0",trusted_key="",vrf=""} 1.0
# HELP sonic_ntp_server NTP Servers
# TYPE sonic_ntp_server gauge

With the above data, the NTP panel is displayed as follows: image

Let us know how it goes!
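For anyone who prefers to script this check instead of searching in a browser, here is a small Python sketch that pulls the sonic_ntp_* samples out of the text returned by the /metrics endpoint. The function name is hypothetical, and the parsing assumes each sample line ends with the value (no trailing timestamp), as in the output above.

```python
# Sketch only: ntp_samples is a hypothetical helper, not part of sonic-exporter.
def ntp_samples(metrics_text):
    """Map each sonic_ntp_* series (name plus labels) to its float value."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        if series.startswith("sonic_ntp_"):
            samples[series] = float(value)
    return samples
```

The metrics text can be fetched with urllib.request.urlopen against the exporter's port 9101; a value of 0.0 for sonic_ntp_sync_status then indicates that the exporter considers the switch out of sync.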

jmessenger51 commented 10 months ago

Upgraded to Broadcom SONiC 4.1.2 (Enterprise Base):

admin@c01-33-borderleaf:~$ sonic-cli
c01-33-borderleaf.a.clt1.theark.cloud# show ntp associations
     remote           refid      st t when poll reach   delay   offset  jitter
+10.8.48.2       10.8.48.53       4 u  831 1024  377    0.518    0.402   0.478
-10.8.48.3       10.8.48.53       4 u  117 1024  377    0.401    2.152   0.706
*10.8.48.4       198.137.202.32   3 u  192 1024  377    0.642    0.799   0.469
+10.8.48.123     10.8.48.53       4 u  294 1024  377    0.533    0.612   0.655

Here is the output from monsoon: image