SignalK / set-system-time

Signal K Node server plugin to set system time on time data from GPS
Apache License 2.0
3 stars 6 forks source link

set-system-time ignores chrony #12

Closed sailoog closed 1 year ago

sailoog commented 1 year ago

chrony installed and working OK synchronizing time at boot in Debian bullseye.

set-system-time plugin enabled:

imatge

even though chrony is present and working the plugin is setting time:

imatge

sources:

pi@openplotter:~ $ chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^- 192.92.6.30                   1   6   377    61    +62ms[  +62ms] +/-  196ms
^* ntp61.kashra-server.com       2   6   377    66  +7419us[+6273us] +/-   54ms
^- ntp1.torix.ca                 1   6   377    62  +8854us[+8854us] +/-   78ms
^- ntp68.kashra-server.com       2   6     7    63  +4830us[+4830us] +/-   61ms

chrony log:

pi@openplotter:~ $ journalctl -b -u chrony
-- Journal begins at Mon 2022-07-18 20:04:04 CEST, ends at Sun 2023-03-12 14:40:52 CET. --
Feb 12 14:29:17 openplotter systemd[1]: Starting chrony, an NTP client/server...
Feb 12 14:29:18 openplotter chronyd[624]: chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGN>
Feb 12 14:29:18 openplotter chronyd[624]: Frequency 8.309 +/- 12.134 ppm read from /var/lib/chrony/chrony.drift
Feb 12 14:29:19 openplotter chronyd[624]: Using right/UTC timezone to obtain leap second data
Feb 12 14:29:19 openplotter chronyd[624]: Loaded seccomp filter
Feb 12 14:29:19 openplotter systemd[1]: Started chrony, an NTP client/server.
Feb 12 14:29:53 openplotter chronyd[624]: Selected source 91.236.251.13 (2.debian.pool.ntp.org)
Feb 12 14:29:53 openplotter chronyd[624]: System clock wrong by 20.297028 seconds
Feb 12 14:30:13 openplotter chronyd[624]: System clock was stepped by 20.297028 seconds
Feb 12 14:30:13 openplotter chronyd[624]: System clock TAI offset set to 37 seconds
Feb 12 14:31:19 openplotter chronyd[624]: System clock wrong by 1.699830 seconds
Feb 12 14:31:21 openplotter chronyd[624]: System clock was stepped by 1.699825 seconds
Feb 12 14:32:26 openplotter chronyd[624]: Can't synchronise: no majority
Feb 12 14:32:27 openplotter chronyd[624]: Selected source 51.89.85.70 (2.debian.pool.ntp.org)
Mar 12 14:33:29 openplotter chronyd[624]: Forward time jump detected!
Mar 12 14:33:29 openplotter chronyd[624]: Can't synchronise: no selectable sources
Mar 12 14:35:39 openplotter chronyd[624]: Selected source 51.89.85.70 (2.debian.pool.ntp.org)
Mar 12 14:35:39 openplotter chronyd[624]: System clock wrong by -2419198.880293 seconds
Mar 12 14:35:42 openplotter chronyd[624]: Selected source 206.108.0.131 (2.debian.pool.ntp.org)
Mar 12 14:36:43 openplotter chronyd[624]: Selected source 51.89.85.70 (2.debian.pool.ntp.org)
Mar 12 14:36:45 openplotter chronyd[624]: Source 91.236.251.13 replaced with 51.195.26.117 (2.debian.pool.ntp.org)

This has critical side effects as failing handshakes when updating repositories:

Err:5 https://deb.nodesource.com/node_16.x bullseye InRelease
  Certificate verification failed: The certificate is NOT trusted. The revocation or OCSP data are old and have been superseded.  Could not handshake: Error in the certificate verification. [IP: 212.230.237.170 443]

disabling set-system-time and restarting chrony everything works OK:

pi@openplotter:~ $ journalctl -b -u chrony
-- Journal begins at Sat 2023-02-11 13:24:33 CET, ends at Sun 2023-03-12 14:50:43 CET. --
Feb 12 14:29:17 openplotter systemd[1]: Starting chrony, an NTP client/server...
Feb 12 14:29:18 openplotter chronyd[624]: chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGN>
Feb 12 14:29:18 openplotter chronyd[624]: Frequency 8.309 +/- 12.134 ppm read from /var/lib/chrony/chrony.drift
Feb 12 14:29:19 openplotter chronyd[624]: Using right/UTC timezone to obtain leap second data
Feb 12 14:29:19 openplotter chronyd[624]: Loaded seccomp filter
Feb 12 14:29:19 openplotter systemd[1]: Started chrony, an NTP client/server.
Feb 12 14:29:53 openplotter chronyd[624]: Selected source 91.236.251.13 (2.debian.pool.ntp.org)
Feb 12 14:29:53 openplotter chronyd[624]: System clock wrong by 20.297028 seconds
Feb 12 14:30:13 openplotter chronyd[624]: System clock was stepped by 20.297028 seconds
Feb 12 14:30:13 openplotter chronyd[624]: System clock TAI offset set to 37 seconds
Feb 12 14:31:19 openplotter chronyd[624]: System clock wrong by 1.699830 seconds
Feb 12 14:31:21 openplotter chronyd[624]: System clock was stepped by 1.699825 seconds
Feb 12 14:32:26 openplotter chronyd[624]: Can't synchronise: no majority
Feb 12 14:32:27 openplotter chronyd[624]: Selected source 51.89.85.70 (2.debian.pool.ntp.org)
Mar 12 14:33:29 openplotter chronyd[624]: Forward time jump detected!
Mar 12 14:33:29 openplotter chronyd[624]: Can't synchronise: no selectable sources
Mar 12 14:35:39 openplotter chronyd[624]: Selected source 51.89.85.70 (2.debian.pool.ntp.org)
Mar 12 14:35:39 openplotter chronyd[624]: System clock wrong by -2419198.880293 seconds
Mar 12 14:35:42 openplotter chronyd[624]: Selected source 206.108.0.131 (2.debian.pool.ntp.org)
Mar 12 14:36:43 openplotter chronyd[624]: Selected source 51.89.85.70 (2.debian.pool.ntp.org)
Mar 12 14:36:45 openplotter chronyd[624]: Source 91.236.251.13 replaced with 51.195.26.117 (2.debian.pool.ntp.org)
Mar 12 14:50:43 openplotter chronyd[624]: chronyd exiting
Mar 12 14:50:43 openplotter systemd[1]: Stopping chrony, an NTP client/server...
Mar 12 14:50:43 openplotter systemd[1]: chrony.service: Succeeded.
Mar 12 14:50:43 openplotter systemd[1]: Stopped chrony, an NTP client/server.
Mar 12 14:50:43 openplotter systemd[1]: Starting chrony, an NTP client/server...
Mar 12 14:50:43 openplotter chronyd[7781]: chronyd version 4.0 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIG>
Mar 12 14:50:43 openplotter chronyd[7781]: Frequency 20.494 +/- 35.339 ppm read from /var/lib/chrony/chrony.drift
Mar 12 14:50:43 openplotter chronyd[7781]: Using right/UTC timezone to obtain leap second data
Mar 12 14:50:43 openplotter chronyd[7781]: Loaded seccomp filter
Mar 12 14:50:43 openplotter systemd[1]: Started chrony, an NTP client/server.
Mar 12 14:50:48 openplotter chronyd[7781]: Selected source 162.159.200.123 (2.debian.pool.ntp.org)
Mar 12 14:50:48 openplotter chronyd[7781]: System clock wrong by -2419116.741169 seconds
Feb 12 14:52:12 openplotter chronyd[7781]: System clock was stepped by -2419116.741169 seconds
Feb 12 14:52:12 openplotter chronyd[7781]: System clock TAI offset set to 37 seconds
Feb 12 14:52:13 openplotter chronyd[7781]: Selected source 176.9.157.155 (2.debian.pool.ntp.org)
sailoog commented 1 year ago

This the output of the command set-system-time uses:

pi@openplotter:~ $ chronyc sources 2> /dev/null | cut -c2 | grep -ce '-\|*'
1
tkurki commented 1 year ago

The log lines Can't synchronise: no majority and Can't synchronise: no selectable sources look very suspicious to me. I see no evidence that set-system-time is not working as expected: there is trouble with ntp so let's use GPS time.

The real problem here seems to be that something is producing bad datetime data, time jumps forward to March 12. Note that this would happen with any loss of internet connectivity, so imho that is the real problem to solve, otherwise it makes no sense to keep set-system-time active in the first place.

If this is easy to reproduce I suggest you add logging of what chrony is actually outputting for example with chronyc sources 2> /dev/null | tee -a /tmp/set-system-time.log | cut -c2 | grep -ce '-\|*'. Looking at the command output afterwards tells us nothing, as clearly the grep did not find > 0 lines. Enabling debug logging for set-system-time would also log when time is set.

tkurki commented 1 year ago

We could change set-system-time behavior so that a single zero sources count does not activate it but ntp needs to be unavailable for a longer period.

tkurki commented 1 year ago

Oh I missed the message that set-system-time has set time four times. But there's still something odd going on with the sources: your first chronyc sources output has 4 grep hits, while your second has 1.

sailoog commented 1 year ago

Thanks for the tips. I can reproduce this every time in a fresh installation. I think I have find out why set-system-time is ignoring chrony:

pi@openplotter:~ $ chronyc sources -v

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current best, '+' = combined, '-' = not combined,
| /             'x' = may be in error, '~' = too variable, '?' = unusable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^? time.cloudflare.com           3   6   377     5   -328ms[+40320m] +/-   29ms
^* sarah.vandalsweb.com          3   6    37     4   -347ms[  -51ms] +/-   95ms
^- 45.66.52.71                   2   6   377     5   -335ms[  -40ms] +/-   66ms
^- 185.179.104.7                 2   6   377     4   -338ms[  -43ms] +/-   86ms
pi@openplotter:~ $ chronyc sources 2> /dev/null | cut -c2 | grep -ce '-\|*'
3
pi@openplotter:~ $ chronyc sources -v

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current best, '+' = combined, '-' = not combined,
| /             'x' = may be in error, '~' = too variable, '?' = unusable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^? time.cloudflare.com           3   6   377     8  +40320m[+40320m] +/-   53ms
^? sarah.vandalsweb.com          3   6     7     7  +40320m[+40320m] +/-   92ms
^? 45.66.52.71                   2   6   377     8  +40320m[+40320m] +/-   60ms
^? 185.179.104.7                 2   6   377     8  +40320m[+40320m] +/-   78ms
pi@openplotter:~ $ chronyc sources 2> /dev/null | cut -c2 | grep -ce '-\|*'
0

It seems that sources marked as '?' are ignored and that makes sense. Now I have to find out why it fails when set-system-time sets the time. Time coming from GPS is ok but the system time seems to jump forward 2419199.265061 seconds and that is a lot of seconds, aprox 27 days and this is why the repos complain:

E: Release file for http://security.debian.org/debian-security/dists/bullseye-security/InRelease is expired (invalid since 21d 6h 31min 55s). Updates for this repository will not be applied.
E: Release file for http://deb.debian.org/debian/dists/bullseye-updates/InRelease is expired (invalid since 21d 4h 28min 47s). Updates for this repository will not be applied.
E: Release file for http://deb.debian.org/debian/dists/bullseye-backports/InRelease is expired (invalid since 21d 4h 28min 47s). Updates for this repository will not be applied.

I will try more GPSs

FredericGuilbault commented 1 year ago

I don't have an RPI with me ATM, so im guessing things out of my head, but I'm wondering if there would be some kind of race condition between fake-hwclock and chrony that push time forward ... maybe

tkurki commented 1 year ago

@FredericGuilbault check the screenshot, where set-system-time says it has set the time.

From the log it looks like it jumps forward exactly one month. Now that it is February that is 2419200 seconds. I suspect your source just produces navigation.datetime off by one month. Earlier you said Time coming from GPS is ok - was that literally just time or date and time?

sailoog commented 1 year ago

This is really weird. Time and date from GPS are OK and navigation.datetime key is also correct in the server. It is only after set-system-time sets the system time when there is the 1 month jump but the navigation.datetime key keeps the right date/time. I will make more tests this afternoon. Data comes from a Raymarine E80 display by Seatalk1 connection. I will switch to NMEA 0183 and CAN connection and try another GPS.

tkurki commented 1 year ago

Rather log the input and the set tume command, another source may just work and we won’t be any wiser.

sailoog commented 1 year ago

I am getting more and more suspicious of the plugin. This is my navigation.datetime key values before enabling set-system-time plugin:

{
  "meta": {
    "description": "Time and Date from the GNSS Positioning System"
  },
  "value": "2023-02-13T18:01:01.000Z",
  "$source": "seatalk1.ST",
  "timestamp": "2023-02-13T18:01:00.859Z",
  "sentence": "ALK"
}

after enabling the plugin this is the SK log, the entry"[object Object]" is in red:

Feb 13 19:01:03 GET /signalk/v1/api/vessels/self/navigation/datetime 304 6.123 ms - -
Feb 13 19:01:04 GET /signalk/v1/api/vessels/self/navigation/datetime 304 4.916 ms - -
Feb 13 19:01:10 [object Object]
Mar 13 19:01:11 2023-03-13T18:01:11.010Z @signalk/set-system-time System time set to 2023-03-13T18:01:11.000Z 

this is the service log:

Mar 13 19:01:11 openplotter sudo[1391]: pam_unix(sudo:session): session closed for user root
Mar 13 19:01:11 openplotter signalk-server[430]: 2023-03-13T18:01:11.010Z @signalk/set-system-time System time set to 2023-03-13T18:01:11.000Z
Mar 13 19:02:11 openplotter sudo[1457]:       pi : PWD=/home/pi/.signalk ; USER=root ; COMMAND=/usr/bin/date
Mar 13 19:02:11 openplotter sudo[1458]:       pi : PWD=/home/pi/.signalk ; USER=root ; COMMAND=/usr/bin/date --iso-8601 -u -s 2023-03-13T18:02:11.000Z
Mar 13 19:02:11 openplotter sudo[1458]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
Mar 13 19:02:11 openplotter sudo[1457]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)

and this is how the navigation.datetime key looks now:

{
  "meta": {
    "description": "Time and Date from the GNSS Positioning System"
  },
  "value": "2023-02-13T18:02:21.000Z",
  "$source": "seatalk1.ST",
  "timestamp": "2023-03-13T18:02:20.911Z",
  "sentence": "ALK"
}

as you can see the values is still ok. If I use that date/time with date command everything works:

pi@openplotter:~ $ sudo date --iso-8601 -u -s "2023-02-13T18:02:21.000Z"
2023-02-13

It seems that is the plugin who adds a unit to the month. Is it working for you?

sailoog commented 1 year ago

I got it!

Changed to CAN connection and after 8 iterations set-system-time keeps the right date and everything works.

this is the key value:

{
  "meta": {
    "description": "Time and Date from the GNSS Positioning System"
  },
  "value": "2023-02-13T18:56:43.51200Z",
  "$source": "can0.160",
  "timestamp": "2023-02-13T18:56:43.989Z",
  "pgn": 126992
}

as you can see the difference with the seatalk1 source is the amount of decimals in seconds. Any idea how to fix this?

tkurki commented 1 year ago

I sincerely doubt it is up to the number of decimals.

Looking at full data before and after does not work, if seatalk produces always two datagrams in succession and the first one is bad and the second one is good. You’ll just see the second, good one’s data. At least that is one possibility.

There are two st datagrams with datetime, I think the bug is at

https://github.com/SignalK/nmea0183-signalk/blob/6e2602c0570c7121f493a1c6ab9b29f8aa2b29c1/hooks/seatalk/0x54.js#L55

compare to

https://github.com/SignalK/nmea0183-signalk/blob/6e2602c0570c7121f493a1c6ab9b29f8aa2b29c1/hooks/seatalk/0x56.js#L44

but if they always come in succession you never see the off by 1 month alue! But the first one is what triggers this plugin.

sailoog commented 1 year ago

Agree, probably that is the bug not the decimals. I will try later adding -1 to the datagrama 54 but I am afraid this is beyond my knowledge of signal k code. Let me know if you want me to make any test.

tkurki commented 1 year ago

Fix is in https://github.com/SignalK/nmea0183-signalk/releases/tag/v3.13.1

sailoog commented 1 year ago

tested and confirmed! I have just PR a typo correction here: https://github.com/SignalK/set-system-time/pull/13 thanks!