angeloc / htpdate

The (new) home of HTTP Time Protocol (HTP)

htpdate isn't setting STA_UNSYNC ntp kernel flag #16

Open Grabber opened 3 years ago

Grabber commented 3 years ago

t.c

#include <stdio.h>
#include <sys/timex.h>

int main(int argc, char *argv[]) {
   struct timex txc = {};
   int ret = adjtimex(&txc);

   fprintf(stdout, "adjtimex, ret=%d, STA_UNSYNC=%d, TIME_OK=%d, TIME_ERROR=%d\n", ret, (txc.status & STA_UNSYNC),
                                                                                        (ret==TIME_OK),
                                                                                        (ret==TIME_ERROR));

   return 0;
}
sudo ./htpdate -t -a -d <URL> -s -x
gcc t.c -o t; ./t
adjtimex, ret=5, STA_UNSYNC=64, TIME_OK=0, TIME_ERROR=1
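
For reference, the numbers in that output can be decoded by name. The sketch below is only an illustration: `time_state_name`, `is_unsynchronized` and `print_clock_state` are hypothetical helpers, not part of htpdate, and the constants assume the glibc `<sys/timex.h>` definitions on Linux.

```c
#include <stdio.h>
#include <sys/timex.h>

/* Map an adjtimex() return value to the name of the clock state. */
const char *time_state_name(int ret) {
    switch (ret) {
    case TIME_OK:    return "TIME_OK";
    case TIME_INS:   return "TIME_INS";
    case TIME_DEL:   return "TIME_DEL";
    case TIME_OOP:   return "TIME_OOP";
    case TIME_WAIT:  return "TIME_WAIT";
    case TIME_ERROR: return "TIME_ERROR";
    default:         return "failed (see errno)";
    }
}

/* Non-zero when the kernel flags the clock as unsynchronized. */
int is_unsynchronized(int status) {
    return (status & STA_UNSYNC) != 0;
}

/* Read-only query: modes = 0 changes nothing and needs no privileges. */
int print_clock_state(void) {
    struct timex txc = {0};
    int ret = adjtimex(&txc);

    if (ret == -1) {
        perror("adjtimex");
        return -1;
    }
    printf("state=%s, unsynchronized=%d, maxerror=%ld us\n",
           time_state_name(ret), is_unsynchronized(txc.status),
           (long)txc.maxerror);
    return ret;
}
```

Read this way, `ret=5` in the output above is TIME_ERROR, and `STA_UNSYNC=64` means the unsynchronized bit is set in the status word.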

http://linuxelf.com/blog/2017/03/27/clock-status-unsync/

angeloc commented 2 years ago

Any proposal to fix it?

Grabber commented 2 years ago

Any proposal to fix it?

It is easy to clear the STA_UNSYNC flag by calling adjtimex with MOD_STATUS and status &= ~TimexStruct.STA_UNSYNC, but the kernel seems to set the flag again after about 1s. I also tried clearing STA_UNSYNC while adjusting the frequency (MOD_FREQUENCY), MOD_ESTERROR and MOD_MAXERROR in the same call, but the flag comes back in the same way.
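
One detail worth ruling out when clearing the flag: a request built from a zero-initialized struct timex writes status = 0, which drops every other writable status bit as well. Below is a minimal sketch of a read-modify-write that clears only STA_UNSYNC, assuming Linux with glibc. `build_clear_unsync` and `clear_unsync` are hypothetical names, and the modifying call needs root (CAP_SYS_TIME).

```c
#include <sys/timex.h>

/* Fill `req` so that one adjtimex() call clears STA_UNSYNC while keeping the
 * other bits of `current_status`. If `reset_errors` is non-zero, the request
 * also zeroes the estimated and maximum error. */
void build_clear_unsync(struct timex *req, int current_status, int reset_errors) {
    struct timex zero = {0};

    *req = zero;
    req->modes = MOD_STATUS;
    req->status = current_status & ~STA_UNSYNC;
    if (reset_errors) {
        req->modes |= MOD_ESTERROR | MOD_MAXERROR;
        req->esterror = 0;
        req->maxerror = 0;
    }
}

/* Read the current status, then write it back with STA_UNSYNC cleared.
 * Returns the adjtimex() clock state, or -1 with errno set on failure. */
int clear_unsync(void) {
    struct timex cur = {0};   /* modes = 0: read-only query */
    struct timex req;

    if (adjtimex(&cur) == -1)
        return -1;
    build_clear_unsync(&req, cur.status, 1);
    return adjtimex(&req);
}
```

This does not stop the kernel from setting the flag again later. It only avoids touching unrelated status bits while testing.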

Scenario-3) Every second, when the nanosecond count overflows 1 billion, the second_overflow function increases time_maxerror by 500 units. This represents the maximum amount of error that could possibly be present on a reasonable clock. If ntpd is running, it will constantly battle this growing max error by speeding or slowing the clock and lowering the maxerror counter. If it isn’t working, maxerror will eventually max out at 16 million and the second_overflow will freeze it at 16 million and set STA_UNSYNC.
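
The behaviour described in that passage can be modelled in a few lines. This is a simplified sketch of the quoted description, not the kernel source. The constant and function names are made up here, the units are taken to be microseconds, and the "greater than" comparison is an assumption.

```c
#include <sys/timex.h>

#define MAXERROR_STEP_US   500L        /* growth per second, from the text above */
#define MAXERROR_LIMIT_US  16000000L   /* 16 million ceiling, from the text above */

/* One simulated second with no time daemon correcting the clock. */
void overflow_step(long *maxerror_us, int *status) {
    *maxerror_us += MAXERROR_STEP_US;
    if (*maxerror_us > MAXERROR_LIMIT_US) {   /* assumed: strictly greater */
        *maxerror_us = MAXERROR_LIMIT_US;     /* freeze at the ceiling */
        *status |= STA_UNSYNC;                /* clock is now flagged unsynchronized */
    }
}

/* Number of simulated seconds until STA_UNSYNC gets set. */
long seconds_until_unsync(long maxerror_us) {
    int status = 0;
    long seconds = 0;

    while (!(status & STA_UNSYNC)) {
        overflow_step(&maxerror_us, &status);
        seconds++;
    }
    return seconds;
}
```

Starting from maxerror = 0, this model sets the flag after roughly 32,000 simulated seconds, which is a little under nine hours.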

I'm not running any NTP client (like ntpd or chrony). Maybe the nanosecond count is overflowing, but I don't know how to handle that, nor do I understand what is going on.

Setting STA_UNSYNC correctly matters in my case because I read it from both C and Python applications to check the time synchronization state and attach it to data points collected on an IoT device. The reason for synchronizing the time via HTTP is that 123/UDP is sometimes blocked by the ISP, and in that situation HTTP is the best available time reference.

twekkel commented 2 years ago

Hi Luiz,

Good to see that 2 years after you asked me for a solution, you didn't give up :).

I will repeat the statement I made at that time: htpdate doesn't touch the STA_UNSYNC flag ... and it shouldn't! htpdate is far too inaccurate to ever say it is in sync... that is really something only NTP can do/say... not htpdate.

Having said that... the snippet below puts the system in STA_UNSYNC=0 for me... no reset (after 1s or 1m) whatsoever... I guess some other daemon on your system is touching the status.

Regards, Eddy

#include <stdio.h>
#include <sys/timex.h>

int main(int argc, char *argv[]) {
   struct timex txc = {};

   txc.modes = MOD_STATUS | MOD_ESTERROR | MOD_MAXERROR;

   // Force to be synchronised
   txc.status &= ~STA_UNSYNC;

   // Force to be UNsynchronised
   //txc.status |= STA_UNSYNC;

   int ret = adjtimex(&txc);

   fprintf(stdout, "adjtimex, ret=%d, STA_UNSYNC=%d, TIME_OK=%d, TIME_ERROR=%d\n", ret, (txc.status & STA_UNSYNC),
                                                                                        (ret==TIME_OK),
                                                                                        (ret==TIME_ERROR));

   return 0;
}
twekkel commented 2 years ago

Is systemd-timesyncd running maybe? If so, try stopping it with

systemctl stop systemd-timesyncd

Grabber commented 2 years ago

Hi @twekkel

I was investigating it again the other day, never ever give up!

Is systemd-timesyncd running maybe? If so, try stopping it with

systemctl stop systemd-timesyncd

Yes, both systemd-timesyncd and chrony are disabled. I forced both with systemctl disable.

x@x-070b7508:~$ sudo systemctl status systemd-timesyncd
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; disabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/systemd-timesyncd.service.d
           └─disable-with-time-daemon.conf
   Active: inactive (dead)
     Docs: man:systemd-timesyncd.service(8)
x@x-070b7508:~$ systemctl status chrony
● chrony.service - LSB: Controls chronyd NTP time daemon
   Loaded: loaded (/etc/init.d/chrony; bad; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:systemd-sysv-generator(8)

Before running your code snippet:

x@x-070b7508:~$ timedatectl
      Local time: Thu 2021-12-09 16:50:01 -03
  Universal time: Thu 2021-12-09 19:50:01 UTC
        RTC time: Fri 1970-01-02 14:38:42
       Time zone: America/Sao_Paulo (-03, -0300)
 Network time on: no
NTP synchronized: no
 RTC in local TZ: no

ntp.c

#include <stdio.h>
#include <sys/timex.h>

int main(int argc, char *argv[]) {
   struct timex txc = {};

   txc.modes = MOD_STATUS | MOD_ESTERROR | MOD_MAXERROR;

   // Force to be synchronised
   txc.status &= ~STA_UNSYNC;

   // Force to be UNsynchronised
   //txc.status |= STA_UNSYNC;

   int ret = adjtimex(&txc);

   fprintf(stdout, "adjtimex, ret=%d, STA_UNSYNC=%d, TIME_OK=%d, TIME_ERROR=%d\n", ret, (txc.status & STA_UNSYNC),
                                                                                        (ret==TIME_OK),
                                                                                        (ret==TIME_ERROR));

   return 0;
}
gcc ntp.c -o ntp
./ntp
adjtimex, ret=-1, STA_UNSYNC=0, TIME_OK=0, TIME_ERROR=0

After running your code snippet:

x@x-070b7508:~$ timedatectl
      Local time: Thu 2021-12-09 16:51:09 -03
  Universal time: Thu 2021-12-09 19:51:09 UTC
        RTC time: Fri 1970-01-02 14:39:50
       Time zone: America/Sao_Paulo (-03, -0300)
 Network time on: no
NTP synchronized: no
 RTC in local TZ: no

Both ./ntp and timedatectl in a row:

x@x-070b7508:~$ ./ntp; timedatectl
adjtimex, ret=-1, STA_UNSYNC=0, TIME_OK=0, TIME_ERROR=0
      Local time: Thu 2021-12-09 16:53:57 -03
  Universal time: Thu 2021-12-09 19:53:57 UTC
        RTC time: Fri 1970-01-02 14:42:39
       Time zone: America/Sao_Paulo (-03, -0300)
 Network time on: no
NTP synchronized: no
 RTC in local TZ: no
twekkel commented 2 years ago

@Grabber Great, so it works for you too, I guess. If sleeptime in htpdate.c is > 3600s or so, you could potentially call this code every minute or so to reset STA_UNSYNC and the error values to 0, so that the kernel keeps believing it is in sync.

Grabber commented 2 years ago

@twekkel, no, it isn't working:

NTP synchronized: no
x@x-070b7508:~$ uname -a
Linux x-070b7508 3.4.113-sun8i #2 SMP PREEMPT Tue Dec 26 16:01:30 PST 2017 armv7l armv7l armv7l GNU/Linux

Which kernel version did you try it on?

twekkel commented 2 years ago

@Grabber I tried my snippet on an Intel-based laptop with kernel 5.4 and also on a Raspberry Pi with 5.4 ... stock Ubuntu kernels.

What is the output of adjtimex -p before and after you run the snippet? I suspect 'maxerror' is increasing too fast on your system... but I'm guessing. Eventually maxerror reaches '16000000', which means no sync anymore, and status will be 64.

Grabber commented 2 years ago

@twekkel, I forgot to call adjtimex() as root in the later tests... it seems to be working, BUT I still don't know why STA_UNSYNC was being reset in the earlier tests. I will try it on multiple devices, over multiple days, with and without reboots at random points in time, to double-check!
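
A silent failure like this is easy to miss because the snippet prints `ret` but never checks it. Below is a hedged sketch of such a check, assuming Linux with glibc. `explain_adjtimex_result` and `set_synchronized_checked` are hypothetical helpers, not htpdate code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/timex.h>

/* Describe the outcome of an adjtimex() call.
 * `ret` is its return value, `err` the errno captured right after the call. */
const char *explain_adjtimex_result(int ret, int err) {
    if (ret != -1)
        return "call succeeded";
    if (err == EPERM)
        return "permission denied: changing the clock state needs root (CAP_SYS_TIME)";
    return strerror(err);
}

/* Send the same request as the snippet in this thread (status, esterror and
 * maxerror all zero), but report a failure instead of ignoring it. */
int set_synchronized_checked(void) {
    struct timex txc = {0};
    int ret, err;

    txc.modes = MOD_STATUS | MOD_ESTERROR | MOD_MAXERROR;
    ret = adjtimex(&txc);
    err = errno;   /* capture before any other library call can overwrite it */
    if (ret == -1)
        fprintf(stderr, "adjtimex: %s\n", explain_adjtimex_result(ret, err));
    return ret;
}
```

With a check like this, the earlier `ret=-1` runs would have reported the permission problem directly instead of printing STA_UNSYNC=0 from the untouched local struct.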

Grabber commented 2 years ago

@twekkel, with random reboots during the day, the STA_UNSYNC is resetting again after a successful set.

successful set

Dec 15 20:52:44 x-070b7508 systemd[1]: Started clockstate.
Dec 17 01:01:15 x-070b7508 python3[1309]: now=1639612366198931, synchronized=False
Dec 17 01:01:15 x-070b7508 python3[1309]: new clockstate: [('htp_drift_sec', 101309), ('htp_date_str', '2021-12-17 04:01:15'), ('rtt_usec', 383873), ('htp_date', (2021, 12, 17, 4, 1, 15, 0, 1, -1))]
Dec 17 01:01:15 x-070b7508 python3[1309]: sync=True, synchronized=True
Dec 17 01:01:15 x-070b7508 python3[1309]: now=1639713675000866, synchronized=True

the synchronized state is lost after a while

x@x-070b7508:~$ sudo timedatectl
      Local time: Fri 2021-12-17 14:01:43 -03
  Universal time: Fri 2021-12-17 17:01:43 UTC
        RTC time: Sun 1970-01-04 00:36:01
       Time zone: America/Sao_Paulo (-03, -0300)
 Network time on: no
NTP synchronized: no
 RTC in local TZ: no

next thing to investigate: if the fake-hwclock is the bad guy.

twekkel commented 2 years ago

It is supposed to work like this. The 'sync status' goes automatically to unsynchronized after a while. Ntpd/chronyd are constantly updating the kernel/clock. If there is no daemon to do that, the status goes to 64.

Run this:

adjtimex -m 0 && adjtimex -e 0 && adjtimex -S 0 && adjtimex -p && sleep 10 && adjtimex -p

You will see that after initially setting maxerror (and the others) to 0, it has increased to about 5000 ten seconds later.... wait long enough (almost 9 hours: 16000000 / 500 per second = 32000 s) and the maximum (16000000) is reached, hence the clock status is 64/unsync.

Please read this: http://www.ntp.org/ntpfaq/NTP-s-algo-kernel.htm#Q-ALGO-KERNEL-MON-VALS

twekkel commented 2 years ago

@Grabber

adjtimex -m 0 && adjtimex -e 0 && adjtimex -S 0

is exactly what the snippet (https://github.com/angeloc/htpdate/issues/16#issuecomment-981165678) does

twekkel commented 2 years ago

@Grabber please give this (upstream) branch a try https://github.com/twekkel/htpdate/tree/STA_UNSYNC

the '-y' option will set the clock to be synchronized and will keep it synchronized as long as htpdate is running.... it needs some work. Not sure if it will go into master either....

let me know if it works for you

Grabber commented 2 years ago

@twekkel, yes, it's working!

Do you need any help improving it? I read a comment saying "needs improvements RFC 5905".

About not merging to master... I understand your point that the HTTP reference is not as accurate as NTP... but my point is: in cases where NTP is not an option, it's better to trust HTTP than to have nothing. My suggestion is to merge it into master and keep the syncclock flag off by default.