FS#260 - ntp server doesn't work - no reply to clients

aparcar commented 8 years ago

juanriccio:

Lede r1735. I have ntp client and server enabled in LuCI, and my /etc/config/system file has it enabled too, as expected.

    config timeserver 'ntp'
        list server 'openwrt.pool.ntp.org'
        list server 'time.nist.gov'
        option enable_server '1'

The process is indeed active:

    ~# ps | grep ntp
     2652 root      1176 S    /usr/sbin/ntpd -n -N -l -S /usr/sbin/ntpd-hotplug -p
     4226 root      1176 S    grep ntp

However, no device in my LAN is able to get the time from this server. Such devices include:

several IP security cams
a security NVR
several desktop computers, most running windows. Suspecting something broken in the Windows default NTP client, I also downloaded an external NTP client utility for testing (NetTime at www.timesynctool.com). That doesn't work, either.

The only possibility I see left is that there is some problem in the Lede system.

aparcar commented 8 years ago

jow-:

What is your router model / architecture? A quick test on an Alix APU running an x86/64 build reveals no issues, an "ntpdate -q 192.168.1.1" from a wifi client yields a proper response so it is possibly a target specific problem.
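For clients that have no ntpdate available (the Windows desktops mentioned above, for example), the same check can be approximated in Python. This is only a minimal SNTP-style probe, not a full NTP client: the names `build_request` and `query` and the 2-second timeout are illustrative assumptions, and 192.168.1.1 is simply the router address used in the test above.

```python
import socket
import struct

NTP_PORT = 123
NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request() -> bytes:
    """Return a 48-byte client request: LI=0, version 4, mode 3 (client)."""
    first_byte = (0 << 6) | (4 << 3) | 3
    return bytes([first_byte]) + bytes(47)


def query(host: str, timeout: float = 2.0):
    """Send one request; return (leap, stratum, unix_time) or None if no usable reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, NTP_PORT))
            data, _ = sock.recvfrom(512)
        except OSError:
            # Timeout or network error: no reply, the symptom reported in this ticket.
            return None
    if len(data) < 48:
        return None
    leap = data[0] >> 6      # top two bits of the first byte
    stratum = data[1]
    # Integer seconds of the transmit timestamp: bytes 40..43, big-endian.
    (tx_seconds,) = struct.unpack("!I", data[40:44])
    return leap, stratum, tx_seconds - NTP_UNIX_OFFSET
```

Calling `query("192.168.1.1")` from a LAN client should return a tuple when the router answers, and `None` when the request goes unanswered.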

aparcar commented 8 years ago

mkresin:

Besides what jow said, please try again with a more recent LEDE snapshot. r1735 is almost 30 days old, and if there was a bug it might already be fixed.

aparcar commented 8 years ago

juanriccio:

The target is lantiq/xway WBMR-A. It appears that the problem was with a faulty WAN connection, which I've been suffering a lot recently.

I guess that if the router can't get its client connected to an upstream NTP server, it will keep its own server silenced, even if the current date/time haven't drifted much yet. Is that correct? If this is the intended behavior, this bug can be closed.

I will test a more recent release just to be certain.

aparcar commented 8 years ago

juanriccio:

Issue confirmed with Lede r2065. Rather than disabling the wan, I edited /etc/config/system to add a couple bogus ntp servers:

    config timeserver 'ntp'
        option enable_server '1'
        list server 'fake1.time.invalid'
        list server 'fake1.time.invalid'

and restarted the daemon:

    /etc/init.d/sysntpd restart

After this, the clients got no response from the Lede ntp server. The server started working again as soon as I restored the proper servers in /etc/config/system and restarted sysntpd again.

Is this behavior intended? It seems a bit too radical to silence the router's ntp server when external ntp service isn't available. If a client tries to sync time while there is a wan outage, it will get no answer from the router - and maybe reset itself to internal timekeeping/manual time setup. This actually happened to me, and by Murphy's law it happened while we were switching out of daylight savings time, resulting in a huge 1-hour+ drift.

aparcar commented 8 years ago

mkresin:

From my reading of [[https://bugs.busybox.net/show_bug.cgi?id=8131|upstream bug report 8131]], this is the expected behaviour.

It seems to me the issue you are seeing is not LEDE specific but rather something that should be addressed upstream.

aparcar commented 8 years ago

juanriccio:

I see. Thanks for digging that up, @Mathias Kresin!

It's not a matter with the LEDE developers, but I do wonder how I'm supposed to "babysit my services"...

EDIT - On second thought, my situation might be slightly different, because the page you linked says the service exits, while as I wrote in my original post here above, in my case the process was reported by ps as still active - same in subsequent tests with a more current build of LEDE.

aparcar commented 8 years ago

jch:

Could this, by any chance, be a duplicate of

https://dev.openwrt.org/ticket/18404

?

I've tried but failed to debug that -- it's probably some weird interaction between the ntp daemon and the source-specific routing support in OpenWRT. I've given up and switched to the ntpd package.

aparcar commented 8 years ago

mkresin:

Could this, by any chance, be a duplicate of https://dev.openwrt.org/ticket/18404

I don't think so. stintel is a LEDE developer and I'm quite sure his busybox patch is already backported to LEDE (or busybox has been updated to a version that includes the fix).

But I'm sure the issue you are seeing is the expected behaviour. It makes sense to me to not reply to client requests as long as ntpd can't be sure that the current (machine) time is valid.

I haven't closed this ticket yet, since I have not had the time to check the busybox source code.

Would one of you be so kind as to ask on the [[https://busybox.net/lists.html|Busybox mailing list]] whether it's a bug (perhaps a LEDE one) or the expected behaviour?

aparcar commented 8 years ago

jch:

Ah, sorry, I misread the bug report.

My reading of RFC 5905 is that unsynchronised peers should be sending packets with the stratum field set to 0 and the LI field set to 3:

Initially, all variables are cleared to zero, then the leap is set to
3 (unsynchronized) and stratum is set to MAXSTRAT (16).  Remember
that MAXSTRAT is mapped to zero in the transmitted packet.

I've just checked that this is the way both ntpd and chrony behave:

17:12:10.032320 IP (tos 0xb8, ttl 64, id 14778, offset 0, flags [DF], proto UDP (17), length 76)
    192.168.1.1.123 > 192.168.3.117.46945: NTPv4, length 48
        Server, Leap indicator: clock unsynchronized (192), Stratum 0 (unspecified), poll 3 (8s), precision -18
        Root Delay: 0.000000, Root dispersion: 0.000122, Reference-ID: (unspec)
          Reference Timestamp:  0.000000000
          Originator Timestamp: 3687610330.028420243 (2016/11/08 17:12:10)
          Receive Timestamp:    3687610309.385550141 (2016/11/08 17:11:49)
          Transmit Timestamp:   3687610309.385816186 (2016/11/08 17:11:49)
            Originator - Receive Timestamp:  -20.642870068
            Originator - Transmit Timestamp: -20.642604053

Now, since sysntp doesn't support symmetric mode (peer-to-peer operation), it should not break anything to ignore client requests when unsynchronised. Still, I find the standard behaviour occasionally useful, so I guess it could usefully be added to sysntp.
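The two fields discussed above sit in the first two bytes of the NTP header, so a capture like the one shown can be cross-checked by hand. The following is a small sketch only; `decode_header` and the `usable_for_sync` key are illustrative names, and the usability rule (leap indicator not 3, stratum between 1 and 15) is the usual client-side sanity check rather than anything taken from the busybox source.

```python
def decode_header(packet: bytes) -> dict:
    """Decode leap indicator, version, mode and stratum from an NTP packet."""
    if len(packet) < 2:
        raise ValueError("need at least 2 bytes of NTP header")
    first = packet[0]
    leap = first >> 6             # 2 bits: 3 means clock unsynchronized
    version = (first >> 3) & 0x7  # 3 bits
    mode = first & 0x7            # 3 bits: 3 = client, 4 = server
    stratum = packet[1]           # MAXSTRAT (16) is sent as 0 on the wire
    return {
        "leap": leap,
        "version": version,
        "mode": mode,
        "stratum": stratum,
        "usable_for_sync": leap != 3 and 1 <= stratum <= 15,
    }


# First two bytes of a reply like the one captured above: LI=3 (the "192"
# printed by tcpdump is 3 << 6), version 4, server mode, stratum 0.
sample = bytes([(3 << 6) | (4 << 3) | 4, 0])
info = decode_header(sample)
```

A client receiving such a reply knows the server is reachable but unsynchronised, which is different from getting no answer at all.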

aparcar commented 8 years ago

jow-:

The behaviour matches what upstream intended; any further feature requests should be directed at busybox.net.
