Open bradanlane opened 6 years ago
I forgot to add ...
The jump appears to match up with the underlying forced update.
When I review the extended log, I see a slight variation in the epoch when the underlying forced update gets a new value from the NTP service. These variances are observable at 60-minute intervals, which corresponds to the interval parameter passed when the NTPClient object was created. This leads me to believe the time jump is somehow related to the forced-update code.
If there is diagnostic debug data available from NTPClient, I'm happy to leverage that.
And it's not "daylight saving" stuff? Sorry if you already checked that, TL;DR.
Thanks for the reply. Yes, I am using the Timezone library and one of the first steps I did was to remove that code.
My issue may be specific to the ESP8266 or the more specific Wemos D1 Mini.
The time jump does appear to track the configured interval. With the interval set to 1 hour, the jumps were approximately an hour.
I just set the interval to 15 minutes and after a short while, NTPClient::forceUpdate() jumped by something close to 15 minutes.
begin new logging session
NTPClient::forceUpdate() 1534117800
NTPClient::forceUpdate() 1534118701
NTPClient::forceUpdate() 1534119661
NTPClient::forceUpdate() failed
Clock update failed: 00:37
NTPClient::forceUpdate() failed
Clock update failed: 00:38
NTPClient::forceUpdate() 1534120682
NTPClient::forceUpdate() 1534120747
Clock jumped: from 00:53 to 00:39
NTPClient::forceUpdate() 1534121712
NTPClient::forceUpdate() 1534122672
The error occurs in forceUpdate(), but only after a forceUpdate() has failed because of a timeout.
FYI: In this example, the interval was set to 15 minutes. My code only calls NTPClient::update() once every minute. Thus, it is possible for NTPClient::forceUpdate() to occur after interval + 1 minute, e.g. 16 minutes.
WiFiUDP ntpUDP;
NTPClient timeClient(ntpUDP, "time.nist.gov", 0, 900000);
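The "interval + 1 minute" effect can be illustrated with a small sketch. It assumes the update is gated on the time elapsed since the last successful sync and that the caller polls on a fixed grid (every 60 seconds here). `firstPollThatUpdates` is a hypothetical helper written for this illustration, not part of NTPClient.

```cpp
#include <cstdint>

// Hypothetical helper (not part of NTPClient).
// Polls happen at t = 0, pollPeriod, 2*pollPeriod, ... (seconds).
// An update is assumed to fire on the first poll where
// (now - lastUpdate) >= interval.
uint32_t firstPollThatUpdates(uint32_t lastUpdate,
                              uint32_t interval,
                              uint32_t pollPeriod) {
    // Earliest moment at which an update is allowed.
    uint32_t due = lastUpdate + interval;
    // Round up to the next poll tick at or after that moment.
    uint32_t ticks = (due + pollPeriod - 1) / pollPeriod;
    return ticks * pollPeriod;
}
```

With a 900-second interval and 60-second polling, a sync that completed one second after a poll tick is not refreshed until the tick at 960 seconds, i.e. 16 minutes.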
The log shows that forceUpdate() was successful at 16:09, failed on the next attempt at 16:25, and then succeeded on the next attempt 1 minute later at 16:26. However, the next update was successful but off by an amount related to the interval. This is consistent with the above test results.
The log now shows all four timestamps:
NTPClient::forceUpdate() 15:20:02 06:28:16 15:21:01 15:21:01
NTPClient::forceUpdate() 15:36:33 06:28:16 15:37:01 15:37:01
NTPClient::forceUpdate() 15:51:42 06:28:16 15:53:01 15:53:01
NTPClient::forceUpdate() 16:09:00 06:28:16 16:09:01 16:09:01
NTPClient::forceUpdate() timeout
Clock update failed: 16:25
NTPClient::forceUpdate() 16:24:09 06:28:16 16:25:07 16:25:07
NTPClient::forceUpdate() 16:24:35 06:28:16 16:26:04 16:26:04
Clock jumped: from 16:40 to 16:26
NTPClient::forceUpdate() 16:40:36 06:28:16 16:41:57 16:41:57
NTPClient::forceUpdate() 16:56:34 06:28:16 16:57:54 16:57:54
I changed my code so it calls NTPClient::update() as often as it can rather than only calling it once per minute. "Most of the time" it appears to recover from timeouts without causing the time jump problem.
begin new logging session
REFERENCE ORIGIN RECEIVE TRANSMIT
NTPClient::forceUpdate() 19:32:13 06:28:16 19:32:29 19:32:29
NTPClient::forceUpdate() timeout
NTPClient::forceUpdate() 19:47:22 06:28:16 19:47:31 19:47:31
NTPClient::forceUpdate() 20:00:55 06:28:16 20:02:31 20:02:31
NTPClient::forceUpdate() 20:16:21 06:28:16 20:17:31 20:17:31
NTPClient::forceUpdate() 20:31:23 06:28:16 20:32:31 20:32:31
NTPClient::forceUpdate() 20:46:38 06:28:16 20:47:31 20:47:31
NTPClient::forceUpdate() 21:02:20 06:28:16 21:02:31 21:02:31
NTPClient::forceUpdate() timeout
NTPClient::forceUpdate() 21:17:29 06:28:16 21:17:33 21:17:33
NTPClient::forceUpdate() 21:32:01 06:28:16 21:32:33 21:32:33
NTPClient::forceUpdate() 21:46:11 06:28:16 21:47:33 21:47:33
NTPClient::forceUpdate() 22:02:02 06:28:16 22:02:33 22:02:33
NTPClient::forceUpdate() 22:16:19 06:28:16 22:17:33 22:17:33
NTPClient::forceUpdate() 22:30:59 06:28:16 22:32:33 22:32:33
NTPClient::forceUpdate() timeout
NTPClient::forceUpdate() 22:46:16 06:28:16 22:47:36 22:47:36
NTPClient::forceUpdate() 22:46:16 06:28:16 22:47:37 22:47:37
clockLoop(): clock jumped from local 19:02 to 18:47
NTPClient::forceUpdate() 23:01:14 06:28:16 23:02:37 23:02:37
NTPClient::forceUpdate() 23:16:26 06:28:16 23:17:38 23:17:38
Just been experiencing the same symptoms and applied the changes seen in this pull request.
Now the time drift isn't observed anymore after a failed update (timeout) followed by a successful update.
Thanks. I've grabbed the code from the pull request and am starting my testing.
Update: still good after 34 hours - even with several timeouts. The calls to forceUpdate() have drifted about 90 seconds over that same period.
I'll have to dig into the pull request details. I am still seeing the possibility of the time jump error:
NTPClient::forceUpdate() 06:02:09
NTPClient::forceUpdate() 06:17:09
NTPClient::forceUpdate() 06:32:09
NTPClient::forceUpdate() 06:47:09
NTPClient::forceUpdate() 07:02:10
NTPClient::forceUpdate() timeout
Clock update failed: 03:17
NTPClient::forceUpdate() 07:17:25
NTPClient::forceUpdate() 07:32:25
NTPClient::forceUpdate() timeout
Clock update failed: 03:47
NTPClient::forceUpdate() 07:47:27
NTPClient::forceUpdate() 07:47:28
clockLoop(): clock jumped from local 04:02 to 03:47
NTPClient::forceUpdate() 08:02:28
NTPClient::forceUpdate() 08:17:29
NTPClient::forceUpdate() 08:32:30
NTPClient::forceUpdate() 08:47:31
NTPClient::forceUpdate() 09:02:31
NTPClient::forceUpdate() 09:17:31
NTPClient::forceUpdate() 09:32:32
NTPClient::forceUpdate() 09:47:34
NTPClient::forceUpdate() 10:02:35
NTPClient::forceUpdate() 10:17:35
NTPClient::forceUpdate() 10:32:36
NTPClient::forceUpdate() 10:47:36
NTPClient::forceUpdate() 11:02:36
NTPClient::forceUpdate() timeout
Clock update failed: 07:17
NTPClient::forceUpdate() 11:32:39
clockLoop(): clock jumped from local 07:17 to 07:32
NTPClient::forceUpdate() 11:47:39
I haven't had any for the last three days although it's querying every minute and got many timeouts.
This is the code I'm using, it's also patched to check the NTP packets for validity:
https://gist.github.com/tobozo/8bcd16391025352dcf9c8ce66f8fdae0
I included your error checking and I'm always failing on
if ((ntpPacket[0] & 0b00111000) >> 3 < 0b100) ...
When I look at the result, I see a value of 0b001 for the version. I'm using time.nist.gov as my pool server.
yep this server is long outdated
Which pool server do you recommend?
It probably depends on your location. If you're in the US, you can pick your own; as long as it's V4, it's supposed to be more secure.
All of the US NTP servers in the pool appear to be V3. A random test of North American servers also found none higher than V3.
I’ll use that as the minimum test.
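For reference, the version check discussed above reads a 3-bit field from the first byte of the reply. The sketch below decodes that byte following the RFC 5905 header layout (2 bits leap indicator, 3 bits version, 3 bits mode). The `acceptReply` rule and its `minVersion` parameter are assumptions made for illustration, not code from NTPClient or from the gist.

```cpp
#include <cstdint>

// Fields packed into the first byte of an NTP packet:
// bits 7-6 leap indicator, bits 5-3 version number, bits 2-0 mode.
struct NtpHeaderBits {
    uint8_t leap;
    uint8_t version;
    uint8_t mode;
};

NtpHeaderBits decodeFirstByte(uint8_t b) {
    return NtpHeaderBits{
        static_cast<uint8_t>((b >> 6) & 0x03),
        static_cast<uint8_t>((b >> 3) & 0x07),
        static_cast<uint8_t>(b & 0x07)};
}

// Hypothetical acceptance rule: a server reply (mode 4) whose
// version is at least minVersion.
bool acceptReply(uint8_t firstByte, uint8_t minVersion) {
    NtpHeaderBits h = decodeFirstByte(firstByte);
    return h.mode == 4 && h.version >= minVersion;
}
```

A first byte of 0x1C decodes to version 3, mode 4, so it is rejected with `minVersion` 4 and accepted with `minVersion` 3.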
I'm giving up on this library. So many things are misleading, like the latency management during UDP requests, the lack of DST support for timezones, and the too many directions it takes where it should instead converge on actual standards.
Using the AsyncUDP library could solve some of those accumulated latency problems, but after fiddling with this code, I can say that the very idea of doing some time synchronization job in an asynchronous fashion is just plain masochism :rofl:
This library is just perfect for learning, but it's not practical for any serious project.
For example, there's the functional conflict between Timelib::setSyncInterval() and timeClient.setUpdateInterval(), the misleading use of the term epoch to store localized time (the Unix epoch is the time 00:00:00 UTC on 1 January 1970), inappropriate recursion, and a ton of open issues pointing to the different anomalies and side effects of all this.
This example turns out to be a perfect codebase for my needs, so I can add packet validation, drift calculation and dynamic interval management.
Thanks. I am giving the alternative code a try. While the NTP code is nearly identical, the update mechanism seems not to care about timeouts and keeps consistent timing.
I have the same issue with an ESP8266. The rest of the software runs fine, but sometimes the time reported by the NTP lib is half an hour (or a multiple of that) too early. My update interval is 30 minutes, so that should be the same problem. It occurs very randomly: sometimes it runs smoothly for several weeks, then I have the problem only several hours after a reboot.
Since no one is working on this anymore, I built my own functions based on the ESP WiFi NTP example, which works fine; I only need the minute of the day. But I may have discovered the reason for the problem:
What happens if the response of the NTP-Server needs more than a second?
The function NTPClient::forceUpdate() exits in line 76 and won't handle any late responses. But the late UDP response packet will still be received by the UDP layer.
Half an hour later the next NTP sync is due and NTPClient::forceUpdate() is called again. It doesn't clear old UDP packets and just requests a new one from the time server. In line 75 it immediately receives the old, timed-out answer to the request from half an hour ago, because it is still in the buffer of the UDP layer. So the timestamp from the old request is handled as the current one and the time shifts almost exactly one sync interval into the past.
So we should at least discard received UDP packets at the beginning of NTPClient::forceUpdate(), before requesting the new NTP packet, to make "sure" that the received packet really is the answer to our latest request. I don't know much about NTP, and it might be better to implement a check based on the information in the UDP packet. Maybe there is an ID to identify the packet or so.
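A minimal sketch of the "discard stale packets first" idea, under stated assumptions: `FakeUdp` is a stand-in written for this example that only mimics the `parsePacket()` / `flush()` shape of the Arduino UDP interface, and `discardPendingPackets` is a hypothetical helper. On real hardware the exact discard call may differ between core versions.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Stand-in for a UDP socket (assumption for this example only).
// parsePacket() returns the size of the next queued packet, or 0 if none.
// flush() drops whatever is left of the current packet.
struct FakeUdp {
    std::deque<std::vector<uint8_t>> queued;
    std::vector<uint8_t> current;

    int parsePacket() {
        if (queued.empty()) {
            current.clear();
            return 0;
        }
        current = queued.front();
        queued.pop_front();
        return static_cast<int>(current.size());
    }

    void flush() { current.clear(); }
};

// Hypothetical helper: throw away every packet that is already waiting,
// so that the next packet read can only have arrived after this call.
template <typename Udp>
int discardPendingPackets(Udp& udp) {
    int discarded = 0;
    while (udp.parsePacket() != 0) {
        udp.flush();
        ++discarded;
    }
    return discarded;
}
```

Calling such a helper immediately before sending the request would remove a late reply left over from a timed-out attempt. It does not protect against a late reply that arrives after the drain but before the new answer.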
Another problem might be a UDP buffer overflow when too many UDP packets have timed out...
Best Bimbo385
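On the question of an ID: RFC 5905 has the server copy the request's transmit timestamp (bytes 40-47) into the reply's origin timestamp (bytes 24-31), so a client can use that field to match replies to requests. The sketch below assumes the client writes a value of its own choosing into the request; `stampRequest` and `replyMatchesRequest` are hypothetical helpers, not NTPClient functions. The constant ORIGIN column in the logs above may suggest that the requests in this thread carried no such value, though the thread does not confirm it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kNtpPacketSize = 48;
constexpr std::size_t kOriginOffset = 24;    // origin timestamp, 8 bytes
constexpr std::size_t kTransmitOffset = 40;  // transmit timestamp, 8 bytes

using NtpPacket = std::array<uint8_t, kNtpPacketSize>;

// Hypothetical helper: store a client-chosen 64-bit token in the
// request's transmit timestamp field, most significant byte first.
void stampRequest(NtpPacket& request, uint64_t token) {
    for (int i = 0; i < 8; ++i) {
        request[kTransmitOffset + i] =
            static_cast<uint8_t>(token >> (56 - 8 * i));
    }
}

// Hypothetical helper: a reply belongs to this request only if its
// origin timestamp equals the transmit timestamp we sent.
bool replyMatchesRequest(const NtpPacket& request, const NtpPacket& reply) {
    return std::memcmp(&reply[kOriginOffset],
                       &request[kTransmitOffset], 8) == 0;
}
```

A reply to an earlier, timed-out request would carry the earlier token and fail the comparison, provided each request uses a different token.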
One thing that needs consideration is the TOS for the time pool: updates under 30 minutes are frowned upon. This may be contributing to the problem, as the time server may see the 1-minute polling as too frequent and throttle the responses. One such response is the KOD, and it returns packets with the same timestamp as the send, which may be triggering the difference in the time once the local offset etc. is taken into account.
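Whether throttling is actually involved is not established in this thread, but it can be checked. Per RFC 5905, a Kiss-o'-Death reply has stratum 0 (byte 1) and carries a four-character ASCII kiss code such as "RATE" or "DENY" in the reference ID field (bytes 12-15). The helpers below are hypothetical and assume `packet` points at a full 48-byte reply.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: stratum 0 marks a Kiss-o'-Death (or otherwise
// unusable) reply. `packet` must point at a 48-byte NTP packet.
bool isKissOfDeath(const uint8_t* packet) {
    return packet[1] == 0;
}

// Hypothetical helper: for a Kiss-o'-Death reply, the reference ID
// holds a 4-character ASCII kiss code.
std::string kissCode(const uint8_t* packet) {
    return std::string(reinterpret_cast<const char*>(packet + 12), 4);
}
```

Logging the kiss code whenever `isKissOfDeath` is true would show whether the server is asking the client to slow down.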
I encountered a similar problem in my project. I have a refresh interval of 10 minutes and I see jumps in multiples of 10 minutes; the discrepancy may increase with time, and it reached 30 minutes. I have an ESP-01. I'm curious what will happen if I set the interval to a day.
I encountered exactly the same problem - update interval set to 1h and for some hours the lib keeps running the right clock and then suddenly...randomly it jumps back 1h. Unacceptable behavior. It makes no sense to use this lib.
Platform - ESP8266 (WEMOS D1 mini)
I am using the NTPClient on a Wemos D1 Mini (ESP8266). On occasion, my code detects that the value returned by NTPClient.update() followed by NTPClient.getEpochTime() has jumped an hour. I have tested with different NTP server pools (time.nist.gov and pool.ntp.org). The occurrence of the jump is not consistent. I've recorded it in the morning, evening, and overnight.
I'm looking for debugging suggestions.
I have the following code fragment (with some gorpy calculations removed):
Here is a snippet of my logging: