SuperHouse / esp-open-rtos

Open source FreeRTOS-based ESP8266 software framework
BSD 3-Clause "New" or "Revised" License

extras/timekeeping -- provide POSIX-like interface #578

Closed by jeffsf 6 years ago

jeffsf commented 6 years ago

Provide POSIX-like timekeeping utilities using the system clock to support monotonic, accurate, timezone-aware time.

Implements settimeofday(), gettimeofday(), and adjtime(). Utilizes setenv() and tzset().

See README.timekeeping.md for more details

Examples

Set Time of Day (UTC)

struct timeval tv;
tv.tv_sec = 1518798027;  /* 2018-02-16T16:20:27+00:00 */
tv.tv_usec = 0;
settimeofday(&tv, NULL);

Get Time of Day (UTC)

struct timeval tv;
gettimeofday(&tv, NULL);

Slew Time

struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = -50 * 1000;  /* -50 ms */
adjtime(&tv, NULL);

Set Local Time Zone to US Pacific

setenv("TZ", "PST8PDT7,M3.2.0,M11.1.0", 1);  /* DST: 2nd Sunday of March to 1st Sunday of November */
tzset();

Set Local Time Zone to UTC

setenv("TZ", "UTC0UTC0", 1);
tzset();

Highlights

This "extra" splits out the timekeeping functions from those associated with clock discipline. It attempts to address some of the challenges with the integrated clock/SNTP implementation in extras/sntp.

It uses the system clock which should have 15 ppm or better accuracy, significantly better than the ESP8266 RTC and likely better than off-board RTCs. My Adafruit Huzzah on my desk shows around 5-6 ppm drift against NTP broadcast on my local network.

Timekeeping functionality is available without needing SNTP synchronization. If you want to treat the moment your program starts ticking as "0 seconds", you can do that.

Timezone functionality is provided through the POSIX-like TZ environment variable using the standard setenv() and tzset() calls. This allows daylight/summer time to be implemented without further updates by applications, for most locations. The internal clock is always in UTC (or whatever reference is chosen), so it does not change when the timezone offset changes. ctime(), localtime(), and their ilk work with the timezone without modification.
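As a rough illustration of the TZ mechanism described above, here is a minimal sketch that formats one UTC timestamp under a given POSIX TZ string. The helper name format_local() is hypothetical and not part of this PR; it only uses standard setenv(), tzset(), localtime_r(), and strftime().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of the PR): format the UTC timestamp t
 * as local time under the POSIX TZ string tz.
 * Returns the number of characters written, or 0 on failure. */
size_t format_local(time_t t, const char *tz, char *buf, size_t len)
{
    struct tm tm_local;

    if (setenv("TZ", tz, 1) != 0)
        return 0;
    tzset();                                /* re-read TZ */
    if (localtime_r(&t, &tm_local) == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S %Z", &tm_local);
}
```

Calling it twice with the same timestamp and different TZ strings changes only the formatted result; the underlying time_t value stays in UTC.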

The implementation of adjtime() means that the clock discipline can implement monotonic time -- no more backwards jumps.

The timekeeping internals are locked when called through the standard calls, improving robustness in multi-threaded applications. The lock uses the same TZ_LOCK that newlib implements, which calls xSemaphoreTake(), so it should benefit from FreeRTOS-style task-priority escalation of a blocking task.

There are no "special" header files required; <sys/time.h>, <stdlib.h>, and <time.h> are sufficient to define the API.

The supplied routines are expected to be consistent with their POSIX counterparts, including a -1 return value on error and setting the instance-specific errno within their "reentrant" implementations.
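A minimal sketch of what that POSIX-style convention looks like from the caller's side. The helper now_us() is a hypothetical name used only for illustration; the pattern is simply "check for -1, then consult errno".

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical helper (not part of the PR): read the clock as
 * microseconds since the epoch. Returns 0 on success, -1 on error. */
int now_us(int64_t *out)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1) {
        /* errno was set by the failing call */
        fprintf(stderr, "gettimeofday: %s\n", strerror(errno));
        return -1;
    }
    *out = (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
    return 0;
}
```

The same check applies to settimeofday() and adjtime().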

Even when slew is underway, there are "no" cycles consumed by the timekeeping routines until gettimeofday(), settimeofday(), or adjtime() is called; there are no timers involved with adjtime(). Calculations are done with integer arithmetic to further speed performance. I say "no" in quotes, as the hourly wrap of the system clock needs to be detected. Calling any of the three aforementioned functions will accomplish that, as will simply calling gettimeofday(NULL, NULL).
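To illustrate why the wrap has to be observed at least once per period, here is a sketch of extending a wrapping counter to 64 bits. It assumes a 32-bit microsecond counter (which would wrap roughly every 71.6 minutes); the struct and function names are hypothetical, and this is not necessarily how the PR implements it.

```c
#include <stdint.h>

/* Hypothetical state for extending a wrapping 32-bit counter. */
struct wrap_state {
    uint32_t last_raw;   /* most recent raw counter value seen */
    uint64_t high;       /* accumulated wraps, in multiples of 2^32 */
};

/* Returns a 64-bit count. Correct only if called at least once per
 * wrap period; a missed wrap would go undetected. */
uint64_t extend_counter(struct wrap_state *s, uint32_t raw)
{
    if (raw < s->last_raw)              /* value went backwards: wrapped */
        s->high += (uint64_t)1 << 32;
    s->last_raw = raw;
    return s->high + raw;
}
```

Any periodic call that reaches a check like this is enough to keep the extended count valid.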

The routines have been manually tested to confirm utility and correctness. The tests that I used are present in the extras/timekeeping/tests/ directory. Let me know if you would like the tests removed.

Testing With LWIP SNTP app

I've tested these routines using the "new" LWIP SNTP app and the results for broadcast NTP are very good. I see a reasonably consistent ~0.5 ms / min drift against the NTP server, with only occasional deviations up to around 1 ms. As the implementation uses adjtime() to slew the clock, time remains monotonic even when the reference time is "behind" the clock time.
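For reference, a drift quoted in ms/min converts to parts per million by dividing microseconds of offset by seconds elapsed. The helper below is a hypothetical illustration of that arithmetic, not code from this PR.

```c
/* Hypothetical helper: drift in ppm, given an offset in microseconds
 * accumulated over interval_s seconds (1 us per s equals 1 ppm). */
double drift_ppm(double offset_us, double interval_s)
{
    return offset_us / interval_s;
}
```

With the figure above, 500 us over 60 s works out to a little over 8 ppm.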

The implementation of LWIP SNTP runs all but the first poll in the tcpip_thread so it is already at high priority.

I can run the example for many hours with a heap size of 192, suggesting that 256 is a "safe" size.

I'm less than thrilled with the performance of the LWIP SNTP app in polling mode, even with RTT compensation in place. As the timekeeping routines work well with the broadcast/listener mode, I suspect the problem is in the LWIP code or networking code. What I am seeing are generally consistent results, with somewhat higher deviation than in broadcast mode. More disturbing are larger deviations that might be clustered around multiples of 5 ms. How much of that is my mind knowing that a "tick" is 10 ms and how much is reality is an open question.

I'd like to look further into the performance of the LWIP SNTP code, including an examination of the WiFi packets, before contributing examples/sntp-lwip or the like.

jeffsf commented 6 years ago

Please hold on merging this -- not because of the code, but the parameter in one of the SNTP "tests" related to slew vs. step probably should be larger than its 128-ms value.

Last night my local, broadcast setup saw a single, 200-ms spike. Some late packet, for who knows what reason. That probably shouldn't trigger a step, especially a backwards one. At 500 us/s a single-sample spike would only cause a slew of 32 ms over the typical 64-s broadcast period. Especially for users that don't have local NTP, I think the current 128-ms threshold is too small.
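The arithmetic behind that 32-ms figure, plus a simple slew-or-step decision, can be sketched as follows. The 500 us/s rate comes from the comment above; the function names, the enum, and the threshold parameter are hypothetical and only illustrate the idea.

```c
#include <stdint.h>
#include <stdlib.h>

#define SLEW_RATE_US_PER_S 500   /* slew rate quoted above */

/* Largest offset (in us) that can be slewed away in period_s seconds. */
int64_t max_slew_us(int64_t period_s)
{
    return period_s * SLEW_RATE_US_PER_S;
}

enum clock_action { ACTION_SLEW, ACTION_STEP };

/* Hypothetical policy: slew small offsets, step at or above the threshold. */
enum clock_action choose_action(int64_t offset_us, int64_t step_threshold_us)
{
    return llabs(offset_us) < step_threshold_us ? ACTION_SLEW : ACTION_STEP;
}
```

For a 64-s broadcast period, max_slew_us(64) gives 32000 us, so a single bad sample that is slewed rather than stepped moves the clock by at most 32 ms before the next sample arrives.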

jeffsf commented 6 years ago

OK, suggestions from @ourairquality incorporated, additional testing and observation over the last several days in various modes with both local and remote "pool" servers.

Good to go now, from my perspective.

The timekeeping routines are unchanged otherwise. I've increased heap on one of the tests to 288 as, at 256, it raised warnings/errors when I referenced an NTP server pool by DNS name. I've added warnings about the SNTP integration not being "production code" as the simplistic slew-or-set approach is not robust to outliers. That said, I haven't seen anything anywhere near as large as that 225-ms one on my local network. The step threshold is at 125 ms, consistent with the NTPv4 STEPT parameter.

If anyone wants some good bedtime reading, https://www.eecis.udel.edu/~mills/database/reports/ntp4/ntp4.pdf will quickly put you to sleep.

I'm not losing my mind, after all. It seems likely that there is some 10-ms "thing" going on in the lwIP SNTP implementation, resulting in 5-ms errors in the server-to-client delay. I don't see it in broadcast mode at all.

[Image: histogram of the offset between NTP measurement and free-running local clock. Poll mode, local server, RTT enabled, linear fit removed, 641 samples (over 11 hours). x-axis in milliseconds.]

jeffsf commented 6 years ago

https://api.travis-ci.org/v3/job/347111681/log.txt

[...]
CC /home/travis/build/SuperHouse/esp-open-rtos/extras/bearssl/BearSSL/src/int/i15_reduce.c
CC /home/travis/build/SuperHouse/esp-open-rtos/extras/bearssl/BearSSL/src/int/i15_rshift.c
CC /home/travis/build/SuperHouse/esp-open-rtos/extras/bearssl/BearSSL/src/int/i15_sub.c
CC /home/travis/build/SuperHouse/esp-open-rtos/extras/bearssl/BearSSL/src/int/i15_tmont.c

The job exceeded the maximum time limit for jobs, and has been terminated.