cesanta / mDash

Arduino / ESP-IDF library for mdash.net IoT service
https://mdash.net
Other
34 stars 17 forks source link

version 1.2.15 setting the device clock to wildly incorrect values #18

Closed jbrzusto closed 1 year ago

jbrzusto commented 1 year ago

The function sntpcb (apparently - source code is not available) misinterprets the return value of mg_sntp_parse, This behaviour appears on both esp32 (see this mdash forum post) and linux-x64. Debugging on linux-x64 shows that sntpcb is discarding the upper 32-bit portion of the mg_sntp_parse return value (which is uint64_t count of milliseconds since the epoch) and using the lower 32 bits as the count of seconds since the epoch, leading to nonsensical system time.

Below are the ugly linux-x64 debug details, but the behaviour is identical on esp32. The test app which I debugged:

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include "mDash.h"

int main (int argc, char *argv[]) {
#define DEVICE_PASSWORD "..."
  mDashBegin(DEVICE_PASSWORD);
  int i;
  for (i = 0;;++i) {
    usleep(1000000); // sleep 1 second
    time(&t);
    printf("Slept %d and time is now %ld\n", i, t);
  }
}

State of relevant x64 registers upon leaving mg_sntp_parse:

Thread 2 "test" hit Breakpoint 2, 0x0000555555571551 in mg_sntp_parse (buf=0x7ffff00015a0 "$\001", len=48) at x2.c:6316
6316    x2.c: No such file or directory.
(gdb) info registers 
rax            0x181f6c2237f       1657702327167
rbx            0x0                 0
rcx            0xa7                167
rdx            0xdc82              56450
rsi            0x181f6c222d8       1657702327000

The correct (except for https://github.com/cesanta/mongoose/issues/1628) milliseconds since the epoch is in rax. But just before the call to mg_log inside of sntpcb, here's what the registers hold:

Thread 2 "test" hit Breakpoint 1, 0x000055555557aac6 in sntpcb (c=0x7ffff0000b60, ev=17, evd=0x7ffff7d90d00, fnd=0x0) at x2.c:10986
10986   in x2.c
(gdb) info registers 
rax            0x0                 0
rbx            0x0                 0
rcx            0x0                 0
rdx            0x9d7f4000          2642362368
rsi            0xf6c2237f          4139918207

Notice that rsi holds the lower 32-bits (0xf6c2237f) of the return value from mg_sntp_parse (0x181f6c2237f), and the upper 32-bits has apparently disappeared. Letting the code continue, we get this message:

SNTP: setting time to 4139918207.2642362368

which has clearly used the incorrect values from registers rsi and rdx.

This bug is a show-stopper. Although the initial connection to the mDash websocket server appears to set the device clock correctly, sntpcb will eventually clobber it.

ssbrewco commented 1 year ago

Thanks for capturing these details. Looking at these symptoms it seems like this may be related to the NTP sync issues described in issue #14 that were introduced in mDash 1.2.14 and very likely persist in 1.2.15

Would you be willing to repeat your tests using 1.2.13 to see if they go away? If so, this issue is likely a duplicate of #14

jbrzusto commented 1 year ago

I tried this with 1.2.13, adding a print of the system time because no log message is generated in this version, and you're right, the system clock is set correctly.

ssbrewco commented 1 year ago

@jbrzusto - Thanks for verifying! This is very useful. As noted in issue #14, it appears this bug was introduced in 1.2.14

@cpq - Can you please take a look at this and advise?

jbrzusto commented 1 year ago

I'm hoping to eventually be able to submit a pull request on this issue (pending my company signing an NDA with Cesanta) but meantime, here is workaround for version 1.2.15, assuming you have a debug version of the library file, which is what has been released to date. Tested on ESP32. I can't offer support, but consider the file below released under whatever licence(s) is compatible with mongoose + mDash + your needs.

  1. modify the binary libmDash.a like so (using the objcopy appropriate to your target platform; e.g. xtensa-esp32-elf-objcopy for esp32):

    objcopy --globalize-symbol mg_sntp_connect --globalize-symbol s_time_set --globalize-symbol s_sntp libmDash.a
    objcopy --weaken-symbol mg_sntp_connect libmDash.a

    This allows us to replace the built-in mg_sntp_connect function (and its callees) with one that calls our corrected versions of those callees (namely mg_sntp_parse and sntpcb). The two variables s_time_set and s_sntp also need to be made global so our replacement code can use them.

  2. copy the file mongoose.h from the mongoose repo into your firmware source folder.

  3. add the following new file, called sntp.c to your firmware source folder, and add it to your build instructions, if needed. The functions my_mg_sntp_parse and mysntpcb replace the buggy mg_sntp_parse and sntpcb in the mDash library. The functions mg_sntp_connect and mysntp_cb replace the ones in the library, due to the symbol weakening done above. These versions differ only in calling the my versions of the functions, instead of the buggy ones in the library.

#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <sys/time.h>
#include "mongoose.h"
#define SNTP_INTERVAL_SEC 3600
#define SNTP_TIME_OFFSET 2208988800UL

extern uint32_t mg_ntohl(uint32_t net);
extern int s_time_set;
extern int s_sntp;

int64_t my_mg_sntp_parse(const unsigned char *buf, size_t len) {
  int64_t res = -1;
  int mode = len > 0 ? buf[0] & 7 : 0;
  int version = len > 0 ? (buf[0] >> 3) & 7 : 0;
  if (len < 48) {
    MG_ERROR(("%s", "corrupt packet"));
  } else if (mode != 4 && mode != 5) {
    MG_ERROR(("%s", "not a server reply"));
  } else if (buf[1] == 0) {
    MG_ERROR(("%s", "server sent a kiss of death"));
  } else if (version == 4 || version == 3) {
    uint32_t *data = (uint32_t *) &buf[40];
    unsigned long seconds = mg_ntohl(data[0]) - SNTP_TIME_OFFSET;
    unsigned long fracseconds = mg_ntohl(data[1]);
    res = ((int64_t) seconds) * 1000 + (((int64_t) (fracseconds) * 1000) >> 32);
  } else {
    MG_ERROR(("unexpected version: %d", version));
  }
  return res;
}

void mysntp_cb(struct mg_connection *c, int ev, void *evd, void *fnd) {
  if (ev == MG_EV_READ) {
    int64_t milliseconds = my_mg_sntp_parse(c->recv.buf, c->recv.len);
    if (milliseconds > 0) {
      mg_call(c, MG_EV_SNTP_TIME, (uint64_t *) &milliseconds);
           printf("%u.%u", (unsigned) (milliseconds / 1000),
                       (unsigned) (milliseconds % 1000));
    }
    mg_iobuf_del(&c->recv, 0, c->recv.len);  // Free receive buffer
  } else if (ev == MG_EV_CONNECT) {
    mg_sntp_request(c);
  } else if (ev == MG_EV_CLOSE) {
  }
  (void) fnd;
  (void) evd;
}

// reverse-engineered from object code and then corrected:
void mysntpcb(struct mg_connection *c, int ev, void *ev_data, void *fn_data ) {
  switch (ev) {
  case MG_EV_SNTP_TIME:
    {
      uint64_t ms = *(uint64_t *)ev_data;
      struct timeval tv  = {.tv_sec = ms /  1000, .tv_usec = 1000 * (ms % 1000)};
      settimeofday(&tv, 0);
      s_time_set=1;
    }
    break;
  case MG_EV_CLOSE:
    s_sntp = 0;
    break;
  }
}

struct mg_connection *mg_sntp_connect(struct mg_mgr *mgr, const char *url,
                                      mg_event_handler_t fn, void *fnd) {
  struct mg_connection *c = NULL;
  if (url == NULL) url = "udp://time.google.com:123";
  if ((c = mg_connect(mgr, url, &mysntpcb, fnd)) != NULL) c->pfn = mysntp_cb;
  return c;
}
cpq commented 1 year ago

Closing - as moved to a recent Mongoose, and open sourced the lib.

jbrzusto commented 1 year ago

Great news - thanks!