Lora-net / packet_forwarder

A LoRa packet forwarder is a program running on the host of a LoRa gateway that forwards RF packets receive by the concentrator to a server through a IP/UDP link, and emits RF packets that are sent by the server. This project is associated to the lora_gateway repository for SX1301 chip. For SX1302/1303, the repository sx1302_hal must be used.
Other
724 stars 668 forks source link

Packet forwarder current clock for JIT appears to move backward #90

Closed reissjason closed 6 years ago

reissjason commented 6 years ago

We are having a clock issue with the packet forwarder. The clock used for checking at the JIT queue appears to be non-sequential, but the clock attached to uplinks appears to be OK. This causes all downlinks to be rejected as TOO_EARLY/TOO_LATE Also the calculated drift appears to be abnormal as well.

Uplinks received 5 minutes apart have correct tmst values. [2018-06-06T13:19:14.916842Z] JSON up: {"rxpk":[{"tmst":1654528684, ... [2018-06-06T13:24:14.889287Z] JSON up: {"rxpk":[{"tmst":1954511060, ...

5 minutes = 300 seconds = 300 000 000 us

1954511060 - 1654528684 = 299 982 376 us

Downlinks are scheduled with the correct offsets 1s in time to be transmitted. Packet forwarder current clock appears to move backward. [2018-06-06T13:19:15.779653Z] ERROR: Packet REJECTED, timestamp seems wrong, too much in advance (current=2129901833, packet=1655528684, type=0) [2018-06-06T13:24:15.519518Z] ERROR: Packet REJECTED, timestamp seems wrong, too much in advance (current=2129630368, packet=1955511060, type=0)

5 minutes = 300 seconds = 300 000 000 us

1955511060 - 1655528684 = 299 982 376 us 2129630368 - 2129901833 = -271 465 us

INFO: Disabling GPS mode for concentrator's counter... INFO: host/sx1301 time offset=(1528380288s:766454µs) - drift=60000895µs INFO: Enabling GPS mode for concentrator's counter. INFO: Disabling GPS mode for concentrator's counter... INFO: host/sx1301 time offset=(1528380348s:767414µs) - drift=60000960µs INFO: Enabling GPS mode for concentrator's counter.

Any insight to how this could be possible or where to look next would be appreciated.

reissjason commented 6 years ago

GPS is not enabled. We get the same result with and without NTP updating the system clock.

mcoracin commented 6 years ago

Indeed, the calculated drift is abnormal. Except the first one displayed which is to be ignored, the typical value is few microseconds. So either your system time is wrong, or there is an issue with the value you get from sx1301 counter. The drift you see is 60s which corresponds to the pace of the thread_timersync thread. So I would suspect your system clock not to be updated regularly. Could you just enable the following logs in thread_timersync():

        MSG_DEBUG(DEBUG_TIMERSYNC, "  sx1301    = %u (µs) - timeval (%ld,%ld)\n",
            sx1301_timecount,
            concentrator_timeval.tv_sec,
            concentrator_timeval.tv_usec);
        MSG_DEBUG(DEBUG_TIMERSYNC, "  unix_timeval = %ld,%ld\n", unix_timeval.tv_sec, unix_timeval.tv_usec);
reissjason commented 6 years ago

The following is a log with timersync debug enabled. The rx packet tmst value is somehow not inline with the sx1301 clock being tracked by the packet forwarder.

JSON up: {"rxpk":[{"tmst":1248286708,"chan":4,"rfch":1,"freq":924.000000,"stat":1,"modu":"LORA","datr":"SF12BW125","codr":"4/5","lsnr":9.0,"rssi":-81,"size":24,"data":"gNEP8EaAkTcBkfNTgygfRewb9Bqwxa7/"}]} src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805182,889914 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,265143 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805182,905403 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,280632 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805182,915837 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,291066 ... src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805183,372800 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,748029 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805183,386433 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,761662 INFO: [down] PULL_RESP received - token[2:0] :)

JSON down: {"txpk": {"tmst": 1249286708, "prea": 8, "imme": false, "freq": 924, "rfch": 0, "powe": 22, "modu": "LORA", "datr": "SF12BW125", "codr": "4/5", "ipol": true, "ncrc": true, "size": 17, "data": "YNEP8EYlIDIDIP8AAZjy7Wc="}} src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805183,387549 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,762778 src/jitqueue.c:251:jit_enqueue(): ERROR: Packet REJECTED, timestamp seems wrong, too much in advance (current=2176762778, packet=1249286708, type=0) ERROR: Packet REJECTED (jit error=2) src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805183,396929 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,772158 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528805183,407460 src/timersync.c:80:get_concentrator_time(): offset is 1528803006,624771 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2176,782689

reissjason commented 6 years ago

Here we can see the clock progress through a rollover. Eventually a large offset is calculated and thereafter the 60 second drift is seen.

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528803388,781444 src/timersync.c:80:get_concentrator_time(): offset is 1528799639,540042 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 3749,241402

INFO: Disabling GPS mode for concentrator's counter... src/timersync.c:128:thread_timersync(): sx1301 = 3749243635 (µs) - timeval (3749,243635) src/timersync.c:129:thread_timersync(): unix_timeval = 1528803388,783561 INFO: host/sx1301 time offset=(1528799639s:539926µs) - drift=-116µs INFO: Enabling GPS mode for concentrator's counter.

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528803388,791877 src/timersync.c:80:get_concentrator_time(): offset is 1528799639,539926 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 3749,251951

...

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528803928,806939 src/timersync.c:80:get_concentrator_time(): offset is 1528799639,539010 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 4289,267929

INFO: Disabling GPS mode for concentrator's counter... src/timersync.c:128:thread_timersync(): sx1301 = 4289270013 (µs) - timeval (4289,270013) src/timersync.c:129:thread_timersync(): unix_timeval = 1528803928,808909 INFO: host/sx1301 time offset=(1528799639s:538896µs) - drift=-114µs INFO: Enabling GPS mode for concentrator's counter.

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528803928,817343 src/timersync.c:80:get_concentrator_time(): offset is 1528799639,538896 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 4289,278447

...

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528803988,803148 src/timersync.c:80:get_concentrator_time(): offset is 1528799639,538896 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 4349,264252

INFO: Disabling GPS mode for concentrator's counter... src/timersync.c:128:thread_timersync(): sx1301 = 54303946 (µs) - timeval (54,303946) src/timersync.c:129:thread_timersync(): unix_timeval = 1528803988,810023 INFO: host/sx1301 time offset=(1528803934s:506077µs) - drift=-115µs INFO: Enabling GPS mode for concentrator's counter.

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528803988,813554 src/timersync.c:80:get_concentrator_time(): offset is 1528803934,506077 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 54,307477

...

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804828,816060 src/timersync.c:80:get_concentrator_time(): offset is 1528803934,504546 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 894,311514 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804828,826474 src/timersync.c:80:get_concentrator_time(): offset is 1528803934,504546 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 894,321928

INFO: Disabling GPS mode for concentrator's counter... src/timersync.c:128:thread_timersync(): sx1301 = 894313769 (µs) - timeval (894,313769) src/timersync.c:129:thread_timersync(): unix_timeval = 1528804828,828104 INFO: host/sx1301 time offset=(1528803934s:514335µs) - drift=9789µs INFO: Enabling GPS mode for concentrator's counter.

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804828,836926 src/timersync.c:80:get_concentrator_time(): offset is 1528803934,514335 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 894,322591 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804828,848963 src/timersync.c:80:get_concentrator_time(): offset is 1528803934,514335 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 894,334628

...

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804888,817714 src/timersync.c:80:get_concentrator_time(): offset is 1528803934,514335 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 954,303379 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804888,829099 src/timersync.c:80:get_concentrator_time(): offset is 1528803934,514335 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 954,314764

INFO: Disabling GPS mode for concentrator's counter... src/timersync.c:128:thread_timersync(): sx1301 = 2122219134 (µs) - timeval (2122,219134) src/timersync.c:129:thread_timersync(): unix_timeval = 1528804888,839650 INFO: host/sx1301 time offset=(1528802766s:620516µs) - drift=-1167893819µs INFO: Enabling GPS mode for concentrator's counter.

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804888,840383 src/timersync.c:80:get_concentrator_time(): offset is 1528802766,620516 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2122,219867 src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528804888,850779 src/timersync.c:80:get_concentrator_time(): offset is 1528802766,620516 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2122,230263

...

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528806148,879177 src/timersync.c:80:get_concentrator_time(): offset is 1528803966,664679 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2182,214498

INFO: Disabling GPS mode for concentrator's counter... src/timersync.c:128:thread_timersync(): sx1301 = 2122219134 (µs) - timeval (2122,219134) src/timersync.c:129:thread_timersync(): unix_timeval = 1528806148,884927 INFO: host/sx1301 time offset=(1528804026s:665793µs) - drift=60001114µs INFO: Enabling GPS mode for concentrator's counter.

src/timersync.c:79:get_concentrator_time(): --> TIME: unix current time is 1528806148,889623 src/timersync.c:80:get_concentrator_time(): offset is 1528804026,665793 src/timersync.c:81:get_concentrator_time(): sx1301 current time is 2122,223830

reissjason commented 6 years ago

lgw_get_trigcnt returns 2122219134 every time until restart.

src/timersync.c:128:thread_timersync(): sx1301 = 2122219134 (µs) - timeval (2122,219134)

Perhaps the register read is failing? thread_timersync does not check the return value of lgw_get_trigcnt.

int lgw_get_trigcnt(uint32_t* trig_cnt_us) {
    int i;
    int32_t val;

    i = lgw_reg_r(LGW_TIMESTAMP, &val);
    if (i == LGW_REG_SUCCESS) {
        *trig_cnt_us = (uint32_t)val;
        return LGW_HAL_SUCCESS;
    } else {
        return LGW_HAL_ERROR;
    }
}
reissjason commented 6 years ago

Register read is not failing. Status update shows the same trigcnt being read on a different thread.

### [JIT] ###
# SX1301 time (PPS): 2122219134
reissjason commented 6 years ago

We have been able to recreate this state at times by running the spectral scan utility. With v31 FPGA code installed.

### [JIT] ###
# SX1301 time (PPS): 2122219134

Sometimes the lgw_get_trigcnt is stuck until restart, other times it recovers

Sometimes Rx errors are seen during spectral scan.

Error 1

lgw_receive:1155: FIFO content: 1 d9 3 5 12
lgw_receive:1170: WARNING: 208 NOT A VALID IF_CHAIN NUMBER, ABORTING

Error 2

Note: LoRa packet
WARNING: invalid packet, no timestamp correction

INFO: Received pkt from mote: C4751E61 (fcnt=48401)
ERROR: [up] lora packet with unknown datarate
reissjason commented 6 years ago

Interestingly 2122219134 = 0x7e7e7e7e

reissjason commented 6 years ago

With v31 FPGA code the spectral scan will perform a soft reset of the FPGA

x = lgw_connect(false, LGW_DEFAULT_NOTCH_FREQ); /* FPGA reset/configure */

In lgw_connect (loragw_reg.c:504) there is a write to reset the page register during the above reset.

/* write 0 to the page/reset register */
spi_stat = lgw_spi_w(lgw_spi_target, lgw_spi_mux_mode, LGW_SPI_MUX_TARGET_SX1301, loregs[LGW_PAGE_REG].addr, 0);

After this write, the clock from lgw_get_trigcnt is always 0x7e7e7e7e.

Is this soft reset needed in spectral scan? Can true be provided as the first parameter to skip the reset as the NOTCH FREQ has already been configured by the packet forwarder? When false is provided to lgw_connect the lgw_get_trigcnt is not affected in the packet forwarder code.

reissjason commented 6 years ago

In libloragw there is a software variable holding the current page. If the spectral scan resets the page to 0 the library in the packet forwarder will not be updated. The register reads will then be wrongly addressed after spectral scan is started if the packet forwarder was not on page 0 when the spectral scan is started.