devlaam / packet_forwarder

A LoRa packet forwarder is a program running on the host of a LoRa gateway that forwards RF packets receive by the concentrator to a server through a IP/UDP link, and emits RF packets that are sent by the server.
Other
1 stars 0 forks source link

Should periodically retry/redo DNS lookups #1

Closed matthijskooijman closed 7 years ago

matthijskooijman commented 7 years ago

When the upstream servers are configured using hostnames, the poly forwarder seems to do a single DNS lookup at startup and use the resulting IP address indefinitely. This poses two problems:

It seems that both problems can be fixed by periodically re-resolving these DNS names. I was thinking that:

Also, if a lookup fails, but an IP address is known from a previous lookup, it should continue to be used.

devlaam commented 7 years ago

Please note that for every server you can define a maximum stall time (in seconds) in the global_json.conf, if the value is zero it is ignored (per default). If different from zero, the forwarder exits if the server does not respond within that time. Usually that means that it is respawn by the service that started it. This behaviour implicitly gives rise to a new DNS lookup; the downside being is short interruption of the data stream for all servers. By implementing it that way i could stay relatively close to the Semtech code.

matthijskooijman commented 7 years ago

I just tried the stall time option, and it seems to detect a stall when no PUSH_DATA acks were received, even when no PUSH_DATA packets were sent at all (because there are no incoming RF packets). Is this expected and intended?

Here's a bit of log output showing what happens:

Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ##### 2016-12-28 14:41:03 GMT #####
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [UPSTREAM] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # RF packets received by concentrator: 1
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # CRC_OK: 0.00%, CRC_FAIL: 100.00%, NO_CRC: 0.00%
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # RF packets forwarded: 0 (0 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PUSH_DATA datagrams sent: 0 (0 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PUSH_DATA acknowledged: 0.00%
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [DOWNSTREAM] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PULL_DATA sent: 3 (100.00% acknowledged)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PULL_RESP(onse) datagrams received: 0 (0 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # RF packets sent to concentrator: 0 (0 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # TX errors: 0
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### BEACON IS DISABLED!
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [JIT] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # INFO: JIT queue contains 0 packets.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # INFO: JIT queue contains 0 beacons.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [GPS] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # No time keeping possible due to fake gps.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Manual GPS coordinates: latitude 0.00000, longitude 0.00000, altitude 0 m
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [PERFORMANCE] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Upstream radio packet quality: 0.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Upstream datagram acknowledgment quality for server "router.eu.thethings.network" is 0.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Downstream heart beat acknowledgment quality for server "router.eu.thethings.network" is 100.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Downstream datagram content quality for server "router.eu.thethings.network" is 0.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Downstream radio transmission quality for server "router.eu.thethings.network" is 100.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Ideetron concise status report send.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ##### END #####
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: INFO: [down] for server router.eu.thethings.network PULL_ACK received in 424 ms
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: INFO: [down] for server router.eu.thethings.network PULL_ACK received in 104 ms
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: INFO: [down] for server router.eu.thethings.network PULL_ACK received in 113 ms
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: INFO: [up] received packet with bad CRC from mote: 4719D601 (fcnt=58483)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: INFO: Disabling GPS mode for concentrator's counter...
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: INFO: host/sx1301 time offset=(1482936031s:464329µs) - drift=3771µs
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: INFO: Enabling GPS mode for concentrator's counter.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ##### 2016-12-28 14:41:33 GMT #####
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [UPSTREAM] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # RF packets received by concentrator: 1
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # CRC_OK: 0.00%, CRC_FAIL: 100.00%, NO_CRC: 0.00%
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # RF packets forwarded: 0 (0 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PUSH_DATA datagrams sent: 1 (281 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PUSH_DATA acknowledged: 0.00%
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [DOWNSTREAM] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PULL_DATA sent: 3 (100.00% acknowledged)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # PULL_RESP(onse) datagrams received: 0 (0 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # RF packets sent to concentrator: 0 (0 bytes)
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # TX errors: 0
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### BEACON IS DISABLED!
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [JIT] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # INFO: JIT queue contains 0 packets.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # INFO: JIT queue contains 0 beacons.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [GPS] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # No time keeping possible due to fake gps.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Manual GPS coordinates: latitude 0.00000, longitude 0.00000, altitude 0 m
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ### [PERFORMANCE] ###
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Upstream radio packet quality: 0.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Upstream datagram acknowledgment quality for server "router.eu.thethings.network" is 0.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Downstream heart beat acknowledgment quality for server "router.eu.thethings.network" is 100.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Downstream datagram content quality for server "router.eu.thethings.network" is 0.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Downstream radio transmission quality for server "router.eu.thethings.network" is 100.00%.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: # Ideetron concise status report send.
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ##### END #####
Dec 28 14:41:34 lorank-4 lorank8v1[2562]: ERROR: [main] for server router.eu.thethings.network stalled for 64 seconds, terminating packet forwarder.
matthijskooijman commented 7 years ago

Never mind my last comment - it seems PUSH_DATA is actually sent every now and then, but it seems the PUSH_ACKS are somehow failing to be received after changing between eth and 3G networks (not sure why the PULL_ACKS are correctly received, though).

matthijskooijman commented 7 years ago

Seems that the PUSH_ACK failures when PULL_ACK was working, were due to 3G latency. Increasing push_timeout_ms to 1000ms fixes that.

Another problem is that when changing connections (or reconnecting 3G), the IP address changes, and the forwarder ends up with its incoming UDP socket listening on the wrong address, so ACKs no longer come in. In this case, a restart due to exceeded stall time is actually ok, so no problem there.

devlaam commented 7 years ago

Okay, so the conclusion of your posts is that the stall option "solves" the problem?

matthijskooijman commented 7 years ago

I haven't actually tested that, but I expect so. I'll close this ticket, I'll reopen it if I notice the problem again with the stall option.